Learning Spatial-Aware Cross-View Embeddings for Ground-to-Aerial Geolocalization


Image-based geolocalization is an important alternative to GPS-based localization in GPS-denied situations. Among image-based approaches, ground-to-aerial geolocalization is particularly promising, but it is also difficult because of the drastic differences in viewpoint and appearance between ground and aerial images. In this paper, we propose a novel spatial-aware Siamese-like network that addresses this issue by exploiting a spatial transformer layer to alleviate the large view variation and to learn location-discriminative embeddings from cross-view images. Furthermore, we propose combining the triplet ranking loss with a simple and effective location identity loss to further improve performance. We evaluate our method on a publicly available dataset, and the results show that the proposed method outperforms state-of-the-art methods by a large margin.
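The combined training objective described above can be illustrated with a minimal sketch. The abstract does not give the exact loss formulas, so the following assumes a standard hinge-form triplet ranking loss with a margin `margin` on L2 distances between embeddings, plus a softmax cross-entropy location identity loss over per-location logits, summed with a hypothetical weight `weight`. All function names and both parameters are illustrative assumptions, not the authors' formulation.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_ranking_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Assumed hinge form: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    Here the anchor is a ground-view embedding, the positive is the aerial
    embedding of the same location, and the negative is an aerial embedding
    of a different location.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def location_identity_loss(logits: Sequence[float], label: int) -> float:
    """Assumed softmax cross-entropy over location IDs (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def combined_loss(ground, aerial_pos, aerial_neg, logits, label,
                  margin: float = 1.0, weight: float = 1.0) -> float:
    """Hypothetical combination: triplet loss + weight * identity loss."""
    return (triplet_ranking_loss(ground, aerial_pos, aerial_neg, margin)
            + weight * location_identity_loss(logits, label))


if __name__ == "__main__":
    # Positive is closer than negative by more than the margin, so the
    # triplet term is zero; the identity term is log(2) for two equal logits.
    print(combined_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0], [0.0, 0.0], 0))
```

In a real network the embeddings and logits would come from the two Siamese-like branches and a classification head, and the losses would be averaged over mini-batches of triplets; the scalar version here only shows how the two terms combine.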

Image and Graphics