Sci Rep. 2025 Sep 29;15(1):33431. doi: 10.1038/s41598-025-18935-6.

Enhancing cross-view geo-localization through global-local quadrant interaction network

Xu Jin et al. Sci Rep.

Abstract

Cross-view geo-localization aims to match images of the same location captured from different perspectives, such as drone and satellite views. This task is inherently challenging due to significant visual discrepancies caused by viewpoint variations. Existing approaches often rely on global descriptors or limited directional cues, failing to effectively integrate diverse spatial information and global-local interactions. To address these limitations, we propose the Global-Local Quadrant Interaction Network (GLQINet), which enhances feature representation through two key components: the Quadrant Insight Module (QIM) and the Integrated Global-Local Attention Module (IGLAM). QIM partitions feature maps into directional quadrants, refining multi-scale spatial representations while preserving intra-class consistency. Meanwhile, IGLAM bridges global and local features by aggregating high-association feature stripes, reinforcing semantic coherence and spatial correlations. Extensive experiments on the University-1652 and SUES-200 benchmarks demonstrate that GLQINet significantly improves geo-localization accuracy, achieving state-of-the-art performance and effectively mitigating cross-view discrepancies.
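The quadrant partitioning idea behind QIM can be illustrated with a minimal sketch: a backbone feature map is split into four directional quadrants, each of which is pooled into its own local descriptor. This is an illustrative assumption based only on the abstract; the function name `quadrant_partition`, the 2x2 splitting rule, and the mean-pooling step are hypothetical and may differ from the paper's actual multi-scale design.

```python
import numpy as np

def quadrant_partition(feat):
    """Split a (C, H, W) feature map into four directional quadrants:
    top-left, top-right, bottom-left, bottom-right (hypothetical sketch;
    the paper's exact splitting and multi-scale handling may differ)."""
    c, h, w = feat.shape
    hh, hw = h // 2, w // 2
    return [
        feat[:, :hh, :hw],   # top-left quadrant
        feat[:, :hh, hw:],   # top-right quadrant
        feat[:, hh:, :hw],   # bottom-left quadrant
        feat[:, hh:, hw:],   # bottom-right quadrant
    ]

# Each quadrant is pooled into one local descriptor per direction.
feat = np.random.rand(256, 8, 8)
quads = quadrant_partition(feat)
descriptors = [q.mean(axis=(1, 2)) for q in quads]  # four 256-d vectors
```

Pooling each quadrant separately preserves coarse directional layout (what lies north-east vs. south-west of the target), which a single global descriptor discards.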

Keywords: Cross-view; Geo-localization; Integrated global-local attention; Quadrant insight.


Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
An illustration of the motivation behind our work. Our proposed GLQINet generates diverse patterns to encourage the network to learn informative feature representations by focusing on discriminative aspects of the input. In addition, the model employs an attention-based mechanism in an interactive manner to effectively learn both global and local features, enabling a comprehensive understanding of the geographic context across different views. The satellite imagery shown in this figure is derived from the University-1652 dataset, which can be accessed at: https://github.com/layumi/University1652-Baseline.
Fig. 2
The proposed network’s architecture comprises a dual-stream feature extraction backbone, the Quadrant Insight Module (QIM), and the Integrated Global-Local Attention Module (IGLAM). QIM leverages fine-grained details to generate four-directional local representations of the feature. IGLAM integrates both global and local embeddings, enabling simultaneous attention to different perspectives within various feature spaces and incorporating additional key features into comprehensive final representations.
Fig. 3
Comparison of our Integrated Global-Local Attention Module (IGLAM) with existing interaction mechanisms. (a) Self-attention concatenates local and global features before passing them through a self-attention block. (b) Cross-attention fuses features via a cross-attention layer. (c) Co-attention applies a cross-attention layer followed by a self-attention block. (d) Our merged attention first concatenates global and local features, then processes them through a single cross-attention block, enabling effective cross-view interaction.
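The merged-attention variant in panel (d) can be sketched as follows: the global embedding and the local stripe embeddings are concatenated into one token sequence, which is then processed by a single scaled dot-product attention pass. This is a hedged illustration of the idea only; learned projection weights, multi-head structure, and the exact query/key/value routing of IGLAM are omitted, and the names `merged_attention` and `softmax` are our own.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def merged_attention(global_feat, local_feats):
    """Sketch of merged attention (Fig. 3d): concatenate the global token
    with the local tokens, then run one attention pass over the merged
    sequence so every token can attend to every other."""
    tokens = np.concatenate([global_feat[None, :], local_feats], axis=0)  # (1+N, D)
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)  # (1+N, 1+N) pairwise affinities
    return softmax(scores) @ tokens          # attended (1+N, D) features

g = np.random.rand(64)        # one global embedding
l = np.random.rand(4, 64)     # four local stripe/quadrant embeddings
out = merged_attention(g, l)  # (5, 64): global and local tokens fused
```

Compared with variants (a)-(c), a single pass over the merged sequence lets global and local tokens exchange information in one step instead of stacking separate cross- and self-attention blocks.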
Algorithm 1
Training procedure of GLQINet.
Fig. 4
Ablation study comparing different components.
Fig. 5
Ablation study comparing different values of t.
Fig. 6
Visualization of error cases for our method and the baseline, with blue boxes denoting correct matches and red boxes denoting false matches. The satellite images shown in this figure are derived from the University-1652 dataset, which can be accessed at: https://github.com/layumi/University1652-Baseline.
