Sci Rep. 2025 Sep 29;15(1):33431. doi: 10.1038/s41598-025-18935-6.

Enhancing cross-view geo-localization through global-local quadrant interaction network

Xu Jin et al. Sci Rep.

Abstract

Cross-view geo-localization aims to match images of the same location captured from different perspectives, such as drone and satellite views. This task is inherently challenging due to significant visual discrepancies caused by viewpoint variations. Existing approaches often rely on global descriptors or limited directional cues, failing to effectively integrate diverse spatial information and global-local interactions. To address these limitations, we propose the Global-Local Quadrant Interaction Network (GLQINet), which enhances feature representation through two key components: the Quadrant Insight Module (QIM) and the Integrated Global-Local Attention Module (IGLAM). QIM partitions feature maps into directional quadrants, refining multi-scale spatial representations while preserving intra-class consistency. Meanwhile, IGLAM bridges global and local features by aggregating high-association feature stripes, reinforcing semantic coherence and spatial correlations. Extensive experiments on the University-1652 and SUES-200 benchmarks demonstrate that GLQINet significantly improves geo-localization accuracy, achieving state-of-the-art performance and effectively mitigating cross-view discrepancies.
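The quadrant partitioning idea behind QIM can be illustrated with a minimal sketch: a backbone feature map is split into four directional quadrants, each of which is pooled into its own local descriptor. This is an illustrative assumption based only on the abstract; the function name `quadrant_partition`, the 2x2 splitting rule, and the mean-pooling step are hypothetical and may differ from the paper's actual multi-scale design.

```python
import numpy as np

def quadrant_partition(feat):
    """Split a (C, H, W) feature map into four directional quadrants:
    top-left, top-right, bottom-left, bottom-right (hypothetical sketch;
    the paper's exact splitting and multi-scale handling may differ)."""
    c, h, w = feat.shape
    hh, hw = h // 2, w // 2
    return [
        feat[:, :hh, :hw],   # top-left quadrant
        feat[:, :hh, hw:],   # top-right quadrant
        feat[:, hh:, :hw],   # bottom-left quadrant
        feat[:, hh:, hw:],   # bottom-right quadrant
    ]

# Each quadrant is pooled into one local descriptor per direction.
feat = np.random.rand(256, 8, 8)
quads = quadrant_partition(feat)
descriptors = [q.mean(axis=(1, 2)) for q in quads]  # four 256-d vectors
```

Pooling each quadrant separately preserves coarse directional layout (what lies north-east vs. south-west of the target), which a single global descriptor discards.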

Keywords: Cross-view; Geo-localization; Integrated global-local attention; Quadrant insight.


Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
An illustration of the motivation behind our work. Our proposed GLQINet generates diverse patterns to encourage the network to learn informative feature representations by focusing on discriminative aspects of the input. In addition, the model employs an attention-based mechanism in an interactive manner to effectively learn both global and local features, enabling a comprehensive understanding of the geographic context across different views. The satellite imagery shown in this figure is derived from the University-1652 dataset, which can be accessed at: https://github.com/layumi/University1652-Baseline.
Fig. 2
The proposed network’s architecture comprises a dual-stream feature extraction backbone, the Quadrant Insight Module (QIM), and the Integrated Global-Local Attention Module (IGLAM). QIM leverages fine-grained details to generate four-directional local representations of the feature. IGLAM integrates both global and local embeddings, enabling simultaneous attention to different perspectives within various feature spaces and incorporating additional key features into comprehensive final representations.
Fig. 3
Comparison of our Integrated Global-Local Attention Module (IGLAM) with existing interaction mechanisms. (a) Self-attention concatenates local and global features before passing them through a self-attention block. (b) Cross-attention fuses features via a cross-attention layer. (c) Co-attention applies a cross-attention layer followed by a self-attention block. (d) Our merged attention first concatenates global and local features, then processes them through a single cross-attention block, enabling effective cross-view interaction.
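The merged-attention variant in panel (d) can be sketched as follows: the global embedding and the local stripe embeddings are concatenated into one token sequence, which is then processed by a single scaled dot-product attention pass. This is a hedged illustration of the idea only; learned projection weights, multi-head structure, and the exact query/key/value routing of IGLAM are omitted, and the names `merged_attention` and `softmax` are our own.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def merged_attention(global_feat, local_feats):
    """Sketch of merged attention (Fig. 3d): concatenate the global token
    with the local tokens, then run one attention pass over the merged
    sequence so every token can attend to every other."""
    tokens = np.concatenate([global_feat[None, :], local_feats], axis=0)  # (1+N, D)
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)  # (1+N, 1+N) pairwise affinities
    return softmax(scores) @ tokens          # attended (1+N, D) features

g = np.random.rand(64)        # one global embedding
l = np.random.rand(4, 64)     # four local stripe/quadrant embeddings
out = merged_attention(g, l)  # (5, 64): global and local tokens fused
```

Compared with variants (a)-(c), a single pass over the merged sequence lets global and local tokens exchange information in one step instead of stacking separate cross- and self-attention blocks.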
Algorithm 1
Training procedure of GLQINet.
Fig. 4
Ablation study comparing different components.
Fig. 5
Ablation study comparing different values of t.
Fig. 6
Visualization of error cases for our method and the baseline, with blue boxes denoting correct matches and red boxes denoting false matches. The satellite images shown in this figure are derived from the University-1652 dataset, which can be accessed at: https://github.com/layumi/University1652-Baseline.
