Hybrid DAER Based Cross-Modal Retrieval Exploiting Deep Representation Learning

Zhao Huang et al. Entropy (Basel). 2023 Aug 16;25(8):1216. doi: 10.3390/e25081216.
Abstract

Information retrieval across multiple modalities has attracted considerable attention from academics and practitioners. One key challenge of cross-modal retrieval is bridging the heterogeneity gap between modalities. Most existing methods jointly construct a common subspace, but they pay little attention to the varying importance of fine-grained regions within each modality, which limits how effectively the extracted information from each modality is used. Therefore, this study proposes a novel text-image cross-modal retrieval approach that constructs a dual attention network and an enhanced relation network (DAER). More specifically, the dual attention network extracts fine-grained weight information from text and images, while the enhanced relation network widens the differences between data of different categories in order to improve the accuracy of similarity computation. Comprehensive experimental results on three widely used datasets (i.e., Wikipedia, Pascal Sentence, and XMediaNet) show that our proposed approach is effective and superior to existing cross-modal retrieval methods.
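To make the two components concrete, the sketch below illustrates, under our own assumptions, what a dual (channel + spatial) attention block over image features and a relation-network head that scores image-text pairs might look like. It is a minimal PyTorch illustration of the general pattern (CBAM-style attention plus a relation-network similarity MLP), not the authors' implementation; the names DualAttention and RelationHead, the 256-channel feature map, and the 512-dimensional embeddings are all hypothetical.

# Hypothetical sketch only; module names, layer sizes, and dimensions are assumptions.
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Channel attention followed by spatial attention over a feature map."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                                   # x: (B, C, H, W)
        # Channel attention: pooled descriptor -> per-channel weights
        w = self.channel_fc(x.mean(dim=(2, 3)))             # (B, C)
        x = x * w.unsqueeze(-1).unsqueeze(-1)
        # Spatial attention: pool over channels, predict a per-pixel weight map
        s = torch.cat([x.mean(1, keepdim=True),
                       x.max(1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.spatial_conv(s))      # (B, C, H, W)

class RelationHead(nn.Module):
    """Scores the similarity of an (image, text) embedding pair."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, img_emb, txt_emb):                    # both: (B, dim)
        return self.mlp(torch.cat([img_emb, txt_emb], dim=-1)).squeeze(-1)

# Toy usage with random tensors
attn = DualAttention(channels=256)
feat = attn(torch.randn(4, 256, 14, 14))                    # re-weighted image features
score = RelationHead(dim=512)(torch.randn(4, 512), torch.randn(4, 512))
print(feat.shape, score.shape)                              # (4, 256, 14, 14) and (4,)

In this sketch the relation head replaces a fixed metric (such as cosine similarity) with a learned scorer, which is the general motivation the abstract gives for the enhanced relation network: pushing apart pairs from different categories so that similarity estimates become more discriminative.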

Keywords: cross-modal retrieval; data augmentation; dual attention network; enhanced relation network.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. The framework of the DAER approach for cross-modal retrieval.
Figure 2. The structure of the bottleneck module with the dual spatial attention network.
Figure 3. Normal distribution with two different powers (left: power = 1; right: power = 0.6).
Figure 4. Example of retrieval tasks using the proposed DAER.
Figure 5. Comparison of DAER with the selected methods on three datasets.
Figure 6. Comparison of the mAP values of each method on two tasks across three datasets.
Figure 7. The improvement of our proposed approach on three datasets.
