Hybrid DAER Based Cross-Modal Retrieval Exploiting Deep Representation Learning

Zhao Huang et al. Entropy (Basel). 2023 Aug 16;25(8):1216. doi: 10.3390/e25081216.
Abstract

Information retrieval across multiple modalities has attracted considerable attention from academics and practitioners. One key challenge of cross-modal retrieval is bridging the heterogeneity gap between modalities. Most existing methods jointly construct a common subspace, but they pay little attention to the varying importance of fine-grained regions within each modality, which limits how effectively the extracted information from each modality is used. Therefore, this study proposes a novel text-image cross-modal retrieval approach that constructs a dual attention network and an enhanced relation network (DAER). More specifically, the dual attention network extracts fine-grained weight information from text and images, while the enhanced relation network widens the differences between data of different categories in order to improve the accuracy of similarity computation. Comprehensive experimental results on three widely used datasets (i.e., Wikipedia, Pascal Sentence, and XMediaNet) show that our proposed approach is effective and superior to existing cross-modal retrieval methods.
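To make the two components concrete, the sketch below illustrates, under our own assumptions, what a dual (channel + spatial) attention block over image features and a relation-network head that scores image-text pairs might look like. It is a minimal PyTorch illustration of the general pattern (CBAM-style attention plus a relation-network similarity MLP), not the authors' implementation; the names DualAttention and RelationHead, the 256-channel feature map, and the 512-dimensional embeddings are all hypothetical.

# Hypothetical sketch only; module names, layer sizes, and dimensions are assumptions.
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Channel attention followed by spatial attention over a feature map."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                                   # x: (B, C, H, W)
        # Channel attention: pooled descriptor -> per-channel weights
        w = self.channel_fc(x.mean(dim=(2, 3)))             # (B, C)
        x = x * w.unsqueeze(-1).unsqueeze(-1)
        # Spatial attention: pool over channels, predict a per-pixel weight map
        s = torch.cat([x.mean(1, keepdim=True),
                       x.max(1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.spatial_conv(s))      # (B, C, H, W)

class RelationHead(nn.Module):
    """Scores the similarity of an (image, text) embedding pair."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, img_emb, txt_emb):                    # both: (B, dim)
        return self.mlp(torch.cat([img_emb, txt_emb], dim=-1)).squeeze(-1)

# Toy usage with random tensors
attn = DualAttention(channels=256)
feat = attn(torch.randn(4, 256, 14, 14))                    # re-weighted image features
score = RelationHead(dim=512)(torch.randn(4, 512), torch.randn(4, 512))
print(feat.shape, score.shape)                              # (4, 256, 14, 14) and (4,)

In this sketch the relation head replaces a fixed metric (such as cosine similarity) with a learned scorer, which is the general motivation the abstract gives for the enhanced relation network: pushing apart pairs from different categories so that similarity estimates become more discriminative.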

Keywords: cross-modal retrieval; data augmentation; dual attention network; enhanced relation network.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. The framework of the DAER approach for cross-modal retrieval.
Figure 2. The structure of the bottleneck module with the dual spatial attention network.
Figure 3. Normal distribution with two different powers (left: power = 1; right: power = 0.6).
Figure 4. Example of retrieval tasks using the proposed DAER.
Figure 5. Comparison of DAER with the selected methods on three datasets.
Figure 6. Comparison of the mAP values of each method on two tasks across three datasets.
Figure 7. The improvement of our proposed approach on three datasets.
