Linguistic-visual based multimodal Yi character recognition
- PMID: 40195531
- PMCID: PMC11977249
- DOI: 10.1038/s41598-025-96397-6
Abstract
Yi character recognition is challenged by the considerable variability of the characters' morphological structures and by their complex semantic relationships, both of which reduce recognition accuracy. This paper presents a multimodal Yi character recognition method that incorporates both linguistic and visual features. In the visual modeling phase, a visual transformer integrated with deformable convolution captures key features and adapts to variations in Yi character images, improving recognition accuracy, particularly for images with deformations and complex backgrounds. In the linguistic modeling phase, a Pyramid Pooling Transformer incorporates semantic contextual information across multiple scales, enhancing the feature representation and capturing the detailed linguistic structure. Finally, a fusion strategy based on a cross-attention mechanism refines the relationships between feature regions and combines the features of the two modalities to achieve high-precision character recognition. Experimental results show that the proposed method achieves a recognition accuracy of 99.5%, surpassing the baseline methods by 3.4%, which validates its effectiveness.
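The abstract does not specify the exact form of the cross-attention fusion module. As a rough illustration only, the following plain-Python sketch shows generic cross-attention fusion, assuming that visual tokens act as queries, linguistic tokens act as keys and values, no learned projection matrices are used, and the attended context is added back to the visual tokens residually. The function names (`cross_attention`, `fuse`) and the toy feature values are hypothetical and are not taken from the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_weights(queries: Matrix, keys: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d)): one weight distribution per query token."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    return softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention where Q comes from one modality and K, V from another."""
    return matmul(attention_weights(queries, keys), values)


def fuse(visual: Matrix, linguistic: Matrix) -> Matrix:
    """Hypothetical fusion: each visual token gathers linguistic context,
    which is then added back to that visual token (residual connection)."""
    context = cross_attention(visual, linguistic, linguistic)
    return [[v + c for v, c in zip(vrow, crow)] for vrow, crow in zip(visual, context)]


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy visual tokens, d = 2
    linguistic = [[1.0, 0.0], [0.0, 1.0]]          # 2 toy linguistic tokens, d = 2
    fused = fuse(visual, linguistic)
    print(len(fused), len(fused[0]))  # 3 2
```

Using the visual tokens as queries keeps the fused output aligned with the image feature regions, which matches the abstract's description of refining relationships between feature regions; a real model would also learn separate query, key, and value projections and would typically use multiple heads.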
Keywords: Character recognition; Deep learning; Linguistic-visual model; Transformer.
© 2025. The Author(s).
Conflict of interest statement
Declarations. Competing interests: The authors declare no competing interests.