Prompt-based fine-tuning with multilingual transformers for language-independent sentiment analysis
- PMID: 40595680
- PMCID: PMC12215489
- DOI: 10.1038/s41598-025-03559-7
Abstract
In the era of global digital communication, understanding user sentiment across multiple languages is a critical challenge with wide-ranging applications in opinion mining, customer feedback analysis, and social media monitoring. This study advances the field of language-independent sentiment analysis by leveraging prompt-based fine-tuning with state-of-the-art transformer models. The performance of classical machine learning approaches, hybrid deep learning architectures, and multilingual transformer models is evaluated across eight typologically diverse languages: Arabic, English, French, German, Hindi, Italian, Portuguese, and Spanish. Baseline models are established using traditional machine learning approaches such as Support Vector Machines (SVM) and Logistic Regression, with feature extraction methods like TF-IDF. A hybrid deep learning model is introduced, combining Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNNs) to capture local and sequential text patterns. Building on these, pre-trained multilingual transformer models, specifically BERT-base-multilingual and XLM-RoBERTa, are fine-tuned for language-independent sentiment classification tasks. The key contribution lies in the implementation of prompt-based fine-tuning strategies for language-independent sentiment analysis. Using (1) prefix prompts and (2) cloze-style prompts, a unified framework is established that employs templates designed in one language and evaluates their performance on data from the remaining languages. Experimental results demonstrate that transformer models, particularly XLM-RoBERTa with prompt-based fine-tuning, outperform both classical and deep learning methods. With only 32 training examples per class, prefix prompts produce results comparable to standard fine-tuning, which typically uses 70–80% of the data for training. This highlights the potential of prompt-based learning for scalable, multilingual sentiment analysis in diverse language settings.
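The two prompting strategies named in the abstract can be sketched as template functions: a prefix prompt prepends a task instruction to the input, while a cloze prompt appends a fill-in-the-blank pattern whose masked word a masked language model (such as XLM-RoBERTa) predicts, with a verbalizer mapping candidate words back to sentiment labels. The template wordings and the verbalizer mapping below are illustrative assumptions, not the paper's exact templates.

```python
# Illustrative sketch of prefix and cloze-style prompt templates for
# sentiment classification. Template text and label words are assumptions.

MASK = "<mask>"  # XLM-RoBERTa's mask token


def prefix_prompt(text: str) -> str:
    """Prefix prompt: prepend a task description before the input text."""
    return f"Classify the sentiment of this text: {text}"


def cloze_prompt(text: str) -> str:
    """Cloze prompt: append a pattern with a masked slot for the model to fill."""
    return f"{text} Overall, it was {MASK}."


# Verbalizer: map words the masked LM may predict at the slot to labels.
# The word choices here are hypothetical examples.
VERBALIZER = {"great": "positive", "terrible": "negative"}


def label_from_prediction(predicted_word: str) -> str:
    """Resolve the model's predicted fill word to a sentiment label."""
    return VERBALIZER.get(predicted_word.lower(), "neutral")
```

In the cross-lingual setup the abstract describes, a template written in one language would be paired with inputs from the other seven languages, relying on the multilingual model's shared representations.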
© 2025. The Author(s).
Conflict of interest statement
Competing interests: The authors declare no competing interests.
