Multi-label emotion classification of Urdu tweets

Noman Ashraf¹, Lal Khan², Sabur Butt¹, Hsien-Tsung Chang²,³,⁴, Grigori Sidorov¹, Alexander Gelbukh¹

Affiliations

¹ CIC, Instituto Politécnico Nacional, Mexico City, Mexico.
² Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan.
³ Artificial Intelligence Research Center, Chang Gung University, Taoyuan, Taiwan.
⁴ Department of Physical Medicine and Rehabilitation, Chang Gung Memorial Hospital, Taoyuan, Taiwan.

PMID: 35494831
PMCID: PMC9044368
DOI: 10.7717/peerj-cs.896

PeerJ Comput Sci. 2022 Apr 22:8:e896. doi: 10.7717/peerj-cs.896. eCollection 2022.

Abstract

Urdu is widely spoken in South Asia and around the world. Although multi-label emotion datasets exist for English, we created the first such dataset for Urdu: 6,043 tweets in the Urdu Nastalíq script annotated with six basic emotions. We adopted a multi-label (ML) classification approach to detect emotions in Urdu text, a task made challenging by the morphological and syntactic structure of the language. In this paper, we build a set of baseline classifiers: machine learning algorithms (Random Forest (RF), Decision Tree (J48), Sequential Minimal Optimization (SMO), AdaBoostM1, and Bagging), deep learning algorithms (1D Convolutional Neural Network (1D-CNN), Long Short-Term Memory (LSTM), and LSTM with CNN features), and a transformer-based baseline (BERT). We used a combination of text representations: stylometric features, pre-trained word embeddings, word n-grams, and character n-grams. The paper describes the annotation guidelines and dataset characteristics and offers insights into the different methodologies used for Urdu emotion classification. We report our best results for all tested methods using micro-averaged F1, macro-averaged F1, accuracy, Hamming loss (HL), and exact match (EM).
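To make the evaluation measures named in the abstract concrete, the following is a minimal sketch of how Hamming loss, exact match, and micro- and macro-averaged F1 can be computed for multi-label predictions. It is an illustration only and not the authors' evaluation code: the function names and the toy label matrices are hypothetical, and the six columns stand in for six emotion labels without naming them. Accuracy is left out because the abstract does not say which multi-label definition is used.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows = tweets, columns = binary emotion labels


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return wrong / total


def exact_match(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of tweets whose full label set is predicted exactly."""
    return sum(tr == pr for tr, pr in zip(y_true, y_pred)) / len(y_true)


def _f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: Matrix, y_pred: Matrix) -> Tuple[float, float]:
    """Micro-F1 pools counts over all labels; macro-F1 averages per-label F1."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    micro = _f1(sum(c[0] for c in counts),
                sum(c[1] for c in counts),
                sum(c[2] for c in counts))
    macro = sum(_f1(*c) for c in counts) / n_labels
    return micro, macro


# Hypothetical toy data: 3 tweets, 6 unnamed emotion labels.
y_true = [[1, 0, 0, 1, 0, 0],
          [0, 1, 0, 0, 0, 0],
          [0, 0, 1, 0, 1, 0]]
y_pred = [[1, 0, 0, 0, 0, 0],   # misses one true label
          [0, 1, 0, 0, 0, 0],   # exactly right
          [0, 0, 1, 0, 1, 1]]   # adds one spurious label

hl = hamming_loss(y_true, y_pred)
em = exact_match(y_true, y_pred)
micro_f1, macro_f1 = micro_macro_f1(y_true, y_pred)
print(f"HL={hl:.3f} EM={em:.3f} micro-F1={micro_f1:.3f} macro-F1={macro_f1:.3f}")
```

In this toy case two of the 18 label slots are wrong, so Hamming loss is low, yet only one of the three tweets is an exact match. Macro-F1 falls below micro-F1 because two labels have an F1 of zero and each label counts equally in the macro average.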

Keywords: Deep learning; Emotion classification in Urdu; Emotion detection; Machine learning; Multi-label emotion detection; Natural language processing.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Figure 1. Multilabel emotion detection model for Urdu language.**

**Figure 2. Examples in our dataset (translated by Google).**

**Figure 3. 1D-CNN model architecture.**
