PeerJ Comput Sci. 2022 Apr 22:8:e896.
doi: 10.7717/peerj-cs.896. eCollection 2022.

Multi-label emotion classification of Urdu tweets

Noman Ashraf et al. PeerJ Comput Sci.

Abstract

Urdu is widely spoken in South Asia and around the world. Although comparable datasets exist for English, we created the first multi-label emotion dataset for Urdu, consisting of 6,043 tweets in the Urdu Nastalíq script annotated with six basic emotions. We adopted a multi-label (ML) classification approach to detect emotions in Urdu text. The morphological and syntactic structure of Urdu makes multi-label emotion detection challenging. In this paper, we build a set of baseline classifiers: machine learning algorithms (Random Forest (RF), Decision Tree (J48), Sequential Minimal Optimization (SMO), AdaBoostM1, and Bagging), deep learning algorithms (a one-dimensional Convolutional Neural Network (1D-CNN), Long Short-Term Memory (LSTM), and LSTM with CNN features), and a transformer-based baseline (BERT). We used a combination of text representations: stylometric features, pre-trained word embeddings, word-based n-grams, and character-based n-grams. The paper describes the annotation guidelines and dataset characteristics and offers insights into the methodologies used for Urdu emotion classification. We report our best results for all tested methods using micro-averaged F1, macro-averaged F1, accuracy, Hamming loss (HL), and exact match (EM).
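
The multi-label metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `f1` and `multilabel_scores` are hypothetical, the toy label matrices are invented purely for illustration, and the six columns stand in for six emotions without naming them. Accuracy is left out because its multi-label definition varies between papers and the abstract does not say which one is used.

```python
from typing import Dict, List

# One row per tweet, one 0/1 column per emotion (six columns here).
Labels = List[List[int]]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_scores(y_true: Labels, y_pred: Labels) -> Dict[str, float]:
    """Hamming loss, exact match, micro-F1 and macro-F1 for binary label matrices."""
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    wrong_cells = 0  # individual label decisions that disagree
    exact = 0        # tweets whose whole label set is predicted correctly
    for t_row, p_row in zip(y_true, y_pred):
        exact += int(t_row == p_row)
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            tp[j] += int(t == 1 and p == 1)
            fp[j] += int(t == 0 and p == 1)
            fn[j] += int(t == 1 and p == 0)
            wrong_cells += int(t != p)
    return {
        # Fraction of all (tweet, emotion) cells that are wrong.
        "hamming_loss": wrong_cells / (n_samples * n_labels),
        # Fraction of tweets with every label correct.
        "exact_match": exact / n_samples,
        # Micro: pool counts over all emotions, then compute one F1.
        "micro_f1": f1(sum(tp), sum(fp), sum(fn)),
        # Macro: compute F1 per emotion, then take the unweighted mean.
        "macro_f1": sum(f1(tp[j], fp[j], fn[j]) for j in range(n_labels)) / n_labels,
    }


# Toy data (invented): four tweets, six emotion columns.
y_true = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
y_pred = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 1],
]

scores = multilabel_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

On this toy data the micro-averaged F1 is higher than the macro-averaged F1 because two emotions are never predicted correctly: each of them contributes a per-label F1 of zero to the macro mean, while the pooled micro counts are dominated by the labels that are predicted well.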

Keywords: Deep learning; Emotion classification in Urdu; Emotion detection; Machine learning; Multi-label emotion detection; Natural language processing.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1. Multilabel emotion detection model for Urdu language.
Figure 2. Examples in our dataset (translated by Google).
Figure 3. 1D-CNN model architecture.
Figure 4. LSTM model architecture.
