Data Brief. 2025 Mar 13:60:111460.
doi: 10.1016/j.dib.2025.111460. eCollection 2025 Jun.

AsanteTwiSenti: A Sentiment dataset of Ghanaian Asante Twi Tweets in a multilingual context

Mavis Sarah Gyimah et al. Data Brief.

Abstract

Ghanaian Asante Twi is the most widely spoken indigenous language in Ghana. It is a language of scholarship, rich in African studies, and is taught in many universities across the globe. Despite its popularity, it lacks data resources for Sentiment Analysis, Named Entity Recognition, and Part of Speech (POS) tagging, and in particular lacks linguistic corpora. This paper introduces AsanteTwiSenti, a comprehensive sentiment corpus for the Ghanaian Asante Twi language, and describes the methods used and the challenges encountered in constructing the corpus. The AsanteTwiSenti corpus contains 10,095 tweets extracted from 30,507 tweets scraped through the Twitter API. Based on standard guidelines and data preprocessing, 8,438 tweets are labeled as Positive, Negative, Neutral, Ghanaian-Pidgin, Multilingual, and Monolingual. The AsanteTwiSenti corpus seeks to bridge the low-resource gap for the Twi language, inspire the development of local Ghanaian language resources, and support academic research on Asante Twi in Natural Language Processing (NLP), language preservation, and education.

Keywords: Asante; Ghanaian Pidgin; Ghanaian languages; Low-resource languages; Multilingual; Natural Language Processing; Sentiment analysis; Twi.

Figures

Fig. 1
Data flow diagram for Twi Sentiment dataset construction.
Fig. 2
Sample annotated Tweets. This figure shows examples of annotated tweets from the Twi Sentiment dataset. Each tweet is labeled with a sentiment class (e.g., positive, negative, neutral) to demonstrate the dataset's structure and annotation approach.
Fig. 3
The distribution of tweets into different classes. This figure depicts the distribution of tweets across the three sentiment classes: positive, negative, and neutral. The chart highlights the class imbalance observed in the dataset and provides insights into the dataset composition.
Fig. 4
Twi Tweets word cloud. This figure presents a word cloud representation of the most frequent words in the Twi tweets dataset. The size of each word corresponds to its frequency, offering an overview of the linguistic features in the dataset.
Fig. 5
Multilingual tweets word cloud. This figure presents a word cloud representation of the most frequent words in the multilingual tweets dataset.
Fig. 6
Ghanaian Pidgin word cloud. This figure presents a word cloud representation of the most frequent words in the Ghanaian Pidgin tweets dataset.
Fig. 7
Twi Sentiment Lexicon visualization. This figure includes examples of sentiment-labeled words and highlights their distribution across the positive, negative, and neutral sentiment categories.
