2022 Jun 29;6(6):e34834.
doi: 10.2196/34834.

Pretrained Transformer Language Models Versus Pretrained Word Embeddings for the Detection of Accurate Health Information on Arabic Social Media: Comparative Study

Yahya Albalawi et al. JMIR Form Res.

Abstract

Background: In recent years, social media has become a major channel for health-related information in Saudi Arabia. Prior health informatics studies have suggested that a large proportion of health-related posts on social media are inaccurate. Given the subject matter and the scale of dissemination of such information, it is important to be able to automatically discriminate between accurate and inaccurate health-related posts in Arabic.

Objective: The first aim of this study is to generate a data set of generic health-related tweets in Arabic, labeled as either accurate or inaccurate health information. The second aim is to leverage this data set to train a state-of-the-art deep learning model for detecting the accuracy of health-related tweets in Arabic. In particular, this study aims to train and compare the performance of multiple deep learning models that use pretrained word embeddings and transformer language models.

Methods: We used 900 health-related tweets from a previously published data set extracted between July 15, 2019, and August 31, 2019. Furthermore, we applied a pretrained model to extract an additional 900 health-related tweets from a second data set collected specifically for this study between March 1, 2019, and April 15, 2019. The 1800 tweets were labeled by 2 physicians as accurate, inaccurate, or unsure. The physicians agreed on 43.3% (779/1800) of tweets, which were thus labeled as accurate or inaccurate. A total of 9 variations of the pretrained transformer language models were then trained and validated on 80% (623/779 tweets) of the data set and tested on the remaining 20% (156/779 tweets). For comparison, we also trained a bidirectional long short-term memory model with 7 different pretrained word embeddings as the input layer on the same data set. The models were compared in terms of their accuracy, precision, recall, F1 score, and macroaverage of the F1 score.
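The labeling and evaluation steps described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the label strings, and the toy data are assumptions made for illustration. It keeps only tweets on which both annotators chose the same definite label, then computes accuracy, per-class precision, recall, and F1, and the macroaverage of F1 (the unweighted mean of the per-class F1 scores).

```python
def keep_agreed(annotations):
    """Keep tweets that both annotators gave the same definite label."""
    definite = {"accurate", "inaccurate"}  # "unsure" is never kept
    return [
        (text, first)
        for text, first, second in annotations
        if first == second and first in definite
    ]


def class_scores(y_true, y_pred, label):
    """Return (precision, recall, F1), treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    return sum(class_scores(y_true, y_pred, lab)[2] for lab in labels) / len(labels)


# Hypothetical annotations: (tweet text, annotator 1 label, annotator 2 label).
annotations = [
    ("tweet 1", "accurate", "accurate"),
    ("tweet 2", "accurate", "inaccurate"),  # disagreement: dropped
    ("tweet 3", "unsure", "unsure"),        # agreement on "unsure": dropped
    ("tweet 4", "inaccurate", "inaccurate"),
]
agreed = keep_agreed(annotations)

# Hypothetical reference labels and model predictions for five test tweets.
labels = ["accurate", "inaccurate"]
y_true = ["accurate", "accurate", "accurate", "inaccurate", "inaccurate"]
y_pred = ["accurate", "accurate", "inaccurate", "inaccurate", "accurate"]

print(agreed)
print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred, labels))
```

Because the data set is imbalanced (62% accurate versus 38% inaccurate), the macroaverage is a useful complement to accuracy: it gives both classes equal weight regardless of how many tweets each contains.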

Results: We constructed a data set of labeled tweets, 38% (296/779) of which were labeled as inaccurate health information, and 62% (483/779) of which were labeled as accurate health information. We suggest that this labeling is reliable because we excluded every tweet on which the physician annotators were unsure or disagreed. Among the investigated deep learning models, the Transformer-based Model for Arabic Language Understanding version 0.2 (AraBERTv0.2)-large model was the most accurate, with an F1 score of 87%, followed by AraBERT version 2-large and AraBERTv0.2-base.

Conclusions: Our results indicate that the pretrained language model AraBERTv0.2 is the best model for classifying tweets as carrying either inaccurate or accurate health information. Future studies should consider applying ensemble learning to combine the best models as it may produce better results.
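The abstract does not say which ensemble method future work should use. As one simple possibility, the sketch below shows hard (majority) voting over the predictions of several already trained classifiers. The function name, the label strings, and the three prediction lists are hypothetical and are not results from the study.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine aligned per-model predictions by taking the most common label.

    `predictions_per_model` holds one list of labels per model, all in the
    same tweet order. With an odd number of models and two labels, a tie
    cannot occur.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from three models on the same three tweets.
model_a = ["accurate", "inaccurate", "accurate"]
model_b = ["accurate", "accurate", "inaccurate"]
model_c = ["inaccurate", "inaccurate", "inaccurate"]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

Hard voting only needs each model's predicted label. Averaging predicted class probabilities (soft voting) is another common option when the models expose calibrated scores.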

Keywords: BERT; bidirectional encoder representations from transformers; deep learning; health informatics; health information; infodemiology; language model; machine learning; misinformation; pretrained language models; social media; tweets.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. Overview of the process followed in labeling tweets as either accurate or inaccurate [31]. ML: machine learning.
Figure 2. Overview of the process used to train and select machine learning models. BLSTM: bidirectional long short-term memory.

References

    1. Ott BL. The age of Twitter: Donald J. Trump and the politics of debasement. Critical Stud Media Commun. 2016 Dec 23;34(1):59–68. doi: 10.1080/15295036.2016.1266686.
    2. El Tantawi M, Bakhurji E, Al-Ansari A, AlSubaie A, Al Subaie HA, AlAli A. Indicators of adolescents' preference to receive oral health information using social media. Acta Odontol Scand. 2019 Apr;77(3):213–8. doi: 10.1080/00016357.2018.1536803.
    3. Tang Y, Hew KF. Using Twitter for education: beneficial or simply a waste of time? Comput Educ. 2017 Mar;106:97–118. doi: 10.1016/j.compedu.2016.12.004.
    4. Justinia T, Alyami A, Al-Qahtani S, Bashanfar M, El-Khatib M, Yahya A, Zagzoog F. Social media and the orthopaedic surgeon: a mixed methods study. Acta Inform Med. 2019 Mar;27(1):23–8. doi: 10.5455/aim.2019.27.23-28. http://europepmc.org/abstract/MED/31213739
    5. Hamasha AA, Alghofaili N, Obaid A, Alhamdan M, Alotaibi A, Aleissa M, Alenazi M, Alshehri F, Geevarghese A. Social media utilization among dental practitioner in Riyadh, Saudi Arabia. Open Dentistry J. 2019 Feb 28;13(1):101–6. doi: 10.2174/1874210601913010101.