BERT-based language model for accurate drug adverse event extraction from social media: implementation, evaluation, and contributions to pharmacovigilance practices
- PMID: 38716250
- PMCID: PMC11074401
- DOI: 10.3389/fpubh.2024.1392180
Abstract
Introduction: Social media platforms serve as a valuable resource for users to share health-related information, aiding in the monitoring of adverse events linked to medications and treatments in drug safety surveillance. However, extracting drug-related adverse events accurately and efficiently from social media poses challenges in both natural language processing research and the pharmacovigilance domain.
Method: Recognizing the lack of detailed implementation and evaluation of Bidirectional Encoder Representations from Transformers (BERT)-based models for drug adverse event extraction on social media, we developed a BERT-based language model tailored to identifying drug adverse events in this context. The model was built using publicly available, labeled adverse event data from ADE-Corpus-V2. Constructing it involved optimizing key hyperparameters, such as the number of training epochs, batch size, and learning rate. Across ten hold-out evaluations on ADE-Corpus-V2 data and external validation on social media datasets, the model consistently achieved high accuracy in drug adverse event detection.
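The abstract reports tuning the number of epochs, batch size, and learning rate and evaluating over ten hold-out splits, but it does not give the split ratio, the candidate values, or how tuning data were kept apart from test data. Below is a minimal Python sketch of such a protocol under stated assumptions: the 80/20 split, the seed, the values in `HYPOTHETICAL_GRID`, and the caller-supplied `train_and_score` function (a stand-in for fine-tuning BERT and scoring it) are all illustrative, not the authors' settings.

```python
"""Sketch of repeated hold-out evaluation over a hyperparameter grid.

Assumptions (not from the paper): an 80/20 split, the candidate values in
HYPOTHETICAL_GRID, and a caller-supplied train_and_score function.
"""
import itertools
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical candidate values; the abstract names these hyperparameters
# but not the values that were searched.
HYPOTHETICAL_GRID = {
    "epochs": [2, 3, 4],
    "batch_size": [16, 32],
    "learning_rate": [2e-5, 3e-5, 5e-5],
}

Split = Tuple[List[int], List[int]]


def hyperparameter_configs(grid: Dict[str, Sequence]) -> List[Dict]:
    """Expand a dict of candidate lists into every combination of values."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in itertools.product(*(grid[k] for k in keys))]


def holdout_splits(n_items: int, n_repeats: int = 10,
                   test_fraction: float = 0.2, seed: int = 0) -> List[Split]:
    """Return n_repeats independent random (train, test) index splits."""
    rng = random.Random(seed)
    n_test = max(1, int(round(n_items * test_fraction)))
    splits = []
    for _ in range(n_repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)
        # First n_test shuffled indices are held out; the rest are training.
        splits.append((sorted(indices[n_test:]), sorted(indices[:n_test])))
    return splits


def evaluate(config: Dict, splits: List[Split],
             train_and_score: Callable[[Dict, List[int], List[int]], float]) -> float:
    """Mean held-out score of one configuration across all splits."""
    return mean(train_and_score(config, train, test) for train, test in splits)
```

For example, `max(hyperparameter_configs(HYPOTHETICAL_GRID), key=lambda c: evaluate(c, holdout_splits(n), train_and_score))` selects the configuration with the best mean held-out score. If the same splits are used both to choose hyperparameters and to report results, the reported scores will be optimistic; reserving a separate validation portion inside each training split avoids this.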
Result: The hold-out evaluations yielded average F1 scores of 0.8575, 0.9049, and 0.9813 for detecting words of adverse events, words in adverse events, and words not in adverse events, respectively. External validation using human-labeled adverse event tweets from SMM4H further substantiated the effectiveness of our model, yielding F1 scores of 0.8127, 0.8068, and 0.9790 for the same three word categories.
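The reported figures are per-category, word-level F1 scores, averaged over the ten hold-out runs. A minimal sketch of that computation follows. It assumes the three word categories are encoded as the hypothetical labels `B-AE`, `I-AE`, and `O` (a BIO-style scheme) and scores each label one-vs-rest at the word level; the abstract does not state the exact label names or scoring code, so treat this as an illustration rather than the authors' evaluation script.

```python
"""Sketch of per-class, word-level F1 for a three-label tagging scheme.

Assumption: the three word categories are encoded with the hypothetical
labels B-AE (first word of an adverse event), I-AE (word inside an adverse
event), and O (word outside any adverse event).
"""
from statistics import mean
from typing import Dict, List, Sequence

LABELS = ("B-AE", "I-AE", "O")


def per_class_f1(gold: Sequence[str], pred: Sequence[str],
                 labels: Sequence[str] = LABELS) -> Dict[str, float]:
    """Word-level F1 for each label, treating each label one-vs-rest."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = {}
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Harmonic mean of precision and recall; 0.0 when both are zero.
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return scores


def average_over_runs(runs: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean F1 per label across repeated hold-out runs."""
    return {label: mean(run[label] for run in runs) for label in runs[0]}
```

Word-level scoring of this kind differs from entity-level (exact span match) F1, which counts a multi-word adverse event as correct only if every word in it is tagged correctly, so the two can give noticeably different numbers for the same predictions.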
Discussion: This study not only shows that BERT-based language models can accurately identify drug-related adverse events in the dynamic landscape of social media data, but also addresses the need for a comprehensive study design and evaluation when implementing such models. In doing so, we contribute to the advancement of pharmacovigilance practices and methodologies in the context of emerging information sources such as social media.
Keywords: adverse event; drug; language model (LM); pharmacovigilance; social media.
Copyright © 2024 Dong, Guo, Liu, Patterson and Hong.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.
Similar articles
- Pharmacovigilance in the digital age: gaining insight from social media data. Exp Biol Med (Maywood). 2025 May 27;250:10555. doi: 10.3389/ebm.2025.10555. eCollection 2025. PMID: 40495881. Free PMC article. Review.
- Automatic Extraction of Comprehensive Drug Safety Information from Adverse Drug Event Narratives in the Korea Adverse Event Reporting System Using Natural Language Processing Techniques. Drug Saf. 2023 Aug;46(8):781-795. doi: 10.1007/s40264-023-01323-2. Epub 2023 Jun 17. PMID: 37330415. Free PMC article.
- Improving entity recognition using ensembles of deep learning and fine-tuned large language models: A case study on adverse event extraction from VAERS and social media. J Biomed Inform. 2025 Mar;163:104789. doi: 10.1016/j.jbi.2025.104789. Epub 2025 Feb 7. PMID: 39923968.
- Algorithmic Identification of Treatment-Emergent Adverse Events From Clinical Notes Using Large Language Models: A Pilot Study in Inflammatory Bowel Disease. Clin Pharmacol Ther. 2024 Jun;115(6):1391-1399. doi: 10.1002/cpt.3226. Epub 2024 Mar 8. PMID: 38459719. Free PMC article.
- Leveraging Natural Language Processing and Machine Learning Methods for Adverse Drug Event Detection in Electronic Health/Medical Records: A Scoping Review. Drug Saf. 2025 Apr;48(4):321-337. doi: 10.1007/s40264-024-01505-6. Epub 2025 Jan 9. PMID: 39786481. Free PMC article.
Cited by
- Developing electronic health records as a source of real-world data for veterinary pharmacoepidemiology. Front Vet Sci. 2025 Apr 1;12:1550468. doi: 10.3389/fvets.2025.1550468. eCollection 2025. PMID: 40235568. Free PMC article.
- Role of Artificial Intelligence and Personalized Medicine in Enhancing HIV Management and Treatment Outcomes. Life (Basel). 2025 May 6;15(5):745. doi: 10.3390/life15050745. PMID: 40430173. Free PMC article. Review.
- Developing predictive models for µ opioid receptor binding using machine learning and deep learning techniques. Exp Biol Med (Maywood). 2025 Mar 19;250:10359. doi: 10.3389/ebm.2025.10359. eCollection 2025. PMID: 40177220. Free PMC article.
- Pharmacovigilance in the digital age: gaining insight from social media data. Exp Biol Med (Maywood). 2025 May 27;250:10555. doi: 10.3389/ebm.2025.10555. eCollection 2025. PMID: 40495881. Free PMC article. Review.
- Assessment of the Efficiency of a ChatGPT-Based Tool, MyGenAssist, in an Industry Pharmacovigilance Department for Case Documentation: Cross-Over Study. J Med Internet Res. 2025 Mar 10;27:e65651. doi: 10.2196/65651. PMID: 40063946. Free PMC article.