Front Public Health. 2024 Apr 23:12:1392180.
doi: 10.3389/fpubh.2024.1392180. eCollection 2024.

BERT-based language model for accurate drug adverse event extraction from social media: implementation, evaluation, and contributions to pharmacovigilance practices

Fan Dong et al. Front Public Health.

Abstract

Introduction: Social media platforms serve as a valuable resource for users to share health-related information, aiding in the monitoring of adverse events linked to medications and treatments in drug safety surveillance. However, extracting drug-related adverse events accurately and efficiently from social media poses challenges in both natural language processing research and the pharmacovigilance domain.

Method: Recognizing the lack of detailed implementation and evaluation of Bidirectional Encoder Representations from Transformers (BERT)-based models for drug adverse event extraction on social media, we developed a BERT-based language model tailored to identifying drug adverse events in this context. Our model utilized publicly available labeled adverse event data from the ADE-Corpus-V2. Constructing the BERT-based model involved optimizing key hyperparameters, such as the number of training epochs, batch size, and learning rate. Through ten hold-out evaluations on ADE-Corpus-V2 data and external social media datasets, our model consistently demonstrated high accuracy in drug adverse event detection.
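The repeated hold-out protocol described in the Method can be sketched as follows. This is a minimal illustration, not the authors' code: the 20% test fraction, the use of the repeat index as a random seed, and the helper names `holdout_split`, `repeated_holdout`, and `dummy_train_and_score` are all assumptions, and the scoring function is a placeholder standing in for fine-tuning and evaluating a BERT-based model.

```python
import random

def holdout_split(items, test_fraction, seed):
    """Shuffle a copy of items with a fixed seed and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def repeated_holdout(items, train_and_score, n_repeats=10, test_fraction=0.2):
    """Run n_repeats independent hold-out evaluations and collect one score per run.

    test_fraction=0.2 is an assumed value; the abstract does not state the split ratio.
    """
    scores = []
    for seed in range(n_repeats):
        train, test = holdout_split(items, test_fraction, seed)
        scores.append(train_and_score(train, test))
    return scores

def dummy_train_and_score(train, test):
    """Hypothetical placeholder: a real version would fine-tune on train and score on test."""
    return len(test) / (len(train) + len(test))

# Usage: ten hold-out runs over 50 toy sentences, then the average score.
sentences = [f"sentence-{i}" for i in range(50)]
scores = repeated_holdout(sentences, dummy_train_and_score)
average_score = sum(scores) / len(scores)
```

Averaging over several random splits, rather than reporting a single split, reduces the chance that a reported score reflects one unusually easy or hard test set.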

Result: The ten hold-out evaluations yielded average F1 scores of 0.8575, 0.9049, and 0.9813 for detecting words of adverse events, words in adverse events, and words not in adverse events, respectively. External validation on human-labeled adverse event tweets from SMM4H further supported the effectiveness of our model, yielding F1 scores of 0.8127, 0.8068, and 0.9790 for the same three categories, respectively.
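Per-class F1 scores like those reported above can be computed from word-level labels as in the sketch below. It assumes the three categories map onto a BIO-style tagging scheme; the tag names `B-AE`, `I-AE`, and `O`, the helper names, and the toy label sequences are illustrative and are not taken from the study.

```python
from collections import Counter

# Assumed tag names for the three word categories; not confirmed by the abstract.
LABELS = ["B-AE", "I-AE", "O"]

def per_class_metrics(gold, pred, labels=LABELS):
    """Return precision, recall, and F1 for each label from aligned word-level tags."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was not the gold label
            fn[g] += 1  # gold label g was missed
    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

def macro_f1(metrics):
    """Unweighted mean of the per-class F1 scores."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)

# Usage on a toy eight-word example (made-up labels, not study data).
gold = ["O", "B-AE", "I-AE", "O", "O", "B-AE", "O", "O"]
pred = ["O", "B-AE", "O", "O", "O", "B-AE", "I-AE", "O"]
metrics = per_class_metrics(gold, pred)
overall = macro_f1(metrics)
```

Reporting each class separately matters here because words outside adverse events typically dominate the data, so a single accuracy figure would mostly reflect the majority class.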

Discussion: This study not only showcases the effectiveness of BERT-based language models in accurately identifying drug-related adverse events in the dynamic landscape of social media data, but also addresses the need for the implementation of a comprehensive study design and evaluation. By doing so, we contribute to the advancement of pharmacovigilance practices and methodologies in the context of emerging information sources like social media.

Keywords: adverse event; drug; language model (LM); pharmacovigilance; social media.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.

Figures

Figure 1. Study design for BERT-based adverse events extraction from social media data.
Figure 2. Distribution of text lengths for ADE-Corpus-V2 dataset.
Figure 3. Overview of human-annotated adverse event tweets from the SMM4H public dataset.
Figure 4. Workflow of the BERT-based adverse event extraction model.
Figure 5. Training loss vs. epochs for fine-tuning the BERT-based adverse event extraction model hyperparameter epochs.
Figure 6. Macro F1, precision, recall score vs. epochs for fine-tuning the BERT-based adverse event extraction model hyperparameter epochs.
Figure 7. Macro F1, precision, recall score vs. batch size for fine-tuning the BERT-based adverse event extraction model hyperparameter batch size.
Figure 8. Macro average F1-score vs. batch sizes for fine-tuning the BERT-based adverse event extraction model hyperparameter batch size.
Figure 9. Macro F1, precision, recall score vs. learning rate for the BERT-based adverse event extraction model hyperparameter learning rate.
Figure 10. Macro average F1-score vs. learning rate for fine-tuning the BERT-based adverse event extraction model hyperparameter learning rate.
Figure 11. Performance metrics of the BERT-based adverse event extraction model from 10 times holdout internal evaluations on ADE_Corpus_V2 dataset.
Figure 12. Performance metrics of the BERT-based adverse event extraction model from external evaluation on SMM4H dataset.
Figure 13. Confusion matrix of the external evaluation on SMM4H dataset for the BERT-based adverse event extraction model.
