2023 Dec 6;2(12):e0000409. doi: 10.1371/journal.pdig.0000409. eCollection 2023 Dec.

BERT based natural language processing for triage of adverse drug reaction reports shows close to human-level performance


Erik Bergman et al. PLOS Digit Health.

Abstract

Post-marketing reports of suspected adverse drug reactions are important for establishing the safety profile of a medicinal product. However, a high influx of reports poses a challenge for regulatory authorities, as a delay in identifying previously unknown adverse drug reactions can potentially be harmful to patients. In this study, we use natural language processing (NLP) to predict whether a report is of a serious nature based solely on the free-text fields and adverse event terms in the report, potentially allowing reports mislabelled at the time of reporting to be detected and prioritized for assessment. We consider four different NLP models at various levels of complexity, bootstrap their train-validation data split to eliminate random effects in the performance estimates, and conduct prospective testing to avoid the risk of data leakage. Using a Swedish BERT-based language model, continued language pre-training, and final classification training, we achieve close to human-level performance on this task. Model architectures built on less complex technical foundations, such as bag-of-words approaches and LSTM neural networks trained with random initialization of weights, appear to perform less well, likely because they lack the robustness that a base of general language training provides.
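The methodology described above can be illustrated with a minimal sketch of the simplest model family the study compares against: a bag-of-words classifier evaluated over bootstrapped train-validation splits to characterize the spread of F1 scores. The report snippets, labels, and hyperparameters below are invented for illustration and are not the study's data or pipeline.

```python
# Toy sketch: bag-of-words seriousness classifier with bootstrapped
# train/validation splits (illustrative data; not the study's pipeline).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hypothetical free-text report snippets with seriousness labels (1 = serious).
reports = [
    "patient hospitalised after severe anaphylactic reaction",
    "mild headache resolved without intervention",
    "life-threatening arrhythmia requiring intensive care",
    "transient nausea after first dose",
    "fatal outcome reported following overdose",
    "slight rash on forearm, self-limiting",
    "admitted to hospital with liver failure",
    "temporary dizziness, no treatment needed",
] * 5  # repeat to give the tiny toy set enough rows to split
labels = [1, 0, 1, 0, 1, 0, 1, 0] * 5

f1_scores = []
for seed in range(20):  # resample the train/validation split, as in the study
    x_tr, x_va, y_tr, y_va = train_test_split(
        reports, labels, test_size=0.25, random_state=seed, stratify=labels
    )
    vec = CountVectorizer()
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(x_tr), y_tr)
    f1_scores.append(f1_score(y_va, clf.predict(vec.transform(x_va))))

mean_f1 = sum(f1_scores) / len(f1_scores)
print(f"bootstrapped F1: mean={mean_f1:.3f} over {len(f1_scores)} splits")
```

Collecting a distribution of F1 scores rather than a single point estimate is what lets the density plots in Fig 3 separate genuine architecture differences from split-to-split noise. The study's BERT models would replace the vectorizer and logistic regression with a pre-trained Swedish transformer plus a classification head, but the resampling loop is conceptually the same.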


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Current dataset quarterly distribution with information on seriousness. Data as available at time of database lock.

Fig 2. Overview of the data flow and model architectures investigated.

Fig 3. Density plots of F1 scores for all models (left) and F1 difference between models using BERT and AER-BERT (right).

Fig 4. Receiver operating characteristic (left) and precision-recall curve (right) for the four model architectures on the prospective test set.

Fig 5. Results from hold-out sample with two human assessors and the four models. We show predictions on the serious (left) and non-serious (right) reports separately for each class, where the leftmost column in the prediction heatmap always corresponds to the database annotation.

