Automatic Extraction of Comprehensive Drug Safety Information from Adverse Drug Event Narratives in the Korea Adverse Event Reporting System Using Natural Language Processing Techniques
- PMID: 37330415
- PMCID: PMC10344995
- DOI: 10.1007/s40264-023-01323-2
Abstract
Introduction: Regulatory agencies routinely use spontaneous reporting system (SRS) data to guide their pharmacovigilance programs, yet concerns have been raised over the quality of the drug safety information collected through SRS, particularly its completeness. We expected that collecting additional drug safety information from adverse drug event (ADE) narratives and incorporating it into the SRS database would improve data completeness.
Objective: The aims of this study were to define the extraction of comprehensive drug safety information from ADE narratives reported through the Korea Adverse Event Reporting System (KAERS) as natural language processing (NLP) tasks and to provide baseline models for the defined tasks.
Methods: This study used ADE narratives and structured drug safety information from individual case safety reports (ICSRs) reported through KAERS between 1 January 2015 and 31 December 2019. We developed an annotation guideline for the extraction of comprehensive drug safety information from ADE narratives based on the International Conference on Harmonisation (ICH) E2B(R3) guideline and manually annotated 3723 ADE narratives. We then developed a domain-specific Korean Bidirectional Encoder Representations from Transformers (KAERS-BERT) model using 1.2 million ADE narratives in KAERS and provided baseline models for the tasks we defined. In addition, we performed an ablation experiment to investigate whether named entity recognition (NER) models improved when the training dataset contained more diverse ADE narratives.
Results: We defined 21 types of word entities, six types of entity labels, and 49 types of relations to formulate the extraction of comprehensive drug safety information as NLP tasks. We obtained a total of 86,750 entities, 81,828 entity labels, and 45,107 relations from the manually annotated ADE narratives. The KAERS-BERT model achieved F1-scores of 83.81% and 76.62% on the NER and sentence extraction tasks, respectively, and outperformed the other baseline models on all the NLP tasks we defined except the sentence extraction task. Finally, using the NER model to extract drug safety information from ADE narratives resulted in an average increase of 3.24% in data completeness for KAERS structured data fields.
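As an illustration of the F1-scores reported above, the following minimal Python sketch computes entity-level precision, recall, and F1 under an exact-match criterion. The abstract does not state which matching criterion the study used, so exact span-and-type matching is an assumption here, and the entity type names and offsets are hypothetical.

```python
from typing import Set, Tuple

# An entity is (start offset, end offset, entity type).
# The type names used below are hypothetical, not the study's 21 entity types.
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) with exact span-and-type matching (assumed criterion)."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy annotations for one narrative: two of three predictions match the gold standard.
gold = {(0, 5, "Drug"), (10, 18, "AdverseEvent"), (25, 30, "Dose")}
pred = {(0, 5, "Drug"), (10, 18, "AdverseEvent"), (40, 44, "Dose")}

precision, recall, f1 = entity_f1(gold, pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

F1 is the harmonic mean of precision and recall, so a model scores highly only when it both finds most annotated entities and avoids spurious ones.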
Conclusions: We formulated the extraction of comprehensive drug safety information from ADE narratives as NLP tasks and developed the annotated corpus and strong baseline models for the tasks. The annotated corpus and models for extracting comprehensive drug safety information can improve the data quality of an SRS database.
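One way to picture the data completeness improvement is sketched below in Python. It assumes completeness is the share of non-empty structured fields across reports and that extracted values only fill fields that were empty. The abstract does not give the study's exact definition, and the field names, values, and helper functions are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def completeness(records: List[Record], fields: List[str]) -> float:
    """Share of (record, field) cells that hold a non-empty value (assumed definition)."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(1 for rec in records for field in fields if rec.get(field))
    return filled / total


def fill_missing(structured: List[Record], extracted: List[Record],
                 fields: List[str]) -> List[Record]:
    """Copy extracted values into empty structured fields; never overwrite existing values."""
    merged = []
    for rec, ext in zip(structured, extracted):
        new_rec = dict(rec)
        for field in fields:
            if not new_rec.get(field) and ext.get(field):
                new_rec[field] = ext[field]
        merged.append(new_rec)
    return merged


# Hypothetical field names and toy values, not taken from KAERS.
fields = ["drug_name", "dose", "indication"]
structured = [
    {"drug_name": "drug A", "dose": None, "indication": None},
    {"drug_name": "drug B", "dose": "10 mg", "indication": None},
]
extracted = [{"dose": "5 mg"}, {"indication": "hypertension"}]

before = completeness(structured, fields)
after = completeness(fill_missing(structured, extracted, fields), fields)
print(f"completeness before={before:.3f} after={after:.3f}")
```

Leaving existing structured values untouched keeps reporter-supplied data authoritative and uses narrative-derived values only to fill gaps.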
© 2023. The Author(s).
Conflict of interest statement
The authors declare no competing interests.