JMIR Med Inform. 2021 Dec 24;9(12):e28632.
doi: 10.2196/28632.

Text Mining of Adverse Events in Clinical Trials: Deep Learning Approach

Daphne Chopard et al. JMIR Med Inform.

Abstract

Background: Pharmacovigilance and safety reporting, which involve processes for monitoring the use of medicines in clinical trials, play a critical role in identifying previously unrecognized adverse events and changes in the patterns of adverse events.

Objective: This study aims to demonstrate the feasibility of automatically coding the adverse events described in the narrative section of serious adverse event (SAE) report forms, so that these patterns can be analyzed statistically.

Methods: We used the Unified Medical Language System (UMLS) as the coding scheme. Because the UMLS integrates 217 source vocabularies, it enables coding against other relevant terminologies such as the International Classification of Diseases, 10th Revision; the Medical Dictionary for Regulatory Activities; and the Systematized Nomenclature of Medicine. We used MetaMap, a highly configurable dictionary lookup tool, to identify mentions of UMLS concepts. We then trained a binary classifier using Bidirectional Encoder Representations from Transformers (BERT), a transformer-based language model that captures contextual relationships, to distinguish mentions of UMLS concepts that represented adverse events from those that did not.
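The Methods describe a two-stage pipeline: a dictionary lookup proposes candidate concept mentions, and a classifier decides which candidates are adverse events. The sketch below illustrates only the data flow between those stages, under stated assumptions. `TOY_LEXICON`, `find_mentions` and `to_classifier_input` are hypothetical names; a three-term regex lexicon stands in for MetaMap (it performs none of MetaMap's variant generation or concept disambiguation); the concept identifiers are placeholders, not real UMLS CUIs; and the "[CLS] mention [SEP] context [SEP]" layout with a 40-character window is an assumed input format, not the authors' documented one.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon standing in for MetaMap's UMLS lookup.
# The identifiers are placeholders, NOT real UMLS CUIs.
TOY_LEXICON = {
    "fever": "CUI_A",
    "rash": "CUI_B",
    "ct scan": "CUI_C",
}


@dataclass
class Candidate:
    cui: str      # placeholder concept identifier
    mention: str  # surface text as it appears in the report
    start: int    # character offset where the mention begins
    end: int      # character offset just past the mention


def find_mentions(text, lexicon):
    """Return every lexicon term found in text (case-insensitive, whole words), in text order."""
    found = []
    for term, cui in lexicon.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append(Candidate(cui, m.group(0), m.start(), m.end()))
    return sorted(found, key=lambda c: c.start)


def to_classifier_input(text, cand, window=40):
    """Pair a mention with its surrounding context as a BERT-style sentence pair (assumed format)."""
    left = max(0, cand.start - window)
    right = min(len(text), cand.end + window)
    context = text[left:right]
    return f"[CLS] {cand.mention} [SEP] {context} [SEP]"


# Every candidate becomes one binary classification instance.
text = "Patient developed fever after dosing. CT scan showed no rash."
for cand in find_mentions(text, TOY_LEXICON):
    print(cand.cui, to_classifier_input(text, cand))
```

In this toy sentence, "rash" is negated ("no rash") and "CT scan" is a procedure, yet the lookup flags them exactly as it flags "fever". A dictionary match alone cannot separate these cases, which is why a context-sensitive classifier follows the lookup step.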

Results: Despite the class imbalance, the model achieved an F1 score of 0.8080, which is 10.15 percentage points lower than human-like performance but 17.45 percentage points higher than that of the baseline approach.
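The Results are reported as F1 scores. F1 is the harmonic mean of precision and recall on the positive (adverse event) class, so unlike accuracy it is not inflated by the majority class. The sketch below uses a hypothetical helper and invented toy counts to show that effect, then recovers the reference scores implied by the stated differences, assuming "percentage points" means absolute differences on the F1 scale multiplied by 100.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive class from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented toy data: 10 adverse-event mentions, 90 other mentions.
# A trivial classifier that always predicts "not an adverse event":
tp, fp, fn, tn = 0, 0, 10, 90
accuracy = (tp + tn) / (tp + fp + fn + tn)  # 0.9: looks good
trivial_f1 = f1_from_counts(tp, fp, fn)     # 0.0: exposes the failure

# Reference scores implied by the reported differences
# (assumes absolute differences on the F1 scale).
model_f1 = 0.8080
human_like_f1 = model_f1 + 10.15 / 100  # about 0.9095
baseline_f1 = model_f1 - 17.45 / 100    # about 0.6335
```

Under that reading, the human-like and baseline F1 scores would be roughly 0.9095 and 0.6335.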

Conclusions: These results confirm that automated coding of adverse events described in the narrative section of serious adverse event reports is feasible. Once coded, adverse events can be analyzed statistically so that any correlations with the trialed medicines can be estimated in a timely fashion.

Keywords: classification; deep learning; machine learning; natural language processing.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. A serious adverse event (SAE) reporting form. CTCAE: Common Terminology Criteria for Adverse Events; N/A: Not Applicable.
Figure 2. A serious adverse event report annotated independently by 2 annotators. The annotations are highlighted in yellow.
Figure 3. Metathesaurus browser search results.
Figure 4. Coding of documents against the Unified Medical Language System.
Figure 5. Adverse event identification as a binary classification task. CT: computed tomography.
Figure 6. Identification of potential adverse event mentions. CUI: concept unique identifier.
Figure 7. Observing the patterns of positive and negative modifiers. CRTI: common respiratory tract infection; GI: gastrointestinal; OGD: oesophagogastroduodenoscopy; PR: per rectum; SAE: serious adverse event.
Figure 8. Observing more complex patterns of positive and negative use. Hb: hemoglobin.
Figure 9. Architecture based on Bidirectional Encoder Representations from Transformers (BERT) for classification of adverse events. CLS: classification token; SEP: sequence delimiter token.
Figure 10. Distribution of prediction probabilities for all folds in a cross-validation experiment.
Figure 11. Receiver operating characteristic curve for each fold in a cross-validation experiment.
Figure 12. Precision-recall curve for each fold in a cross-validation experiment.
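Figures 10 to 12 report per-fold prediction probabilities, receiver operating characteristic (ROC) curves and precision-recall curves from cross-validation. The number of folds, the splitting scheme and the tooling are not stated here, so the stdlib-only sketch below is an illustration under assumptions: stratified splitting (a common choice under class imbalance), k=5, and the hypothetical helpers `stratified_folds`, `roc_auc` and `pr_curve`. ROC AUC is computed through its rank interpretation, the probability that a randomly chosen positive scores higher than a randomly chosen negative.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign example indices to k folds so each fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across folds
    return folds


def roc_auc(labels, scores):
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def pr_curve(labels, scores):
    """(threshold, precision, recall) per distinct score, highest threshold first."""
    total_pos = sum(labels)
    points = []
    for t in sorted(set(scores), reverse=True):
        # At least one score is >= t, so tp + fp is never zero.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        recall = tp / total_pos if total_pos else 0.0
        points.append((t, tp / (tp + fp), recall))
    return points


# Invented labels and scores for one held-out fold.
fold_labels = [1, 0, 1, 0, 0]
fold_scores = [0.92, 0.40, 0.65, 0.70, 0.10]
print(roc_auc(fold_labels, fold_scores))
for point in pr_curve(fold_labels, fold_scores):
    print(point)
```

When positives are rare, precision-recall curves are often more informative than ROC curves, because a large pool of easy negatives can keep ROC AUC high while precision on the positive class stays poor.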
