J Am Med Inform Assoc. 2015 Nov;22(6):1196-204.
doi: 10.1093/jamia/ocv102. Epub 2015 Jul 31.

A method for systematic discovery of adverse drug events from clinical notes

Guan Wang et al. J Am Med Inform Assoc. 2015 Nov.

Abstract

Objective: Adverse drug events (ADEs) are undesired harmful effects resulting from use of a medication, and occur in 30% of hospitalized patients. The authors have developed a data-mining method for systematic, automated detection of ADEs from electronic medical records.

Materials and methods: This method uses the text from 9.5 million clinical notes, along with prior knowledge of drug usages and known ADEs, as inputs. These inputs are further processed into statistics used by a discriminative classifier which outputs the probability that a given drug-disorder pair represents a valid ADE association. Putative ADEs identified by the classifier are further filtered for positive support in 2 independent, complementary data sources. The authors evaluate this method by assessing support for the predictions in other curated data sources, including a manually curated, time-indexed reference standard of label change events.
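
The filtering step described above can be sketched as a set intersection. This is a minimal illustration, not the authors' code: the function name `filter_putative_ades`, the 0.9 probability threshold, the requirement that a pair appear in both support sources, and all drug and disorder names are assumptions made for the example.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (drug, disorder)


def filter_putative_ades(
    probabilities: Dict[Pair, float],
    support_a: Set[Pair],
    support_b: Set[Pair],
    threshold: float = 0.9,  # hypothetical cut-off; the abstract does not state one
) -> Set[Pair]:
    """Keep pairs scored at or above the threshold and supported in both sources."""
    confident = {pair for pair, p in probabilities.items() if p >= threshold}
    # Assumption: "positive support" means the pair appears in both sources.
    return confident & support_a & support_b


# Toy inputs, invented purely for illustration.
scores = {
    ("drug_a", "disorder_x"): 0.97,
    ("drug_b", "disorder_y"): 0.95,
    ("drug_c", "disorder_z"): 0.40,
}
source_1 = {("drug_a", "disorder_x"), ("drug_c", "disorder_z")}
source_2 = {("drug_a", "disorder_x"), ("drug_b", "disorder_y")}

kept = filter_putative_ades(scores, source_1, source_2)
print(kept)  # {('drug_a', 'disorder_x')}
```

Only the pair that clears the threshold and appears in both toy sources survives; a high score alone is not enough.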

Results: The classifier achieves an area under the curve of 0.94 on a held-out test set. Applied to 2,362,950 possible drug-disorder pairs, formed from 1602 unique drugs and 1475 unique disorders for which data were available, it yields 240 high-confidence, well-supported drug-AE associations. Eighty-seven of them (36%) are supported in at least one resource containing information that was not available to the classifier.
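
For readers unfamiliar with the metric, the area under the ROC curve is the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes it directly from that definition on made-up labels and scores (the function name `roc_auc` and the toy data are assumptions), and then cross-checks the counts reported above.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy held-out set, invented for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)  # 8 of 9 pairs ranked correctly

# Cross-check of the figures reported in the abstract.
n_pairs = 1602 * 1475             # drugs x disorders
pct_supported = 100 * 87 / 240    # share of the 240 associations with outside support
print(auc, n_pairs, round(pct_supported))
```

The pairwise loop is quadratic and is meant for clarity only; a rank-based formulation would be preferable on a real test set.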

Conclusion: This method demonstrates the feasibility of systematic post-marketing surveillance for ADEs using electronic medical records, a key component of the learning healthcare system.

Keywords: EMR mining; machine learning; pharmacovigilance; post market drug safety surveillance.

Figures

Figure 1:
Overview of methods and results. For each of the 2,362,950 possible drug–disorder pairs, we calculated 9 features from the free text of clinical notes in STRIDE, 8 features from known AEs in Medi-Span, and 12 features from known usages in Medi-Span and DrugBank. Based on these features, a Random Forest classifier was trained on the gold standard dataset to recognize drug–AE relationships. We then applied the trained classifier to the 2,362,950 possible drug–disorder pairs and filtered for support in FAERS and MEDLINE, yielding a set of 240 well-supported, high-confidence ADEs. Drug–AE pairs used in training are censored.
Figure 2:
Drug–drug and disorder–disorder similarity using known ADEs. We represent known drug–AEs as a matrix where the rows are drug names and columns are disorders, and the (i, j)-th entry is a binary indicator for whether or not the drug in the i-th row causes the disorder in the j-th column. In this way, each drug is represented as a binary vector. For a given query drug and adverse event (e.g., aspirin and hypersensitivity in panel a), we find other drugs that are known to be associated with hypersensitivity and calculate similarities between aspirin and those drugs. We summarize the similarities with 2 scalar values—the max and mean similarity.
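
The max and mean similarity features described in this caption can be sketched as follows. The caption does not name the similarity measure, so Jaccard similarity between binary vectors is an assumption here, as are the function names and the placeholder drug and disorder names. Each drug's binary row is stored as the set of disorders it is known to cause.

```python
from typing import Dict, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two binary vectors stored as sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_features(
    known: Dict[str, Set[str]], drug: str, disorder: str
) -> Optional[Tuple[float, float]]:
    """Return (max, mean) similarity between `drug` and drugs known to cause `disorder`."""
    query = known.get(drug, set())
    neighbors = [d for d, aes in known.items() if d != drug and disorder in aes]
    if not neighbors:
        return None  # no other drug is known to cause this disorder
    sims = [jaccard(query, known[d]) for d in neighbors]
    return max(sims), sum(sims) / len(sims)


# Toy known-ADE matrix, invented for illustration only.
known_ades = {
    "drug_a": {"rash", "nausea"},
    "drug_b": {"hypersensitivity", "rash", "nausea"},
    "drug_c": {"hypersensitivity", "headache"},
    "drug_d": {"nausea"},
}

features = similarity_features(known_ades, "drug_a", "hypersensitivity")
print(features)  # max is 2/3 (drug_b), mean is 1/3 (drug_b and drug_c)
```

The same construction applies to disorder–disorder similarity by transposing the matrix, so that each disorder is represented by the set of drugs known to cause it.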
Figure 3:
Training a classifier to recognize drug–ADE relationships. Positive examples were collected from known ADEs in Medi-Span, and negative examples were created by randomly sampling a drug and a disorder with roughly the same co-mention distribution as the positive examples. For each drug–disorder pair in the gold standard, we used 9 features to characterize the pattern of drug and disorder mentions in 9.5 million clinical notes from STRIDE, 8 features to characterize the domain knowledge of drug, disorder, and known ADEs from Medi-Span, and 12 features to characterize the domain knowledge of drug, disorder, and known usages from Medi-Span and DrugBank. The gold standard dataset was randomly split into 70% for training and 30% for testing the classifier.
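
The random 70/30 split mentioned in this caption can be sketched with the standard library alone. The function name `split_gold_standard`, the fixed seed, and the toy examples are assumptions; the caption does not say whether the split was stratified by label, so this sketch does not stratify.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_gold_standard(
    examples: Sequence[T], train_fraction: float = 0.7, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the examples and cut it into train and test portions."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Toy gold standard of (drug, disorder, label) triples, invented for illustration.
gold = [(f"drug_{i}", f"disorder_{i}", i % 2) for i in range(10)]

train, test = split_gold_standard(gold)
print(len(train), len(test))  # 7 3
```

The held-out 30% is what an evaluation metric such as the area under the curve would be computed on, so that no test example influences training.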
Figure 4:
Support from independent and complementary data sources. We validated the predicted drug–AE associations against three independent and complementary data sources. Of the 240 drug–ADE associations, 76 occurred in the set of ADEs with moderate support in Medi-Span up to 2012; 10 occurred among the recently established ADEs included in the additional Medi-Span data from 2012 to 2015; and 2 occurred in the reference standard provided by Harpaz et al. Overall, 87 of them (36%) were supported in at least one of the resources containing information that was not available to the classifier.
