Clin Pharmacol Ther. 2013 Jun;93(6):547-55.
doi: 10.1038/clpt.2013.47. Epub 2013 Mar 4.

Pharmacovigilance using clinical notes

P LePendu et al. Clin Pharmacol Ther. 2013 Jun.

Abstract

With increasing adoption of electronic health records (EHRs), there is an opportunity to use the free-text portion of EHRs for pharmacovigilance. We present novel methods that annotate the unstructured clinical notes and transform them into a deidentified patient-feature matrix encoded using medical terminologies. We demonstrate the use of the resulting high-throughput data for detecting drug-adverse event associations and adverse events associated with drug-drug interactions. We show that these methods flag adverse events early (in most cases before an official alert), allow filtering of spurious signals by adjusting for potential confounding, and compile prevalence information. We argue that analyzing large volumes of free-text clinical notes enables drug safety surveillance using an as-yet untapped data source. Such data mining can be used for hypothesis generation and for rapid analysis of suspected adverse event risk.

Conflict of interest statement

The authors declared no conflict of interest.

Figures

Figure 1
Adjusted odds ratios (ORs) for positive cases in the single drug–adverse event set. Results show some variability by event. The 28 positive cases include the following events: myocardial infarction (mi), rhabdomyolysis (rhabd), cardiovascular fibrosis (cvf), acute renal failure (arf), QT prolongation (qt), urinary bladder cancer (ubc), progressive multifocal leukoencephalopathy (pml), aplastic anemia (aa), and venous thrombosis (vt). Some associations are off the scale, and we indicate the OR in parentheses above the line (one exception, natalizumab-pml (232), is not shown at all due to extreme scale: OR: 79.5; 95% CI: 30.8–270.4). We also include the number of exposed patients in parentheses for each drug–adverse event pair. Typically, a signal occurs when the lower bound of the confidence interval exceeds 1.0; however, the optimal threshold may differ from event to event.
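The signal rule described in this caption can be illustrated with a minimal sketch. It computes an unadjusted OR with a Wald 95% confidence interval from 2 × 2 cell counts, which is a simplification: the figure reports adjusted ORs, and the adjustment procedure is not reproduced here. The function names and the example counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed, event       b: exposed, no event
    c: unexposed, event     d: unexposed, no event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

def is_signal(a, b, c, d, threshold=1.0):
    """Flag the pair when the lower CI bound exceeds the threshold."""
    _, lower, _ = odds_ratio_ci(a, b, c, d)
    return lower > threshold
```

With the made-up counts `(20, 80, 10, 90)` the OR is 2.25 but the lower bound falls just under 1.0, so the pair is not flagged; raising `threshold` per event would make the rule stricter.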
Figure 2
Performance of adverse drug reaction and drug–drug interaction detection. Overall performance is measured using the area under the receiver operating characteristic curve (AUC). (a) The unadjusted (blue) and adjusted (red) methods yield overall AUCs of 75.3% and 80.4%, respectively. (b) For drug interactions, the adjusted methods (red) reach 81.5% AUC.
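For readers unfamiliar with the metric, the AUC can be computed directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below assumes each drug–event pair has a numeric score (for example an OR or its lower bound) and a reference label; both inputs are hypothetical and this is not the authors' evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: numeric score per drug-event pair (higher = stronger signal)
    labels: 1 for a known positive case, 0 for a negative control
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

The quadratic loop is fine for a few dozen reference pairs; a sort-based version would be preferable for large sets.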
Figure 3
Cumulative (unadjusted) odds and exposure plots for 10 positive cases involving US Food and Drug Administration (FDA) intervention. Signals are flagged earlier than official alerts in six of nine cases (troglitazone excluded for lack of sufficient exposure). The solid red line is the odds ratio (OR), and the dotted red lines are the confidence intervals (CIs). The solid blue line is the exposure rate. The shaded area marks the period for which FDA intervention applies (e.g., withdrawal). The point estimate marks the earliest year and OR when the lower bound of the 95% CI is above a threshold of 1.0, i.e., when the unadjusted method would flag the drug for monitoring. As more data accumulate and exposure increases, patterns often converge toward more confident signals. cvf, cardiovascular fibrosis; mi, myocardial infarction; qt, QT prolongation; rhabd, rhabdomyolysis; ubc, urinary bladder cancer.
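The "earliest year" point estimate can be sketched as a running computation: accumulate the 2 × 2 counts year by year and report the first year in which the lower 95% bound of the unadjusted OR exceeds the threshold. The input layout (one tuple of new counts per year) and the Wald interval are assumptions made for illustration.

```python
import math

def first_flag_year(yearly_counts, threshold=1.0, z=1.96):
    """Return (year, cumulative OR) for the first year flagged, else None.

    yearly_counts: iterable of (year, a, b, c, d), where a..d are the
    patients newly assigned to each 2x2 cell in that year (hypothetical layout).
    """
    A = B = C = D = 0
    for year, a, b, c, d in sorted(yearly_counts):
        A, B, C, D = A + a, B + b, C + c, D + d
        if min(A, B, C, D) == 0:
            continue  # too little exposure to estimate an interval yet
        log_or = math.log((A * D) / (B * C))
        se = math.sqrt(1 / A + 1 / B + 1 / C + 1 / D)
        if math.exp(log_or - z * se) > threshold:
            return year, math.exp(log_or)
    return None
```

Because counts only accumulate, the interval narrows as exposure grows, which mirrors the convergence toward more confident signals noted in the caption.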
Figure 4
Assignment of patients to 2 × 2 contingency tables. Patients are assigned to cells a, b, c, and d of a 2 × 2 contingency table (C) on the basis of the patterns shown in parts (A) and (B). In the patterns, indications are abbreviated with “I”, drugs with “D”, and outcomes or events with “E.” A patient exposed to the drug is counted in cells “a” or “b” depending on whether the outcome occurs after the drug exposure, based on temporal ordering of first mentions of the I, D, and E. Other patients (i.e., unexposed) are placed in the bottom row of the 2 × 2 contingency table in cells “c” or “d” depending on whether the outcome occurred in the observation duration after the indication. Therefore, for example, an indication followed by a drug and then an event would go into the “a” cell. An indication followed by no drug mention but having an occurrence of the event would go into cell “c.” For drug–drug interactions, we do not restrict the assignment on the basis of the indications. Therefore, patients with mentions of both drugs (in either order) before an event would go into the “a” cell.
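The single-drug assignment rules can be written down as a small function over first-mention times. This is a sketch under stated assumptions: times are comparable numbers (for example days since the first note), patients with no mention of the indication are left out of the table, and an exposed patient whose event precedes the drug is counted in cell "b". The caption does not spell out these edge cases, so treat them as illustrative choices.

```python
from collections import Counter

def assign_cell(indication_t, drug_t, event_t):
    """Map first-mention times of I, D, and E to a 2x2 cell label.

    Each argument is a time, or None if the item is never mentioned.
    """
    if indication_t is None:
        return None  # assumption: not part of the comparison population
    if drug_t is not None:
        # Exposed row: "a" only when the event follows the drug.
        return "a" if event_t is not None and event_t > drug_t else "b"
    # Unexposed row: "c" when the event follows the indication.
    return "c" if event_t is not None and event_t > indication_t else "d"

def contingency_table(patients):
    """Tally cells for an iterable of (indication_t, drug_t, event_t)."""
    cells = Counter(assign_cell(*p) for p in patients)
    cells.pop(None, None)  # drop patients outside the population
    return cells
```

For example, `assign_cell(1, 2, 3)` gives `"a"` (indication, then drug, then event) and `assign_cell(1, None, 5)` gives `"c"`.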
Figure 5
Generation of the patient–feature matrix. The workflow (1) starts by downloading ~5.6 million strings for every term in ontologies from both the Unified Medical Language System (UMLS) and BioPortal, as well as all trigger terms from NegEx and ConText; (2) uses term frequency and syntactic type information (e.g., predominant noun phrases) from MEDLINE to prune the set of strings into a clean lexicon; (3) applies the lexicon directly against the textual notes using exact string matching; (4) applies NegEx and ConText rules to filter negation and family history contexts; (5) applies UMLS Metathesaurus and BioPortal mappings and semantic type information to normalize terms into concepts that are grouped by drug, disease, device, or procedure; and (6) finally yields the patient–feature matrix. Each row of the matrix represents a single note that is linked to a single patient, and the time stamps of the notes induce a temporal ordering over the entire patient–feature matrix.
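Steps (3) through (6) can be illustrated at toy scale. The sketch below matches a three-entry lexicon by exact string, drops mentions preceded by a negation trigger, and emits one row per note ordered by time stamp. The lexicon, the concept identifiers, and the trigger window are all hypothetical, and the negation rule is a crude stand-in for NegEx rather than the NegEx algorithm itself; family-history filtering (ConText) is omitted.

```python
import re

# Hypothetical lexicon: surface string -> normalized concept identifier.
LEXICON = {
    "myocardial infarction": "C_MI",
    "rofecoxib": "C_ROFECOXIB",
    "chest pain": "C_CHEST_PAIN",
}
# Toy negation triggers; a real system would use the NegEx trigger list.
NEGATION_TRIGGERS = {"no", "denies", "without"}

def annotate(note, window=5):
    """Return the set of non-negated concepts mentioned in one note."""
    found = set()
    for sentence in re.split(r"[.;\n]+", note.lower()):
        for term, concept in LEXICON.items():
            for m in re.finditer(r"\b" + re.escape(term) + r"\b", sentence):
                # Look at up to `window` tokens before the match.
                preceding = sentence[:m.start()].split()[-window:]
                if NEGATION_TRIGGERS.isdisjoint(preceding):
                    found.add(concept)
    return found

def build_matrix(notes):
    """notes: iterable of (patient_id, timestamp, text).

    Returns one (patient_id, timestamp, concepts) row per note,
    ordered by time stamp.
    """
    rows = [(pid, ts, annotate(text)) for pid, ts, text in notes]
    return sorted(rows, key=lambda row: (row[1], row[0]))
```

Representing each row as a set of concepts keeps the sketch short; a sparse matrix keyed by concept would be the natural choice at the scale the caption describes.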
Figure 6
Sample annotations. (a) A discharge summary is encoded internally using (b) a highly compressed, numerical representation. The strings in parentheses are keyed to the first column of numbers and are included merely for illustration purposes. (c) The annotations keep track of relative positional information and, owing to the vast lexicon, are rich enough that very little of the useful information is lost if we reconstruct the note (notice the section headers). The blank areas in the reconstruction represent terms that are not recognized, and terms highlighted in red denote ones that will not be attributed to the present patient because of contextual cues (e.g., family history and negated findings). CABG, coronary artery bypass graft; COPD, chronic obstructive pulmonary disease; CT, computed tomography.
Figure 7
Two-hop query expansion. The algorithm takes a set of concepts C (solid red) and derives all subconcepts C′ (all red) in each ontology O and then repeats the process only once more for all derived concepts C′ (solid blue) to obtain C′′ (all red and blue). Because concepts are mapped across ontologies, the process traverses simultaneously all ontologies that contain C (and C′), thereby “hopping” across ontologies twice. In this illustration, C′′ captures two more concepts from the adjacent ontology O2 that would have otherwise been missed with a single iteration.
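The expansion can be sketched as one "hop" applied twice. The sketch assumes that cross-ontology mappings have already been resolved, so a mapped concept carries the same identifier in every ontology that contains it, and that each ontology is given as a dictionary from a concept to its direct subconcepts. The data layout and the names `O1`, `O2`, `A`, `B`, `X`, `Y` are hypothetical.

```python
def descendants(children, root):
    """All transitive subconcepts of `root` within one ontology."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def hop(concepts, ontologies):
    """Add the subconcepts of every concept in every ontology that has it."""
    expanded = set(concepts)
    for children in ontologies.values():
        for concept in concepts:
            expanded |= descendants(children, concept)
    return expanded

def two_hop(concepts, ontologies):
    """Apply the expansion exactly twice, as in the caption."""
    return hop(hop(concepts, ontologies), ontologies)

# Usage: B sits under A in O1 and also appears in O2 with its own subtree.
ontologies = {"O1": {"A": {"B"}}, "O2": {"B": {"X"}, "X": {"Y"}}}
print(sorted(two_hop({"A"}, ontologies)))
```

In this toy example the first hop reaches only `B`, and the second hop picks up `X` and `Y` from the adjacent ontology, which is the situation the illustration describes.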
