Comparative Study

Functional evaluation of out-of-the-box text-mining tools for data-mining tasks

Kenneth Jung et al. J Am Med Inform Assoc. 2015 Jan;22(1):121-31. doi: 10.1136/amiajnl-2014-002902. Epub 2014 Oct 21.

Abstract

Objective: The trade-off between the speed and simplicity of dictionary-based term recognition and the richer linguistic information provided by more advanced natural language processing (NLP) is an area of active discussion in clinical informatics. In this paper, we quantify this trade-off by comparing text processing systems that differ in speed and linguistic understanding. We tested both types of systems on three clinical research tasks: phase IV safety profiling of a drug, learning adverse drug–drug interactions, and learning used-to-treat relationships between drugs and indications.

Materials: We first benchmarked the accuracy of the NCBO Annotator and REVEAL on a manually annotated, publicly available dataset from the 2008 i2b2 Obesity Challenge. We then applied the NCBO Annotator and REVEAL to 9 million clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE) and used the resulting data for the three research tasks.

Results: When large datasets are used, the results of the three research tasks do not differ significantly between the NCBO Annotator and REVEAL. In one subtask, REVEAL achieved higher sensitivity with smaller datasets.

Conclusions: For a variety of tasks, employing simple term recognition methods instead of advanced NLP methods results in little or no impact on accuracy when using large datasets. Simpler dictionary-based methods have the advantage of scaling well to very large datasets. Promoting the use of simple, dictionary-based methods for population level analyses can advance adoption of NLP in practice.
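For readers less familiar with the distinction, the following minimal Python sketch illustrates the kind of dictionary-based term recognition the conclusions refer to: exact matching of a small term dictionary against note text. The dictionary entries, concept identifiers, note text, and function name are hypothetical; the actual NCBO Annotator workflow matches against far larger ontology-derived term lists and applies additional processing to the raw matches.

    # Illustrative only: a toy dictionary-based annotator (hypothetical terms, IDs, and note).
    import re

    TERM_DICTIONARY = {
        "peripheral artery disease": "CONCEPT_0001",   # placeholder concept IDs
        "congestive heart failure": "CONCEPT_0002",
        "gallstones": "CONCEPT_0003",
    }

    def annotate(note):
        """Return (term, concept_id, character offset) for each dictionary term found."""
        lowered = note.lower()
        hits = []
        for term, concept_id in TERM_DICTIONARY.items():
            for match in re.finditer(re.escape(term), lowered):
                hits.append((term, concept_id, match.start()))
        return hits

    print(annotate("Pt with congestive heart failure; hx of gallstones."))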

Keywords: electronic health records; natural language processing; text mining.


Figures

Figure 1:
Our investigation has three parts. (A) First, we benchmark the accuracy of the NCBO Annotator-based workflow, REVEAL, and cTAKES on the task of finding mentions of co-morbidities in the 2008 i2b2 Obesity Challenge dataset (details in figure 2). (B) Second, we evaluate the trade-off of using annotations, and the resulting patient-feature matrix, from 9 million clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE) generated using the NCBO Annotator-based workflow and the REVEAL natural language processing (NLP) system. The three research tasks are: detection of used-to-treat relationships between drugs and indications, detection of drug–drug interactions, and profiling the safety of cilostazol use in patients with peripheral artery disease (PAD). Each of these evaluations is based on previously published work; the only source of variation is the annotations used as input to the published methods. The patient-feature matrix is described in detail in online supplemental materials S2. We did not run cTAKES on the 9 million clinical notes from STRIDE because it would have required over a year to complete given our computational resources. (C) Finally, we explore the impact of dataset size on the task of detecting the used-to-treat relationship using increasingly smaller subsets of the data (details in figure 6).
Figure 2:
The precision and recall of the NCBO Annotator, REVEAL, and cTAKES on the 2008 i2b2 dataset are plotted here for each indication. There is considerable variation in recall across systems and indications, but indications that are hard to detect are generally hard for all systems (eg, Gallstones, labeled here as GS). No system is universally best across all indications with respect to either precision or recall. ASA, asthma; CAD, coronary artery disease; CHF, congestive heart failure; DM, diabetes; DEPR, depression; GS, gallstones; GERD, gastro-esophageal reflux disease; GT, gout; HCL, hypercholesterolemia; HTN, hypertension; HTG, hypertriglyceridemia; OA, osteoarthritis; OBS, obesity; OSA, obstructive sleep apnea; PVD, peripheral vascular disease.
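As a reminder of how the points in figure 2 are computed, here is a minimal sketch: for one system and one indication, precision and recall follow from the set of patients the system flags versus the i2b2 gold-standard set. The patient IDs below are made up.

    # Toy precision/recall for one system on one indication (hypothetical patient IDs).
    def precision_recall(predicted, gold):
        tp = len(predicted & gold)                              # true positives
        precision = tp / len(predicted) if predicted else 0.0
        recall = tp / len(gold) if gold else 0.0
        return precision, recall

    print(precision_recall(predicted={1, 2, 3, 5}, gold={1, 2, 4, 5, 6}))  # (0.75, 0.6)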
Figure 3:
Profile of adverse events in peripheral artery disease patients with and without exposure to cilostazol. The plot shows ORs and 95% CIs calculated from Stanford Translational Research Integrated Database Environment (STRIDE) annotations produced by the NCBO Annotator-based workflow and by REVEAL. The conclusions of this analysis do not change with the text processing system used. Note that REVEAL did not find any instances of ‘sudden cardiac death’ in the data; for this event, we set the OR to 1. MACE, major adverse cardiac event.
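The ORs and CIs in figure 3 come from standard 2×2 contingency-table calculations; a minimal sketch using the usual Wald (log-odds) 95% interval is below. The counts are made up, and this is not the paper's analysis code.

    # Odds ratio with a Wald 95% CI from a 2x2 table (counts below are hypothetical).
    import math

    def odds_ratio_ci(exp_event, exp_no_event, unexp_event, unexp_no_event):
        a, b, c, d = exp_event, exp_no_event, unexp_event, unexp_no_event
        odds_ratio = (a * d) / (b * c)
        se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)              # SE of log(OR)
        lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
        upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
        return odds_ratio, lower, upper

    print(odds_ratio_ci(30, 970, 20, 1980))  # exposed with/without event, unexposed with/without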
Figure 4:
Detection of adverse drug–drug interactions. The analysis of Iyer et al. was carried out using either the NCBO Annotator-based workflow or REVEAL to process clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE). (A) There is no significant difference in the receiver operating characteristic (ROC) curves (p = 0.275 by DeLong's test) for the two systems. (B) Area under the ROC curve (AUC) for each of nine adverse events separately. There is no significant difference in performance for any adverse event (p>0.05).
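For reference, the area under an ROC curve can be computed directly from predicted scores and gold labels as the probability that a randomly chosen positive case outranks a randomly chosen negative one. The sketch below uses synthetic labels and scores and omits DeLong's test, which figure 4 uses for the significance comparison.

    # AUC via the rank identity: P(score of random positive > score of random negative).
    def auc(labels, scores):
        pos = [s for y, s in zip(labels, scores) if y == 1]
        neg = [s for y, s in zip(labels, scores) if y == 0]
        wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)  # ties count 0.5
        return wins / (len(pos) * len(neg))

    labels = [1, 0, 1, 1, 0, 0, 1, 0]                     # synthetic gold labels
    scores_a = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1, 0.8, 0.3]   # eg, scores from one pipeline
    scores_b = [0.8, 0.3, 0.7, 0.6, 0.2, 0.4, 0.9, 0.1]   # eg, scores from the other
    print(auc(labels, scores_a), auc(labels, scores_b))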
Figure 5:
Detecting used-to-treat relationships. We carried out the analysis described in Jung et al. using either the NCBO Annotator or REVEAL to annotate clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE). There is no significant difference in performance between classifiers trained and tested on features derived from either system (p = 0.29 by McNemar's test). PPV, positive predictive value.
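McNemar's test, as used here, compares two classifiers evaluated on the same test examples using only the discordant pairs (examples that exactly one classifier gets right). A minimal sketch with made-up discordant counts, using the chi-square approximation with continuity correction, is below.

    # McNemar's test from discordant counts (hypothetical numbers).
    from math import erf, sqrt

    def mcnemar_p(only_a_correct, only_b_correct):
        n = only_a_correct + only_b_correct
        if n == 0:
            return 1.0
        chi2 = (abs(only_a_correct - only_b_correct) - 1) ** 2 / n   # continuity-corrected statistic
        z = sqrt(chi2)
        return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))                # chi-square (1 df) tail probability

    print(mcnemar_p(only_a_correct=12, only_b_correct=18))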
Figure 6:
Learning curves for the used-to-treat task. We sampled random subsets of patients and used the associated notes to generate features based on the annotations of those notes by either the NCBO Annotator or REVEAL. This was repeated 10 times for each fraction of the full Stanford Translational Research Integrated Database Environment (STRIDE) dataset. The mean performance metric across the 10 runs is plotted, along with the SEM. REVEAL has higher sensitivity in smaller datasets, and generally has higher precision/positive predictive value (PPV).
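The subsampling procedure behind these learning curves can be sketched as follows; the patient list and the `evaluate` function are stand-ins for the paper's actual feature-generation and classification pipeline, and the numbers are synthetic.

    # Learning-curve sketch: evaluate on random patient subsets, report mean +/- SEM.
    import random
    from statistics import mean, stdev

    def learning_curve(patients, evaluate, fractions=(1.0, 0.5, 0.25, 0.1), repeats=10):
        results = {}
        for frac in fractions:
            scores = [evaluate(random.sample(patients, int(len(patients) * frac)))
                      for _ in range(repeats)]
            results[frac] = (mean(scores), stdev(scores) / len(scores) ** 0.5)  # mean, SEM
        return results

    def toy_evaluate(subset):
        # Stand-in metric that improves with subset size, plus noise.
        return min(1.0, 0.5 + 0.0005 * len(subset) + random.gauss(0, 0.02))

    print(learning_curve(list(range(1000)), toy_evaluate))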

References

    1. Harpaz R, Haerian K, Chase HS, et al. Mining electronic health records for adverse drug effects using regression based methods. Proceedings of the 1st ACM International Health Informatics Symposium. Arlington, Virginia, USA: ACM; 2010:100–7. http://dl.acm.org/citation.cfm?id=1883008
    2. Haerian K, Varn D, Vaidya S, et al. Detection of pharmacovigilance-related adverse events using electronic health records and automated methods. Clin Pharmacol Ther 2012;92:228–34.
    3. Wang X, Hripcsak G, Markatou M, et al. Active computerized pharmacovigilance using natural language processing, statistics, and electronic health records: a feasibility study. J Am Med Inform Assoc 2009;16:328–37.
    4. Friedman C. Discovering novel adverse drug events using natural language processing and mining of the electronic health record. AIME 2009: Proceedings of the 12th Conference on Artificial Intelligence in Medicine 2009:1–5.
    5. Liu M, McPeek Hinz ER, Matheny ME, et al. Comparative analysis of pharmacovigilance methods in the detection of adverse drug reactions using electronic medical records. J Am Med Inform Assoc 2013;20:420–6.
