Development and evaluation of RapTAT: a machine learning system for concept mapping of phrases from medical narratives

Glenn T Gobbel¹, Ruth Reeves², Shrimalini Jayaramaraja³, Dario Giuse⁴, Theodore Speroff⁵, Steven H Brown⁶, Peter L Elkin⁷, Michael E Matheny⁸

Affiliations

¹ Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA; Division of General Internal Medicine & Public Health, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA. Electronic address: glenn.t.gobbel@vanderbilt.edu.
² Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA. Electronic address: ruth.reeves2@va.gov.
³ Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Division of General Internal Medicine & Public Health, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA. Electronic address: shrimalini.jayaramaraja@vanderbilt.edu.
⁴ Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA. Electronic address: dario.giuse@vanderbilt.edu.
⁵ Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Division of General Internal Medicine & Public Health, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA; Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA. Electronic address: ted.speroff@vanderbilt.edu.
⁶ Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA. Electronic address: steven.brown@va.gov.
⁷ Department of Biomedical Informatics, University at Buffalo, SUNY, Buffalo, NY, USA. Electronic address: ontolimatics@gmail.com.
⁸ Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA; Division of General Internal Medicine & Public Health, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA; Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA. Electronic address: michael@matheny.info.

PMID: 24316051
DOI: 10.1016/j.jbi.2013.11.008

Free article

Glenn T Gobbel et al. J Biomed Inform. 2014 Apr:48:54-65. doi: 10.1016/j.jbi.2013.11.008. Epub 2013 Dec 4.

Abstract

Rapid, automated mapping of free-text phrases to pre-defined concepts could assist in the annotation of clinical notes and increase the speed of natural language processing systems. The aim of this study was to design and evaluate a token-order-specific, naïve Bayes-based machine learning system (RapTAT) to predict associations between phrases and concepts. Performance was assessed using a reference standard generated from 2860 VA discharge summaries containing 567,520 phrases that had been mapped to 12,056 distinct Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) concepts by the MCVS natural language processing system. It was also assessed on the manually annotated 2010 i2b2 challenge data. For the VA documents, precision, recall, and F-measure were estimated for each concept using bootstrapping. Within that corpus, concepts identified by MCVS were broadly distributed throughout SNOMED CT, and the token-order-specific language model achieved better precision, recall, and F-measure (0.95±0.15, 0.96±0.16, and 0.95±0.16, respectively; mean±SD) than the bag-of-words-based naïve Bayes model (0.64±0.45, 0.61±0.46, and 0.60±0.45, respectively) that has previously been used for concept mapping. Precision, recall, and F-measure on the i2b2 test set were 92.9%, 85.9%, and 89.2%, respectively, using the token-order-specific model. RapTAT required just 7.2 ms to map all phrases within a single discharge summary, and the mapping rate did not decrease as the number of processed documents increased. The high performance attained by the tool in terms of both accuracy and speed was encouraging, and the mapping rate should be sufficient to support near-real-time, interactive annotation of medical narratives. These results demonstrate the feasibility of rapidly and accurately mapping phrases to a wide range of medical concepts based on a token-order-specific naïve Bayes model and machine learning.
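
The contrast between a token-order-specific and a bag-of-words naïve Bayes model can be illustrated with a minimal sketch. This is not RapTAT's implementation: the class name `PositionalNaiveBayes`, the add-one smoothing scheme, the toy phrases, and the concept labels are all hypothetical and chosen only for illustration. The one assumption carried over from the abstract is that the token-order-specific model conditions each token on its position in the phrase, whereas the bag-of-words model ignores position.

```python
import math
from collections import Counter, defaultdict


class PositionalNaiveBayes:
    """Toy naive Bayes phrase-to-concept mapper (illustrative only)."""

    def __init__(self, use_position=True, alpha=1.0):
        self.use_position = use_position  # False gives a bag-of-words model
        self.alpha = alpha                # additive smoothing constant (assumed)
        self.concept_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def _features(self, phrase):
        tokens = phrase.lower().split()
        if self.use_position:
            # Token-order-specific: the feature is (position, token).
            return [(i, tok) for i, tok in enumerate(tokens)]
        # Bag-of-words: every token shares the same dummy position.
        return [(0, tok) for tok in tokens]

    def train(self, pairs):
        for phrase, concept in pairs:
            self.concept_counts[concept] += 1
            for feat in self._features(phrase):
                self.feature_counts[concept][feat] += 1
                self.vocab.add(feat)

    def log_scores(self, phrase):
        total = sum(self.concept_counts.values())
        slots = len(self.vocab) + 1  # one extra slot reserved for unseen features
        scores = {}
        for concept, n in self.concept_counts.items():
            counts = self.feature_counts[concept]
            denom = sum(counts.values()) + self.alpha * slots
            score = math.log(n / total)  # log prior of the concept
            for feat in self._features(phrase):
                score += math.log((counts[feat] + self.alpha) / denom)
            scores[concept] = score
        return scores

    def predict(self, phrase):
        scores = self.log_scores(phrase)
        return max(scores, key=scores.get)


# Hypothetical training pairs; the labels are not real SNOMED CT codes.
TOY_PAIRS = [
    ("back pain", "BACK_PAIN"),
    ("pain back of knee", "KNEE_PAIN"),
]

ordered = PositionalNaiveBayes(use_position=True)
ordered.train(TOY_PAIRS)
bag = PositionalNaiveBayes(use_position=False)
bag.train(TOY_PAIRS)

for phrase in ("back pain", "pain back"):
    print(phrase, "->", ordered.predict(phrase), "|", bag.predict(phrase))
```

In this toy setting the bag-of-words variant assigns "back pain" and "pain back" identical scores, so it cannot separate them, while the positional variant can. Whether that mechanism accounts for the performance gap reported above is not something this sketch can establish.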

Keywords: Bayesian prediction; CSV; FN; FP; IQV; MCVS; Machine learning; Multi-threaded Clinical Vocabulary Server; NLP; Natural language processing; Opt; Perf; RapTAT; Rapid Text Annotation Tool; SNOMED-CT; SVM; Systematized Nomenclature of Medicine-Clinical Terms; Systematized nomenclature of medicine; TP; UMLS; Unified Medical Language System; comma-separated value; false negative; false positive; index of qualitative variation; natural language processing; optimism; performance; support vector machine; true positive.

Published by Elsevier Inc.
