Discerning tumor status from unstructured MRI reports--completeness of information in existing reports and utility of automated natural language processing

Lionel T E Cheng¹, Jiaping Zheng, Guergana K Savova, Bradley J Erickson

PMID: 19484309
PMCID: PMC2837158
DOI: 10.1007/s10278-009-9215-7

Review

Lionel T E Cheng et al. J Digit Imaging. 2010 Apr.

J Digit Imaging. 2010 Apr;23(2):119-32.

doi: 10.1007/s10278-009-9215-7. Epub 2009 May 30.

Affiliation

¹ Department of Radiology, Mayo Clinic, Rochester, MN 55905, USA.

Abstract

Information in electronic medical records is often in an unstructured free-text format. This format presents challenges for expedient data retrieval and may fail to convey important findings. Natural language processing (NLP) is an emerging technique for rapid and efficient clinical data retrieval. While proven in disease detection, the utility of NLP in discerning disease progression from free-text reports is untested. We aimed to (1) assess whether unstructured radiology reports contained sufficient information for tumor status classification; (2) develop an NLP-based data extraction tool to determine tumor status from unstructured reports; and (3) compare NLP and human tumor status classification outcomes. Consecutive follow-up brain tumor magnetic resonance imaging reports (2000–2007) from a tertiary center were manually annotated using consensus guidelines on tumor status. Reports were randomized to NLP training (70%) or testing (30%) groups. The NLP tool utilized a support vector machines model with statistical and rule-based outcomes. Most reports had sufficient information for tumor status classification, although 0.8% did not describe status despite reference to prior examinations. Tumor size was unreported in 68.7% of documents, while 50.3% lacked data on change magnitude when there was detectable progression or regression. Using retrospective human classification as the gold standard, NLP achieved 80.6% sensitivity and 91.6% specificity for tumor status determination (mean positive predictive value, 82.4%; negative predictive value, 92.0%). In conclusion, most reports contained sufficient information for tumor status determination, though variable features were used to describe status. NLP demonstrated good accuracy for tumor status classification and may have novel application for automated disease status classification from electronic databases.
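As an illustration of how the reported measures (sensitivity, specificity, positive and negative predictive value) relate to a comparison of NLP output against human annotation, the following is a minimal sketch and not the authors' implementation. The function name, the class labels ("progression", "stable", "regression"), and the toy data are all hypothetical; they are not taken from the study.

```python
from collections import Counter


def one_vs_rest_metrics(gold, predicted, positive_label):
    """Sensitivity, specificity, PPV, and NPV for one class against all others.

    `gold` holds the human (reference) labels and `predicted` the NLP labels.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Tally the four cells of the 2x2 confusion table.
    counts = Counter()
    for g, p in zip(gold, predicted):
        if g == positive_label:
            counts["tp" if p == positive_label else "fn"] += 1
        else:
            counts["fp" if p == positive_label else "tn"] += 1
    tp, fn, fp, tn = (counts[k] for k in ("tp", "fn", "fp", "tn"))

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy, made-up labels for six reports (hypothetical; not study data).
gold = ["progression", "stable", "stable", "regression", "progression", "stable"]
predicted = ["progression", "stable", "progression", "regression", "stable", "stable"]

metrics = one_vs_rest_metrics(gold, predicted, "progression")
print(metrics)
```

With more than two status categories, one plausible way to arrive at a single summary figure is to compute these values per category and average them; whether the study's "mean" values were derived this way is not stated in the abstract.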

Figures

**Fig 1**
Classification scheme for radiology reports.

**Fig 2**
Development of NLP-based data extraction tool.

**Fig 3**
Simplified illustration of processing and analysis of an example report by the NLP-based data extraction tool.

**Fig 4**
Outcomes of human annotation for classifiable reports.

**Fig 5**
Comparison of NLP and human classification outcomes for reports in test set.

**Fig 6**
Receiver operating characteristic curves for tumor status determination by NLP.
