Review

Natural Language Processing Technologies in Radiology Research and Clinical Applications

Tianrun Cai et al. Radiographics 2016 Jan-Feb;36(1):176-191. doi: 10.1148/rg.2016150080.

Abstract

The migration of imaging reports to electronic medical record systems holds great potential for advancing radiology research and practice by leveraging the large volume of data continuously being updated, integrated, and shared. However, there are significant challenges as well, largely due to the heterogeneity of how these data are formatted. Indeed, although there is movement toward structured reporting in radiology (ie, hierarchically itemized reporting with use of standardized terminology), the majority of radiology reports remain unstructured and use free-form language. To effectively "mine" these large datasets for hypothesis testing, a robust strategy for extracting the necessary information is needed. Manual extraction of information is a time-consuming and often unmanageable task. "Intelligent" search engines that instead rely on natural language processing (NLP), a computer-based approach to analyzing free-form text or speech, can be used to automate this data mining task. The overall goal of NLP is to translate natural human language into a structured format (ie, a fixed collection of elements, each with a standardized set of choices for its value) that is easily manipulated by computer programs to (among other things) sort reports into subcategories or query for the presence or absence of a finding. The authors review the fundamentals of NLP and describe various techniques that constitute NLP in radiology, along with some key applications.


Figures

Figure 1.
Chart illustrates how NLP as understood in present-day radiology is a collection of various techniques that aim to extract information from natural language (eg, analyze a radiology report to extract concepts of interest and put them in a structured format) but that also use this output to (for example) index reports in a searchable database, provide patient- or report-level classification, or summarize findings in simpler natural language. CT = computed tomography, CTPA = CT pulmonary angiography.
Figure 2.
Medical ontology (in this example, Systematized Nomenclature of Medicine–Clinical Terms [SNOMED-CT]) shows a unique concept and its description. SNOMED-CT provides a unique code for the concept (22298006) and its preferred name (myocardial infarction), the Unified Medical Language System (UMLS) concept unique identifier (CUI) and semantic type (disease or symptom), a list of synonyms (eg, cardiac infarction) for this concept, and relationships with other concepts.
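A concept record like the one in Figure 2 can be held in a small data structure. A minimal sketch (the field names are illustrative assumptions, not the actual SNOMED-CT or UMLS data model; the CUI shown is the commonly cited UMLS identifier for myocardial infarction):

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """Illustrative container for an ontology concept."""
    snomed_code: str
    preferred_name: str
    umls_cui: str
    semantic_type: str
    synonyms: list = field(default_factory=list)
    parents: list = field(default_factory=list)  # "is-a" relationships to other concepts

mi = Concept(
    snomed_code="22298006",
    preferred_name="myocardial infarction",
    umls_cui="C0027051",
    semantic_type="Disease or Syndrome",
    synonyms=["cardiac infarction", "heart attack"],
    parents=["myocardial disease"],
)
print(mi.preferred_name)  # myocardial infarction
```

Normalizing every synonym ("cardiac infarction," "heart attack") to the single code 22298006 is what lets downstream queries match reports regardless of the wording the radiologist chose.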
Figure 3.
Diagram illustrates the clinical Text Analysis and Knowledge Extraction System (cTAKES), an NLP system designed specifically for extracting information from clinical text. When text from a radiology report is input into cTAKES, it is analyzed to produce a list of individual concepts identified from a terminology of medical terms (in this example, both the SNOMED-CT code and the UMLS Metathesaurus CUI are shown). Each concept is also assigned a “polarity” based on whether cTAKES recognizes the finding mentioned as present or absent (eg, “no evidence of infarction” is assigned a polarity of −1), as well as a degree of certainty. In this example, because of the word “probable,” the corresponding concept is coded as uncertain.
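The polarity and certainty assignments described above can be approximated with a NegEx-style cue-word heuristic. A minimal sketch, assuming simple cue lists and a fixed preceding-context window (this is not cTAKES's actual algorithm):

```python
# A concept mention is marked negated if a negation cue appears shortly
# before it, and uncertain if a hedging cue does.
NEGATION_CUES = ["no evidence of", "no ", "without", "absence of"]
UNCERTAINTY_CUES = ["probable", "possible", "suspected", "may represent"]

def polarity_and_certainty(sentence: str, concept: str):
    """Return (polarity, certainty) for a concept mention, or None if absent."""
    s = sentence.lower()
    idx = s.find(concept.lower())
    if idx == -1:
        return None
    window = s[max(0, idx - 40):idx]  # text immediately preceding the mention
    polarity = -1 if any(cue in window for cue in NEGATION_CUES) else 1
    certainty = "uncertain" if any(cue in window for cue in UNCERTAINTY_CUES) else "asserted"
    return polarity, certainty

print(polarity_and_certainty("No evidence of infarction.", "infarction"))
# (-1, 'asserted')
print(polarity_and_certainty("Probable pneumonia in the left lower lobe.", "pneumonia"))
# (1, 'uncertain')
```

Real systems handle scope termination ("but"), double negation, and cue lists far larger than this, but the window-before-the-mention idea is the same.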
Figure 4.
Diagram illustrates a pattern matching process designed to extract report dates. A regular expression (upper left) is designed to detect the date in the header of each report stored in our EMR system. Reports have a header that consists of a numeric string (the EMR number) enclosed by the character “|” and followed by a date (upper right). When the pattern matching process encounters a character sequence that matches this pattern, the date is displayed (bottom).
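A comparable pattern in Python (the exact regular expression in the figure is not reproduced here; the header layout follows the description above, with an assumed MM/DD/YYYY date format):

```python
import re

# A numeric EMR number enclosed by "|", followed by a date.
header_pattern = re.compile(r"\|(\d+)\|\s*(\d{2}/\d{2}/\d{4})")

report = "|1234567| 03/16/2015\nCT CHEST WITHOUT CONTRAST\nFINDINGS: ..."
match = header_pattern.search(report)
if match:
    emr_number, report_date = match.groups()
    print(report_date)  # 03/16/2015
```

Because the pattern anchors on the `|`-delimited header, a date appearing in the body of the report (eg, "compared with the study of 01/02/2014") is not mistaken for the report date.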
Figure 5.
Diagram illustrates the syntactic analysis of the sentence “The gallbladder is surgically absent.” Each word (except “The”) is assigned a part-of-speech designation using grammatical rules. Linguistic NLP systems often perform such analyses to identify sentence subparts that might correspond to specific medical concepts.
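The labeling in Figure 5 can be sketched with a toy lookup-table tagger (a hand-built lexicon standing in for the grammatical rules a real linguistic system would apply):

```python
# Toy part-of-speech lexicon for the example sentence; real taggers use
# grammatical rules or statistical models rather than a fixed table.
LEXICON = {
    "gallbladder": "NOUN",
    "is": "VERB",
    "surgically": "ADVERB",
    "absent": "ADJECTIVE",
}

def tag(sentence: str):
    """Assign each word a part-of-speech label from the lexicon."""
    words = sentence.rstrip(".").split()
    return [(w, LEXICON.get(w.lower(), "OTHER")) for w in words]

print(tag("The gallbladder is surgically absent."))
# [('The', 'OTHER'), ('gallbladder', 'NOUN'), ('is', 'VERB'),
#  ('surgically', 'ADVERB'), ('absent', 'ADJECTIVE')]
```

Once "gallbladder" is identified as the noun subject and "absent" as its adjective, a downstream step can map the pair to a structured assertion (gallbladder: absent, surgical).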
Figure 6.
A challenge in NLP is that ambiguous terms can be interpreted in more than one way depending on the context in which they are used. For example, this diagram shows how the word “ventricle” can refer to two distinct concepts in the UMLS Metathesaurus terminology. Beyond distinct UMLS CUIs, these particular concepts also have distinct semantic types, broad categories of concepts that are described in the UMLS Semantic Network. Each concept may be assigned to one or more semantic types.
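One simple disambiguation heuristic is to score each candidate sense by its overlap with the surrounding context words. A minimal sketch (the sense labels and cue-word lists are illustrative assumptions, not UMLS data):

```python
# Each candidate sense of "ventricle" gets a set of context cue words; the
# sense sharing the most words with the sentence wins.
SENSES = {
    "cardiac ventricle": {"heart", "cardiac", "left", "ejection", "myocardium"},
    "cerebral ventricle": {"brain", "lateral", "hydrocephalus", "csf", "dilated"},
}

def disambiguate(sentence: str) -> str:
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(sentence.lower().replace(".", "").split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))

print(disambiguate("The lateral ventricle is mildly dilated."))
# cerebral ventricle
```

Production systems use richer context (document section, imaging modality, trained models) than bag-of-words overlap, but the principle of letting context select among candidate CUIs is the same.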
Figure 7.
Simplified example of the structured format generated by an NLP system (MedLEE) as a result of processing the text “increased consolidation of the left lower lobe compatible with atelectasis or pneumonia.” MedLEE has been used to extract information from radiology reports for a variety of research and CDS purposes. (Reprinted, with permission, from reference .)
Figure 8.
Diagram illustrates how machine learning algorithms are an integral part of linguistic NLP systems. Most important, these algorithms, such as support vector machine (SVM) or maximum entropy (MaxEnt) models, are used for patient- or report-level classification. They rely on analyzing a set of features used to describe each training example to determine a model that best separates positive (class = 1) from negative (class = −1) examples. Features are typically thought of as vectors whose entries can be as simple as the frequency with which individual words appear in each example, but they can also be based on the structured information extracted from each example using linguistic NLP systems. Following this model training, the trained classifier is applied to a new text of unknown classification by extracting the same features used to train it. SVC = superior vena cava.
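The training loop in Figure 8 can be sketched end to end with word-frequency features; a perceptron stands in here for the SVM or MaxEnt model (same pipeline: featurize the training examples, learn a linear separator between class 1 and class −1, then apply the trained classifier to new text):

```python
def featurize(text: str, vocab: list):
    """Bag-of-words feature vector: frequency of each vocabulary word."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def train_perceptron(examples, labels, vocab, epochs=20):
    """Learn weights w and bias b separating +1 from -1 examples."""
    w = [0.0] * len(vocab)
    b = 0.0
    for _ in range(epochs):
        for text, y in zip(examples, labels):  # y is +1 or -1
            x = featurize(text, vocab)
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: nudge the separator
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def predict(text, vocab, w, b):
    x = featurize(text, vocab)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Tiny illustrative training set (hypothetical report snippets).
vocab = ["embolism", "no", "evidence", "filling", "defect"]
reports = ["filling defect consistent with embolism",
           "no evidence of filling defect or embolism"]
labels = [1, -1]
w, b = train_perceptron(reports, labels, vocab)
print(predict("acute embolism with filling defect", vocab, w, b))  # 1
```

As the caption notes, the feature vectors need not be raw word counts; entries can instead encode the structured output of a linguistic NLP system (eg, a negated-concept flag), which is often what makes the classifier robust to phrasing.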
Figure 9.
Chart illustrates a simplified example of the structured format generated by the NILE NLP system, which combines linguistic and clinical knowledge. NILE can identify concepts and recognize the anatomic relationships between location modifiers and these concepts. This information can then be used to classify pulmonary embolism into (for example) central, segmental, or subsegmental categories.
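Once a system such as NILE has linked a location modifier to the pulmonary embolism concept, the report-level category can fall out of a lookup. A rule-based sketch (the keyword-to-category mapping is an assumption for illustration, not NILE's actual logic):

```python
# Map a recognized anatomic location modifier to a PE category.
LOCATION_CATEGORIES = {
    "main pulmonary artery": "central",
    "right pulmonary artery": "central",
    "left pulmonary artery": "central",
    "lobar": "central",
    "segmental": "segmental",
    "subsegmental": "subsegmental",
}

def classify_pe(location_modifier: str) -> str:
    """Classify a pulmonary embolism by its extracted location modifier."""
    return LOCATION_CATEGORIES.get(location_modifier.lower(), "unspecified")

print(classify_pe("subsegmental"))  # subsegmental
```

The hard part, which the lookup glosses over, is the relation extraction itself: deciding that "subsegmental" modifies "embolism" rather than some other concept in the sentence.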
Figure 10.
Retrieval of information regarding tumor progression from unstructured brain MR imaging reports. (a) Diagram illustrates the desired classification scheme for extracting structured information regarding disease status, magnitude of change, and significance of change. (b) Diagram illustrates how an NLP system is developed for a classification task using machine learning– and/or rule-based methods. SVM = support vector machine. (Fig 10 reprinted, with permission, from reference .)

References

    1. National Guideline Clearinghouse. ACR practice guideline for communication of diagnostic imaging findings. Rockville, Md: Agency for Healthcare Research and Quality (AHRQ). http://www.guideline.gov/content.aspx?id=32541. Published 2014. Accessed March 16, 2015.
    2. Jha AK, DesRoches CM, Campbell EG, et al. Use of electronic health records in U.S. hospitals. N Engl J Med 2009;360(16):1628–1638.
    3. Tang PC. Key capabilities of an electronic health record system. Washington, DC: Committee on Data Standards for Patient Safety, Board on Health Care Services, Institute of Medicine, 2003.
    4. Travis AR, Sevenster M, Ganesh R, Peters JF, Chang PJ. Preferences for structured reporting of measurement data: an institutional survey of medical oncologists, oncology registrars, and radiologists. Acad Radiol 2014;21(6):785–796.
    5. Larson DB, Towbin AJ, Pryor RM, Donnelly LF. Improving consistency in radiology reporting through the use of department-wide standardized structured reporting. Radiology 2013;267(1):240–250.
