Review

Natural Language Processing Technologies in Radiology Research and Clinical Applications

Tianrun Cai et al. Radiographics 2016 Jan-Feb;36(1):176-191. doi: 10.1148/rg.2016150080.

Abstract

The migration of imaging reports to electronic medical record systems holds great potential for advancing radiology research and practice by leveraging the large volume of data continuously being updated, integrated, and shared. However, there are significant challenges as well, largely due to the heterogeneity of how these data are formatted. Indeed, although there is movement toward structured reporting in radiology (ie, hierarchically itemized reporting with use of standardized terminology), the majority of radiology reports remain unstructured and use free-form language. To effectively "mine" these large datasets for hypothesis testing, a robust strategy for extracting the necessary information is needed. Manual extraction of information is a time-consuming and often unmanageable task. "Intelligent" search engines that instead rely on natural language processing (NLP), a computer-based approach to analyzing free-form text or speech, can be used to automate this data mining task. The overall goal of NLP is to translate natural human language into a structured format (ie, a fixed collection of elements, each with a standardized set of choices for its value) that is easily manipulated by computer programs to (among other things) sort reports into subcategories or query for the presence or absence of a finding. The authors review the fundamentals of NLP and describe various techniques that constitute NLP in radiology, along with some key applications.


Figures

Figure 1.
Chart illustrates how NLP as understood in present-day radiology is a collection of various techniques that aim to extract information from natural language (eg, analyze a radiology report to extract concepts of interest and put them in a structured format) but that also use this output to (for example) index reports in a searchable database, provide patient- or report-level classification, or summarize findings in simpler natural language. CT = computed tomography, CTPA = CT pulmonary angiography.
Figure 2.
Medical ontology (in this example, Systematized Nomenclature of Medicine–Clinical Terms [SNOMED-CT]) shows a unique concept and its description. SNOMED-CT provides a unique code for the concept (22298006) and its preferred name (myocardial infarction), the Unified Medical Language System (UMLS) concept unique identifier (CUI) and semantic type (disease or symptom), a list of synonyms (eg, cardiac infarction) for this concept, and relationships with other concepts.
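A concept record like the one in Figure 2 can be held in a small data structure. A minimal sketch (the field names are illustrative assumptions, not the actual SNOMED-CT or UMLS data model; the CUI shown is the commonly cited UMLS identifier for myocardial infarction):

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """Illustrative container for an ontology concept."""
    snomed_code: str
    preferred_name: str
    umls_cui: str
    semantic_type: str
    synonyms: list = field(default_factory=list)
    parents: list = field(default_factory=list)  # "is-a" relationships to other concepts

mi = Concept(
    snomed_code="22298006",
    preferred_name="myocardial infarction",
    umls_cui="C0027051",
    semantic_type="Disease or Syndrome",
    synonyms=["cardiac infarction", "heart attack"],
    parents=["myocardial disease"],
)
print(mi.preferred_name)  # myocardial infarction
```

Normalizing every synonym ("cardiac infarction," "heart attack") to the single code 22298006 is what lets downstream queries match reports regardless of the wording the radiologist chose.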
Figure 3.
Diagram illustrates the clinical Text Analysis and Knowledge Extraction System (cTAKES), an NLP system designed specifically for extracting information from clinical text. When text from a radiology report is input into cTAKES, it is analyzed to produce a list of individual concepts identified from a terminology of medical terms (in this example, both the SNOMED-CT code and the UMLS Metathesaurus CUI are shown). Each concept is also assigned a “polarity” based on whether cTAKES recognizes the finding mentioned as present or absent (eg, “no evidence of infarction” is assigned a polarity of −1), as well as a degree of certainty. In this example, because of the word “probable,” the corresponding concept is coded as uncertain.
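The polarity and certainty assignments described above can be approximated with a NegEx-style cue-word heuristic. A minimal sketch, assuming simple cue lists and a fixed preceding-context window (this is not cTAKES's actual algorithm):

```python
# A concept mention is marked negated if a negation cue appears shortly
# before it, and uncertain if a hedging cue does.
NEGATION_CUES = ["no evidence of", "no ", "without", "absence of"]
UNCERTAINTY_CUES = ["probable", "possible", "suspected", "may represent"]

def polarity_and_certainty(sentence: str, concept: str):
    """Return (polarity, certainty) for a concept mention, or None if absent."""
    s = sentence.lower()
    idx = s.find(concept.lower())
    if idx == -1:
        return None
    window = s[max(0, idx - 40):idx]  # text immediately preceding the mention
    polarity = -1 if any(cue in window for cue in NEGATION_CUES) else 1
    certainty = "uncertain" if any(cue in window for cue in UNCERTAINTY_CUES) else "asserted"
    return polarity, certainty

print(polarity_and_certainty("No evidence of infarction.", "infarction"))
# (-1, 'asserted')
print(polarity_and_certainty("Probable pneumonia in the left lower lobe.", "pneumonia"))
# (1, 'uncertain')
```

Real systems handle scope termination ("but"), double negation, and cue lists far larger than this, but the window-before-the-mention idea is the same.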
Figure 4.
Diagram illustrates a pattern matching process designed to extract report dates. A regular expression (upper left) is designed to detect the date in the header of each report stored in our EMR system. Reports have a header that consists of a numeric string (the EMR number) enclosed by the character “|” and followed by a date (upper right). When the pattern matching process encounters a character sequence that matches this pattern, the date is displayed (bottom).
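A comparable pattern in Python (the exact regular expression in the figure is not reproduced here; the header layout follows the description above, with an assumed MM/DD/YYYY date format):

```python
import re

# A numeric EMR number enclosed by "|", followed by a date.
header_pattern = re.compile(r"\|(\d+)\|\s*(\d{2}/\d{2}/\d{4})")

report = "|1234567| 03/16/2015\nCT CHEST WITHOUT CONTRAST\nFINDINGS: ..."
match = header_pattern.search(report)
if match:
    emr_number, report_date = match.groups()
    print(report_date)  # 03/16/2015
```

Because the pattern anchors on the `|`-delimited header, a date appearing in the body of the report (eg, "compared with the study of 01/02/2014") is not mistaken for the report date.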
Figure 5.
Diagram illustrates the syntactic analysis of the sentence “The gallbladder is surgically absent.” Each word (except “The”) is assigned a part-of-speech designation using grammatical rules. Linguistic NLP systems often perform such analyses to identify sentence subparts that might correspond to specific medical concepts.
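The labeling in Figure 5 can be sketched with a toy lookup-table tagger (a hand-built lexicon standing in for the grammatical rules a real linguistic system would apply):

```python
# Toy part-of-speech lexicon for the example sentence; real taggers use
# grammatical rules or statistical models rather than a fixed table.
LEXICON = {
    "gallbladder": "NOUN",
    "is": "VERB",
    "surgically": "ADVERB",
    "absent": "ADJECTIVE",
}

def tag(sentence: str):
    """Assign each word a part-of-speech label from the lexicon."""
    words = sentence.rstrip(".").split()
    return [(w, LEXICON.get(w.lower(), "OTHER")) for w in words]

print(tag("The gallbladder is surgically absent."))
# [('The', 'OTHER'), ('gallbladder', 'NOUN'), ('is', 'VERB'),
#  ('surgically', 'ADVERB'), ('absent', 'ADJECTIVE')]
```

Once "gallbladder" is identified as the noun subject and "absent" as its adjective, a downstream step can map the pair to a structured assertion (gallbladder: absent, surgical).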
Figure 6.
A challenge in NLP is that ambiguous terms can be interpreted in more than one way depending on the context in which they are used. For example, this diagram shows how the word “ventricle” can refer to two distinct concepts in the UMLS Metathesaurus terminology. Beyond distinct UMLS CUIs, these particular concepts also have distinct semantic types, broad categories of concepts that are described in the UMLS Semantic Network. Each concept may be assigned to one or more semantic types.
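One simple disambiguation heuristic is to score each candidate sense by its overlap with the surrounding context words. A minimal sketch (the sense labels and cue-word lists are illustrative assumptions, not UMLS data):

```python
# Each candidate sense of "ventricle" gets a set of context cue words; the
# sense sharing the most words with the sentence wins.
SENSES = {
    "cardiac ventricle": {"heart", "cardiac", "left", "ejection", "myocardium"},
    "cerebral ventricle": {"brain", "lateral", "hydrocephalus", "csf", "dilated"},
}

def disambiguate(sentence: str) -> str:
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(sentence.lower().replace(".", "").split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))

print(disambiguate("The lateral ventricle is mildly dilated."))
# cerebral ventricle
```

Production systems use richer context (document section, imaging modality, trained models) than bag-of-words overlap, but the principle of letting context select among candidate CUIs is the same.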
Figure 7.
Simplified example of the structured format generated by an NLP system (MedLEE) as a result of processing the text “increased consolidation of the left lower lobe compatible with atelectasis or pneumonia.” MedLEE has been used to extract information from radiology reports for a variety of research and CDS purposes. (Reprinted, with permission, from reference .)
Figure 8.
Diagram illustrates how machine learning algorithms are an integral part of linguistic NLP systems. Most important, these algorithms, such as support vector machine (SVM) or maximum entropy (MaxEnt) models, are used for patient- or report-level classification. They rely on analyzing a set of features used to describe each training example to determine a model that best separates positive (class = 1) from negative (class = −1) examples. Features are typically thought of as vectors whose entries can be as simple as the frequency with which individual words appear in each example, but they can also be based on the structured information extracted from each example using linguistic NLP systems. Following this model training, the trained classifier is applied to a new text of unknown classification by extracting the same features used to train it. SVC = superior vena cava.
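The training loop in Figure 8 can be sketched end to end with word-frequency features; a perceptron stands in here for the SVM or MaxEnt model (same pipeline: featurize the training examples, learn a linear separator between class 1 and class −1, then apply the trained classifier to new text):

```python
def featurize(text: str, vocab: list):
    """Bag-of-words feature vector: frequency of each vocabulary word."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def train_perceptron(examples, labels, vocab, epochs=20):
    """Learn weights w and bias b separating +1 from -1 examples."""
    w = [0.0] * len(vocab)
    b = 0.0
    for _ in range(epochs):
        for text, y in zip(examples, labels):  # y is +1 or -1
            x = featurize(text, vocab)
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: nudge the separator
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def predict(text, vocab, w, b):
    x = featurize(text, vocab)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Tiny illustrative training set (hypothetical report snippets).
vocab = ["embolism", "no", "evidence", "filling", "defect"]
reports = ["filling defect consistent with embolism",
           "no evidence of filling defect or embolism"]
labels = [1, -1]
w, b = train_perceptron(reports, labels, vocab)
print(predict("acute embolism with filling defect", vocab, w, b))  # 1
```

As the caption notes, the feature vectors need not be raw word counts; entries can instead encode the structured output of a linguistic NLP system (eg, a negated-concept flag), which is often what makes the classifier robust to phrasing.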
Figure 9.
Chart illustrates a simplified example of the structured format generated by the NILE NLP system, which combines linguistic and clinical knowledge. NILE can identify concepts and recognize the anatomic relationships between location modifiers and these concepts. This information can then be used to classify pulmonary embolism into (for example) central, segmental, or subsegmental categories.
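Once a system such as NILE has linked a location modifier to the pulmonary embolism concept, the report-level category can fall out of a lookup. A rule-based sketch (the keyword-to-category mapping is an assumption for illustration, not NILE's actual logic):

```python
# Map a recognized anatomic location modifier to a PE category.
LOCATION_CATEGORIES = {
    "main pulmonary artery": "central",
    "right pulmonary artery": "central",
    "left pulmonary artery": "central",
    "lobar": "central",
    "segmental": "segmental",
    "subsegmental": "subsegmental",
}

def classify_pe(location_modifier: str) -> str:
    """Classify a pulmonary embolism by its extracted location modifier."""
    return LOCATION_CATEGORIES.get(location_modifier.lower(), "unspecified")

print(classify_pe("subsegmental"))  # subsegmental
```

The hard part, which the lookup glosses over, is the relation extraction itself: deciding that "subsegmental" modifies "embolism" rather than some other concept in the sentence.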
Figure 10.
Retrieval of information regarding tumor progression from unstructured brain MR imaging reports. (a) Diagram illustrates the desired classification scheme for extracting structured information regarding disease status, magnitude of change, and significance of change. (b) Diagram illustrates how an NLP system is developed for a classification task using machine learning– and/or rule-based methods. SVM = support vector machine. (Fig 10 reprinted, with permission, from reference .)

References

    1. National Guideline Clearinghouse. ACR practice guideline for communication of diagnostic imaging findings. Rockville, Md: Agency for Healthcare Research and Quality (AHRQ). http://www.guideline.gov/content.aspx?id=32541. Published 2014. Accessed March 16, 2015.
    2. Jha AK, DesRoches CM, Campbell EG, et al. Use of electronic health records in U.S. hospitals. N Engl J Med 2009;360(16):1628–1638.
    3. Tang PC. Key capabilities of an electronic health record system. Washington, DC: Committee on Data Standards for Patient Safety, Board on Health Care Services, Institute of Medicine, 2003.
    4. Travis AR, Sevenster M, Ganesh R, Peters JF, Chang PJ. Preferences for structured reporting of measurement data: an institutional survey of medical oncologists, oncology registrars, and radiologists. Acad Radiol 2014;21(6):785–796.
    5. Larson DB, Towbin AJ, Pryor RM, Donnelly LF. Improving consistency in radiology reporting through the use of department-wide standardized structured reporting. Radiology 2013;267(1):240–250.
