Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2019 Sep 18;10(1):14.
doi: 10.1186/s13326-019-0207-3.

Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES

Affiliations

Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES

Martijn G Kersloot et al. J Biomed Semantics. .

Abstract

Background: Information in Electronic Health Records is largely stored as unstructured free text. Natural language processing (NLP), or Medical Language Processing (MLP) in medicine, aims at extracting structured information from free text, and is less expensive and time-consuming than manual extraction. However, most algorithms in MLP are institution-specific or address only one clinical need, and thus cannot be broadly applied. In addition, most MLP systems do not detect concepts in misspelled text and cannot detect attribute relationships between concepts. The objective of this study was to develop and evaluate an MLP application that includes generic algorithms for the detection of (misspelled) concepts and of attribute relationships between them.

Methods: An implementation of the MLP system cTAKES, called DIRECT, was developed with generic SNOMED CT concept filter, concept relationship detection, and attribute relationship detection algorithms and a custom dictionary. Four implementations of cTAKES were evaluated by comparing 98 manually annotated oncology charts with the output of DIRECT. The F1-score was determined for named-entity recognition and attribute relationship detection for the concepts 'lung cancer', 'non-small cell lung cancer', and 'recurrence'. The performance of the four implementations was compared with a two-tailed permutation test.

Results: DIRECT detected lung cancer and non-small cell lung cancer concepts with F1-scores between 0.828 and 0.947 and between 0.862 and 0.933, respectively. The concept recurrence was detected with a significantly higher F1-score of 0.921, compared to the other implementations, and the relationship between recurrence and lung cancer with an F1-score of 0.857. The precision of the detection of lung cancer, non-small cell lung cancer, and recurrence concepts were 1.000, 0.966, and 0.879, compared to precisions of 0.943, 0.967, and 0.000 in the original implementation, respectively.

Conclusion: DIRECT can detect oncology concepts and attribute relationships with high precision and can detect recurrence with significant increase in F1-score, compared to the original implementation of cTAKES, due to the usage of a custom dictionary and a generic concept relationship detection algorithm. These concepts and relationships can be used to encode clinical narratives, and can thus substantially reduce manual chart abstraction efforts, saving time for clinicians and researchers.

Keywords: Algorithms; Chart abstraction; Electronic health records; Natural language processing; SNOMED CT.

PubMed Disclaimer

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Fig. 1
A representation of a unqualified relationship between Non-small cell lung cancer and Recurrent (1) and an attribute relationship of Non-small cell lung cancer with Recurrent as Clinical course (2). The attribute relationship is modelled as a SNOMED CT concept definition diagram of recurrent non-small cell lung cancer (a new post-coordinated expression). Purple blocks represent defined concepts, blue blocks represent primitive SNOMED CT concepts and yellow blocks represent attributes. Attribute groups are represented using a white circle, and conjunctions are represented using a black dot
Fig. 2
Fig. 2
A visual representation of the cTAKES AggregatePlaintextFastUMLSProcessor pipeline
Fig. 3
Fig. 3
A visual representation of DIRECT in relation to cTAKES. API: Application programming interface. Blue blocks represent developed components, rounded blocks MLP algorithms
Fig. 4
Fig. 4
Transformation of the syntactic relationships in the sentence ‘This patient is diagnosed with recurrent nonsmall cell lung cancer’ (1) to a SNOMED CT concept (2) and a relationship between detected SNOMED CT concepts (3, Recurrent and Non-small cell lung cancer)
Fig. 5
Fig. 5
Screenshots of DIRECT. 1. Input free text using a text field or text file(s). 2. Selection of SNOMED CT concepts to focus on and top-level concepts to include. 3. Selection of SNOMED CT attribute relationships to focus on. 4. Processing of the free text and results

References

    1. Meystre SM, Savova GK, Kipper-Schuler KC, Hurdle JF. Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform. 2008:128–44. https://www.ncbi.nlm.nih.gov/pubmed/18660887. - PubMed
    1. Zhou L, Mahoney LM, Shakurova A, Goss F, Chang FY, Bates DW, et al. How many medication orders are entered through free-text in EHRs?--a study on hypoglycemic agents. AMIA Annu Symp Proc AMIA Sym. 2012;2012:1079–1088. - PMC - PubMed
    1. Ford E, Nicholson A, Koeling R, Tate A, Carroll J, Axelrod L, et al. Optimising the use of electronic health records to estimate the incidence of rheumatoid arthritis in primary care: what information is hidden in free text? BMC Med Res Methodol. 2013;13:105. doi: 10.1186/1471-2288-13-105. - DOI - PMC - PubMed
    1. Wells BJ, Chagin KM, Nowacki AS, Kattan MW. Strategies for handling missing data in electronic health record derived data. EGEMS (Washington, DC) 2013;1(3):1035. - PMC - PubMed
    1. Liu H, Wu ST, Li D, Jonnalagadda S, Sohn S, Wagholikar K, et al. Towards a semantic lexicon for clinical natural language processing. AMIA Ann Symp Proc AMIA Symp. 2012;2012:568–576. - PMC - PubMed

Publication types

LinkOut - more resources