A common type system for clinical natural language processing

Stephen T Wu¹, Vinod C Kaggal, Dmitriy Dligach, James J Masanz, Pei Chen, Lee Becker, Wendy W Chapman, Guergana K Savova, Hongfang Liu, Christopher G Chute

PMID: 23286462
PMCID: PMC3575354
DOI: 10.1186/2041-1480-4-1

J Biomed Semantics. 2013 Jan 3;4(1):1. doi: 10.1186/2041-1480-4-1.

Affiliation

¹ Mayo Clinic, Rochester, MN, USA. wu.stephen@mayo.edu.

Abstract

Background: One challenge in reusing clinical data stored in electronic medical records is that these data are heterogeneous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text into a standard representation that is comparable and interoperable. Information can be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings.

Results: We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later.

Conclusions: We have created a type system that targets deep semantics, thereby allowing NLP systems to encapsulate knowledge from text and share it alongside heterogeneous clinical data sources. Rather than the surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types.

Figures

**Figure 1**
**Types and features for 3 namespaces.** Structured data types, Utility types, and Text span types. Dark gray background coloring indicates types that are not in the namespace but are included to show inheritance. Arrows indicate inheritance.

**Figure 2**
The syntax namespace: types for morphology and syntax.

**Figure 3**
The textsem namespace: spanned types for shallow semantics.

**Figure 4**
The refsem namespace, with deep semantic types and a model of core CEMs.

**Figure 5**
The relation namespace, with both text relations (spanned) and referential semantic (unspanned) relations.

**Figure 6**
**Example results of NER and relation detection.** A shallow semantic representation with named entities and textual relationships. Boxes show instances of types from the common type system associated with the example sentence. For clarity, only relevant features with example-specific values are shown. Small black boxes refer to instances of non-primitive data types; the actual instances for EventMention:ontologyConceptArr.

**Figure 7**
**Example results of deep semantic processing.** A deep semantic representation with coreferring mentions resolved, attributes combined, and a relationship inferred. The relevant SignSymptom:ontologyConcept instances (disambiguated concept identifiers) have been omitted. In this example, we have omitted the line from SignSymptom:mention to EventMention instances since they are implied by the links from EventMention:event to SignSymptom.
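The distinction drawn in Figures 6 and 7 between spanned mentions and unspanned deep-semantic entities can be illustrated with a minimal sketch. It is not the UIMA/cTAKES implementation: the Python classes below only borrow the names EventMention, SignSymptom, and the features ontologyConceptArr, event, mention, and ontologyConcept from the captions, while the example sentence, the placeholder concept code, and the `link` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class OntologyConcept:
    """A concept identifier; the values used below are placeholders."""
    coding_scheme: str
    code: str


@dataclass(eq=False)
class SignSymptom:
    """Unspanned entity: carries no text offsets of its own."""
    ontology_concept: Optional[OntologyConcept] = None          # SignSymptom:ontologyConcept
    mentions: List["EventMention"] = field(default_factory=list)  # SignSymptom:mention


@dataclass(eq=False)
class EventMention:
    """Spanned mention: anchored to character offsets in the text."""
    begin: int
    end: int
    ontology_concept_arr: List[OntologyConcept] = field(default_factory=list)  # ontologyConceptArr
    event: Optional[SignSymptom] = None                          # EventMention:event


def link(mention: EventMention, entity: SignSymptom) -> None:
    """Hypothetical helper: record that a mention refers to an entity."""
    mention.event = entity
    entity.mentions.append(mention)


# Hypothetical example sentence with two coreferring mentions.
text = "Patient reports chest pain. The pain worsened overnight."
concept = OntologyConcept(coding_scheme="EXAMPLE", code="C0000000")  # placeholder code

# Shallow step: each mention is found separately and keeps its own span.
first_begin = text.index("chest pain")
first = EventMention(first_begin, first_begin + len("chest pain"), [concept])
second_begin = text.index("pain", first.end)
second = EventMention(second_begin, second_begin + len("pain"), [concept])

# Deep step: both mentions resolve to a single unspanned entity.
symptom = SignSymptom(ontology_concept=concept)
link(first, symptom)
link(second, symptom)
```

After the deep step, the two mentions still differ in their spans but share one `SignSymptom` instance, which is the kind of object that could be compared with structured data.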


