AMIA Annu Symp Proc. 2012;2012:568-76. Epub 2012 Nov 3.

Towards a semantic lexicon for clinical natural language processing

Hongfang Liu et al. AMIA Annu Symp Proc. 2012.

Abstract

A semantic lexicon that associates words and phrases in text with concepts is critical for extracting and encoding clinical information in free text, and therefore for achieving semantic interoperability between structured and unstructured data in Electronic Health Records (EHRs). Using existing standard terminologies directly may yield limited coverage of concepts and of the ways those concepts are mentioned in text. In this paper, we analyze how tokens and phrases are distributed in a large corpus and how well the UMLS captures their semantics. We constructed a corpus-driven semantic lexicon, MedLex, whose semantics are based on the UMLS and supplemented with variants mined from clinical text and usage information gathered from it. The detailed corpus analysis of tokens, chunks, and concept mentions shows that the UMLS is an invaluable source for natural language processing. Increasing the semantic coverage of tokens provides a good foundation for capturing clinical information comprehensively. The study also yields some insights into developing practical NLP systems.
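The abstract does not describe how MedLex performs lookups, so the following is only a minimal sketch of the general idea: a semantic lexicon maps token sequences (words and phrases) to concept identifiers, and a corpus analysis can then measure what fraction of tokens the lexicon covers and which tokens remain uncovered. It assumes a tiny hand-built lexicon with placeholder concept IDs (not real UMLS CUIs), a naive punctuation-stripping tokenizer, and greedy longest-match lookup; the names `LEXICON`, `tokenize`, `match_concepts`, `token_coverage`, and `uncovered_tokens` are hypothetical and do not come from the paper or from any UMLS tool.

```python
from collections import Counter

# Hypothetical toy lexicon: phrase (tuple of lowercase tokens) -> concept ID.
# The IDs are illustrative placeholders, NOT real UMLS CUIs.
LEXICON = {
    ("chest", "pain"): "CONCEPT_A",
    ("pain",): "CONCEPT_B",
    ("shortness", "of", "breath"): "CONCEPT_C",
    ("denies",): "CONCEPT_D",
}
MAX_LEN = max(len(phrase) for phrase in LEXICON)


def tokenize(text):
    """Naive tokenizer (assumption): split on whitespace, strip simple punctuation, lowercase."""
    return [t.strip(".,;:").lower() for t in text.split() if t.strip(".,;:")]


def match_concepts(tokens, lexicon=LEXICON, max_len=MAX_LEN):
    """Greedy longest-match lookup; returns (start, end, concept_id) spans."""
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            concept = lexicon.get(tuple(tokens[i:i + n]))
            if concept is not None:
                spans.append((i, i + n, concept))
                i += n
                break
        else:
            i += 1  # no phrase starts at this token; move on
    return spans


def covered_positions(spans):
    """Set of token positions that fall inside some matched span."""
    return {pos for start, end, _ in spans for pos in range(start, end)}


def token_coverage(tokens, spans):
    """Fraction of tokens covered by at least one lexicon match."""
    return len(covered_positions(spans)) / len(tokens) if tokens else 0.0


def uncovered_tokens(tokens, spans):
    """Frequency of tokens the lexicon does not cover (candidates for new entries)."""
    covered = covered_positions(spans)
    return Counter(t for i, t in enumerate(tokens) if i not in covered)
```

For the sentence "Patient denies chest pain and shortness of breath." the longest-match rule picks "chest pain" rather than "pain" alone, covers 6 of 8 tokens (coverage 0.75), and leaves "patient" and "and" uncovered. Aggregating `uncovered_tokens` over a large corpus is one simple way to see where a terminology-derived lexicon falls short, which is the kind of gap the abstract says corpus-mined variants help close.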

Figures

Figure 1. Histogram of different token groups.
Figure 2. Histogram of different token groups.
