AMIA Annu Symp Proc. 2022 Feb 21;2021:438-447. eCollection 2021.

Launching into clinical space with medspaCy: a new clinical text processing toolkit in Python

Hannah Eyre et al. AMIA Annu Symp Proc.

Abstract

Despite the impressive success of machine learning algorithms in clinical natural language processing (cNLP), rule-based approaches still play a prominent role. In this paper, we introduce medspaCy, an extensible, open-source cNLP library built on the spaCy framework that allows flexible integration of rule-based and machine learning-based algorithms adapted to clinical text. MedspaCy includes a variety of components that meet common cNLP needs, such as context analysis and mapping to standard terminologies. By following spaCy's clear and easy-to-use conventions, medspaCy enables the development of custom pipelines that integrate easily with other spaCy-based modules. Our toolkit includes several core components and facilitates rapid development of pipelines for clinical text.
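
To make the idea of rule-based context analysis concrete, the following is a minimal sketch in plain Python. It does not use medspaCy or spaCy and does not reproduce their APIs; the term list, the negation triggers, and the names `Entity` and `extract_entities` are all hypothetical and chosen only for illustration. It matches a few target terms with regular expressions and marks a term as negated when a trigger phrase precedes it in the same sentence, which is roughly the kind of rule a context component applies.

```python
import re
from dataclasses import dataclass

# Hypothetical toy rules for illustration only; not taken from medspaCy or the paper.
TARGET_TERMS = {
    "pneumonia": "PROBLEM",
    "fever": "PROBLEM",
    "metformin": "MEDICATION",
}
NEGATION_TRIGGERS = ("no evidence of", "denies", "no")


@dataclass
class Entity:
    text: str          # surface form as it appears in the note
    label: str         # coarse semantic type from TARGET_TERMS
    start: int         # character offset in the original text
    is_negated: bool   # True if a negation trigger precedes it in the sentence


def extract_entities(text: str) -> list[Entity]:
    """Match target terms and flag those preceded by a negation trigger."""
    entities = []
    # Naive sentence splitting on periods and semicolons (an assumption;
    # real clinical text needs a more careful splitter).
    for sent in re.finditer(r"[^.;]+", text):
        sentence = sent.group().lower()
        offset = sent.start()
        # Positions of every negation trigger in this sentence.
        trigger_starts = [
            m.start()
            for trigger in NEGATION_TRIGGERS
            for m in re.finditer(r"\b" + re.escape(trigger) + r"\b", sentence)
        ]
        for term, label in TARGET_TERMS.items():
            for m in re.finditer(r"\b" + re.escape(term) + r"\b", sentence):
                # Forward-scoped rule: a trigger negates terms that follow it.
                negated = any(pos < m.start() for pos in trigger_starts)
                begin = offset + m.start()
                end = offset + m.end()
                entities.append(Entity(text[begin:end], label, begin, negated))
    return sorted(entities, key=lambda e: e.start)
```

Calling `extract_entities("Patient denies fever. Chest x-ray shows pneumonia; started metformin.")` returns three entities in document order, with only "fever" flagged as negated. A library pipeline would replace the naive sentence splitting and fixed word lists with configurable components, but the rule logic has the same shape.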


Figures

Figure 1: Overview of medspaCy architecture.
Figure 2: Example of a text processing pipeline that utilizes medspaCy and other spaCy-based components.
Figure 3: Example of visualization using highlight and context arrows.
Figure 4: Highlighted section headers delineate template questions with the section body as the responses when processing semi-structured text.
