J Am Med Inform Assoc. 2023 Nov 17;30(12):2036-2040.
doi: 10.1093/jamia/ocad134.

An open natural language processing (NLP) framework for EHR-based clinical research: a case demonstration using the National COVID Cohort Collaborative (N3C)

Sijia Liu et al. J Am Med Inform Assoc.

Abstract

Despite recent methodological advances in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. These same factors also dramatically increase the difficulty of developing NLP models in multi-site settings, which is necessary for algorithm robustness and generalizability. Here, we report on our experience developing an NLP solution for Coronavirus Disease 2019 (COVID-19) sign and symptom extraction in an open NLP framework, using data from a subset of sites participating in the National COVID Cohort Collaborative (N3C). We then empirically demonstrate the benefits of multi-site data for both symbolic and statistical methods, and highlight the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts.

Keywords: electronic health records; federated learning; multi-institutional data annotation; natural language processing.

Conflict of interest statement

MAH has a founding interest in Pryzm Health. HX and The University of Texas Health Science Center at Houston have financial interests in Melax Technologies Inc.

