An open natural language processing (NLP) framework for EHR-based clinical research: a case demonstration using the National COVID Cohort Collaborative (N3C)

Sijia Liu¹, Andrew Wen¹, Liwei Wang¹, Huan He¹, Sunyang Fu¹, Robert Miller², Andrew Williams², Daniel Harris³, Ramakanth Kavuluru³, Mei Liu⁴, Noor Abu-El-Rub⁴, Dalton Schutte⁵, Rui Zhang⁵, Masoud Rouhizadeh⁶, John D Osborne⁷, Yongqun He⁸, Umit Topaloglu⁹, Stephanie S Hong¹⁰, Joel H Saltz¹¹, Thomas Schaffter¹², Emily Pfaff¹³, Christopher G Chute¹⁰, Tim Duong¹⁴, Melissa A Haendel¹⁵, Rafael Fuentes¹⁶, Peter Szolovits¹⁷, Hua Xu¹⁸, Hongfang Liu¹,¹⁸

Affiliations

¹ Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA.
² Tufts Clinical and Translational Science Institute, Tufts Medical Center, Boston, Massachusetts, USA.
³ Department of Internal Medicine, University of Kentucky, Lexington, Kentucky, USA.
⁴ Department of Internal Medicine, University of Kansas Medical Center, Kansas City, Kansas, USA.
⁵ Department of Pharmaceutical Care & Health Systems, University of Minnesota at Twin Cities, Minneapolis, Minnesota, USA.
⁶ Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, Florida, USA.
⁷ Department of Computer Science, University of Alabama at Birmingham, Birmingham, Alabama, USA.
⁸ Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan, USA.
⁹ Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA.
¹⁰ Department of Medicine, Johns Hopkins University, Baltimore, Maryland, USA.
¹¹ Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA.
¹² Sage Bionetworks, Seattle, Washington, USA.
¹³ Department of Medicine, University of North Carolina Chapel Hill, Chapel Hill, North Carolina, USA.
¹⁴ Department of Radiology, Albert Einstein College of Medicine, Bronx, New York, USA.
¹⁵ Center for Health AI, University of Colorado Anschutz Medical Campus, Denver, Colorado, USA.
¹⁶ Alex Informatics, North Bethesda, Maryland, USA.
¹⁷ Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
¹⁸ School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA.

PMID: 37555837
PMCID: PMC10654844
DOI: 10.1093/jamia/ocad134

Sijia Liu et al. J Am Med Inform Assoc. 2023 Nov 17;30(12):2036-2040.

Abstract

Despite recent methodological advances in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. These factors also dramatically increase the difficulty of developing NLP models in multi-site settings, which is necessary for algorithm robustness and generalizability. Here, we report on our experience developing an NLP solution for Coronavirus Disease 2019 (COVID-19) sign and symptom extraction in an open NLP framework from a subset of sites participating in the National COVID Cohort Collaborative (N3C). We then empirically highlight the benefits of multi-site data for both symbolic and statistical methods, and the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts.

Keywords: electronic health records; federated learning; multi-institutional data annotation; natural language processing.

Conflict of interest statement

MAH has a founding interest in Pryzm Health. HX and The University of Texas Health Science Center at Houston have financial interests in Melax Technologies Inc.
