J Clin Transl Sci. 2025 Aug 22;9(1):e199.
doi: 10.1017/cts.2025.10116. eCollection 2025.

Development and validation of natural language processing algorithms in the national ENACT network

Yanshan Wang et al. J Clin Transl Sci.

Abstract

Objective: Electronic Health Record (EHR) data are critical for advancing translational research and AI technologies. The ENACT network offers access to structured EHR data across 57 CTSA hubs. However, substantial information is contained in clinical narratives, requiring natural language processing (NLP) for research. The ENACT NLP Working Group was formed to make NLP-derived clinical information accessible and queryable across the network.

Methods: We established the ENACT NLP Working Group with 13 sites selected based on criteria including clinical notes access, IT infrastructure, NLP expertise, and institutional support. We divided sites into five focus groups targeting clinical tasks within disease contexts. Each focus group consisted of two development sites and two validation sites. We extended the ENACT ontology to standardize NLP-derived data and conducted multisite evaluations using the Open Health Natural Language Processing (OHNLP) Toolkit.

Results: The working group achieved 100% site retention and deployed NLP infrastructure across all sites. We developed and validated NLP algorithms for rare disease phenotyping, social determinants of health, opioid use disorder, sleep phenotyping, and delirium phenotyping. Performance varied across sites (F1 scores 0.53-0.96), highlighting data heterogeneity impacts. We extended the ENACT common data model and ontology to incorporate NLP-derived data while maintaining Shared Health Research Informatics NEtwork (SHRINE) compatibility.
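The per-site F1 scores reported above are the harmonic mean of precision and recall. A minimal sketch of how such a score could be computed from each validation site's confusion counts is shown below; the site names and counts are hypothetical and chosen only for illustration, and this is not the working group's evaluation code.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Confusion counts for one NLP algorithm at one site (hypothetical data)."""
    site: str
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def f1_score(c: SiteCounts) -> float:
    """Return F1 = 2*TP / (2*TP + FP + FN), or 0.0 when there are no counts."""
    denom = 2 * c.tp + c.fp + c.fn
    return 0.0 if denom == 0 else 2 * c.tp / denom


# Illustrative counts only; they do not come from the study.
counts = [
    SiteCounts("site_a", tp=90, fp=10, fn=10),
    SiteCounts("site_b", tp=30, fp=20, fn=30),
]

for c in counts:
    print(f"{c.site}: F1 = {f1_score(c):.2f}")
```

Computing the score separately for each site, rather than pooling counts, makes cross-site variation of the kind described in the Results directly visible.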

Conclusion: This work demonstrates the feasibility of deploying NLP infrastructure across large, federated networks. The focus group approach proved more practical than general-purpose approaches. Key lessons include the challenge of data heterogeneity and the importance of collaborative governance. This work also provides a foundation that other networks can build on to implement NLP capabilities for translational research.

Keywords: ENACT; Translational research; electronic health records; natural language processing; network.

Conflict of interest statement

No competing interests were declared.

Figures

Figure 1. Participating sites in the Evolve to Next-Gen Accrual to Clinical Trials (ENACT) network natural language processing (NLP) working group.
Figure 2. An overview of the ENACT NLP workflow. *SHRINE = Shared Health Research Informatics NEtwork.
