Multicenter Study

. 2025 Apr 30;17(1):24.

doi: 10.1186/s11689-025-09612-w.

Automated extraction of functional biomarkers of verbal and ambulatory ability from multi-institutional clinical notes using large language models

Levi Kaster¹, Ethan Hillis¹, Inez Y Oh¹, Bhooma R Aravamuthan², Virginia C Lanzotti³, Casey R Vickstrom²; Brain Gene Registry Consortium; Christina A Gurnett², Philip R O Payne¹, Aditi Gupta⁴

Collaborators, Affiliations

Collaborators

Brain Gene Registry Consortium:
M Wasserstein, M Chopra, M Sahin, M Wangler, B Schultz, K Izumi, S Bergner, A Gropman, C Smith-Hicks, L Abbeduto, H Hazlett, D Doherty, K German, L DaWalt, J Neul, J Constantino, D Baldridge, S Srivastava, S Molholm, S Walkley, E Storch, R Samaco, J Cohen, S Shankar, J Piven, S Mahida, A Sveden, K Dies, E R Riggs, J M Savatt, B Minor

Affiliations

¹ Institute for Informatics, Data Science and Biostatistics, Washington University School of Medicine in St. Louis, St. Louis, MO, USA.
² Department of Neurology, Washington University School of Medicine in St. Louis, St. Louis, MO, USA.
³ Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, MO, USA.
⁴ Institute for Informatics, Data Science and Biostatistics, Washington University School of Medicine in St. Louis, St. Louis, MO, USA. agupta24@wustl.edu.

PMID: 40307685
PMCID: PMC12042395
DOI: 10.1186/s11689-025-09612-w

Multicenter Study

Automated extraction of functional biomarkers of verbal and ambulatory ability from multi-institutional clinical notes using large language models

Levi Kaster et al. J Neurodev Disord. 2025.

. 2025 Apr 30;17(1):24.

doi: 10.1186/s11689-025-09612-w.

Authors

Collaborators

Brain Gene Registry Consortium:
M Wasserstein, M Chopra, M Sahin, M Wangler, B Schultz, K Izumi, S Bergner, A Gropman, C Smith-Hicks, L Abbeduto, H Hazlett, D Doherty, K German, L DaWalt, J Neul, J Constantino, D Baldridge, S Srivastava, S Molholm, S Walkley, E Storch, R Samaco, J Cohen, S Shankar, J Piven, S Mahida, A Sveden, K Dies, E R Riggs, J M Savatt, B Minor

Affiliations

¹ Institute for Informatics, Data Science and Biostatistics, Washington University School of Medicine in St. Louis, St. Louis, MO, USA.
² Department of Neurology, Washington University School of Medicine in St. Louis, St. Louis, MO, USA.
³ Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, MO, USA.
⁴ Institute for Informatics, Data Science and Biostatistics, Washington University School of Medicine in St. Louis, St. Louis, MO, USA. agupta24@wustl.edu.

PMID: 40307685
PMCID: PMC12042395
DOI: 10.1186/s11689-025-09612-w

Abstract

Background: Functional biomarkers in neurodevelopmental disorders, such as verbal and ambulatory abilities, are essential for clinical care and research activities. Treatment planning, intervention monitoring, and identifying comorbid conditions in individuals with intellectual and developmental disabilities (IDDs) rely on standardized assessments of these abilities. However, traditional assessments impose a burden on patients and providers, often leading to longitudinal inconsistencies and inequities due to evolving guidelines and associated time-cost. Therefore, this study aimed to develop an automated approach to classify verbal and ambulatory abilities from EHR data of IDD and cerebral palsy (CP) patients. Application of large language models (LLMs) to clinical notes, which are rich in longitudinal data, may provide a low-burden pipeline for extracting functional biomarkers efficiently and accurately.

Methods: Data from the multi-institutional National Brain Gene Registry (BGR) and a CP clinic cohort were utilized, comprising 3,245 notes from 125 individuals and 5,462 clinical notes from 260 individuals, respectively. Employing three LLMs-GPT-3.5 Turbo, GPT-4 Turbo, and GPT-4 Omni-we provided the models with a clinical note and utilized a detailed conversational format to prompt the models to answer: "Does the individual use any words?" and "Can the individual walk without aid?" These responses were evaluated against ground-truth abilities, which were established using neurobehavioral assessments collected for each dataset.

Results: LLM pipelines demonstrated high accuracy (weighted-F1 scores > .90) in predicting ambulatory ability for both cohorts, likely due to the consistent use of Gross Motor Functional Classification System (GMFCS) as a consistent ground-truth standard. However, verbal ability predictions were more accurate in the BGR cohort, likely due to higher adherence between the prompt and ground-truth assessment questions. While LLMs can be computationally expensive, analysis of our protocol affirmed the cost effectiveness when applied to select notes from the EHR.

Conclusions: LLMs are effective at extracting functional biomarkers from EHR data and broadly generalizable across variable note-taking practices and institutions. Individual verbal and ambulatory ability were accurately extracted, supporting the method's ability to streamline workflows by offering automated, efficient data extraction for patient care and research. Future studies are needed to extend this methodology to additional populations and to demonstrate more granular functional data classification.

Keywords: Electronic health records; Functional biomarkers; Large language models; Neurodevelopmental disorders.

PubMed Disclaimer

Conflict of interest statement

Declarations. Ethics approval and consent to participate: No participants were recruited specifically for this study. This work constitutes secondary use of data approved by the Washington University in St. Louis IRB (protocols #202010013 [Brain Gene Registry cohort] and #202309003 [cerebral palsy cohort]). Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

**Fig. 1**
Prompt for large language model analysis. The generative pre-trained transformer (GPT) model was prompted in a conversational format in which GPT’s system prompt is first asserted. The system prompt steers the behavior of the model, allowing for it to be more adaptable to the task. The user (researcher) then asks if GPT understands its role, to which GPT confirms. Finally, the user provides detailed walking and using words definitions and extraction instructions with the desired output format. The clinical note is then included in the prompt at the placeholder symbol “{}”

**Fig. 2**
Illustration of GPT project workflow for both cohorts. The pipelines in the photo are repeated for all versions of GPT utilized: GPT- 3.5, GPT- 4 t, and GPT- 4o

**Fig. 3**
Proportion and correctness of non-unknown GPT note-level BGR predictions

**Fig. 4**
Proportion and correctness of non-unknown GPT note-level CP predictions

See this image and copyright information in PMC

References

1. Biomarkers Definitions Working G. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Ther. 2001;69(3):89–95. 10.1067/mcp.2001.113989. - PubMed
1. Jensen K, Soguero-Ruiz C, Oyvind Mikalsen K, et al. Analysis of free text in electronic health records for identification of cancer patient trajectories. Sci Rep. 2017;7:46226. 10.1038/srep46226. - PMC - PubMed
1. Kho AN, Pacheco JA, Peissig PL, et al. Electronic medical records for genetic research: results of the eMERGE consortium. Sci Transl Med. 2011;3(79):79re1. 10.1126/scitranslmed.3001807. - PMC - PubMed
1. Wei WQ, Teixeira PL, Mo H, Cronin RM, Warner JL, Denny JC. Combining billing codes, clinical notes, and medications from electronic health records provides superior phenotyping performance. J Am Med Inform Assoc. 2016;23(e1):e20-7. 10.1093/jamia/ocv130. - PMC - PubMed
1. Morris MA, Kho AN. Silence in the EHR: infrequent documentation of aphonia in the electronic health record. Bmc Health Serv Res. 2014;14:Artn 425.10.1186/1472-6963-14-425. - PMC - PubMed

Publication types

Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

Substances

Actions

Grants and funding

LinkOut - more resources

Full Text Sources
Medical
- MedlinePlus Consumer Health Information
- MedlinePlus Health Information
Miscellaneous
- NCI CPTAC Assay Portal

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Automated extraction of functional biomarkers of verbal and ambulatory ability from multi-institutional clinical notes using large language models

Collaborators

Affiliations

Automated extraction of functional biomarkers of verbal and ambulatory ability from multi-institutional clinical notes using large language models

Authors

Collaborators

Affiliations

Abstract

Conflict of interest statement

Figures

References

Publication types

MeSH terms

Substances

Grants and funding

LinkOut - more resources

Full Text Sources

Medical

Miscellaneous