JMIR Med Inform. 2025 Jan 2:13:e58457.
doi: 10.2196/58457.

The Transformative Potential of Large Language Models in Mining Electronic Health Records Data: Content Analysis

Amadeo Jesus Wals Zurita et al. JMIR Med Inform.

Abstract

Background: In this study, we evaluate the accuracy, efficiency, and cost-effectiveness of large language models in extracting and structuring information from free-text clinical reports, particularly in identifying and classifying patient comorbidities within oncology electronic health records.

Objective: We specifically compare the performance of gpt-3.5-turbo-1106 and gpt-4-1106-preview models against that of specialized human evaluators.

Methods: We implemented a script using the OpenAI application programming interface to extract structured information, in JavaScript Object Notation (JSON) format, about the comorbidities reported in 250 personal history reports. These reports were also manually reviewed in batches of 50 by 5 specialists in radiation oncology. We compared the results using metrics such as sensitivity, specificity, precision, accuracy, F-value, κ index, and the McNemar test, and we examined the common causes of errors in both humans and generative pretrained transformer (GPT) models.
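The comparison metrics named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' script: the function names (`binary_metrics`, `mcnemar_exact`) are hypothetical, each comorbidity is assumed to be coded as a present/absent boolean per report, the F-value is assumed to be the F1 score, the κ index is assumed to be Cohen's κ against the reference labels, and the McNemar test is computed in its exact binomial form, which may differ from the variant used in the study.

```python
from math import comb


def _ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def binary_metrics(truth, pred):
    """Agreement metrics for one evaluator against reference labels.

    truth, pred: equal-length sequences of booleans (comorbidity present?).
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    pairs = [(bool(t), bool(p)) for t, p in zip(truth, pred)]
    tp = sum(t and p for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    fn = sum(t and (not p) for t, p in pairs)
    tn = sum((not t) and (not p) for t, p in pairs)
    n = tp + fp + fn + tn

    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, n)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = _ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = _ratio(accuracy - p_chance, 1 - p_chance)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "kappa": kappa,
    }


def mcnemar_exact(truth, pred_a, pred_b):
    """Two-sided exact McNemar P value comparing two evaluators.

    Only discordant items count: those where exactly one evaluator
    agrees with the reference label.
    """
    a_only = sum((a == t) and (b != t) for t, a, b in zip(truth, pred_a, pred_b))
    b_only = sum((a != t) and (b == t) for t, a, b in zip(truth, pred_a, pred_b))
    n = a_only + b_only
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(a_only, b_only) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Under these assumptions, `binary_metrics` would be called once per evaluator (physicians, GPT-3.5, GPT-4) and `mcnemar_exact` once per pair of evaluators being compared.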

Results: The GPT-3.5 model performed slightly worse than physicians across all metrics, though the differences were not statistically significant (McNemar test, P=.79). GPT-4 demonstrated clear superiority in several key metrics (McNemar test, P<.001). Notably, it achieved a sensitivity of 96.8%, compared to 88.2% for GPT-3.5 and 88.8% for physicians. However, physicians marginally outperformed GPT-4 in precision (97.7% vs 96.8%). GPT-4 showed greater consistency, replicating exactly the same results in 76% of the reports across 10 repeated analyses, compared to 59% for GPT-3.5, indicating more stable and reliable performance. Physicians were more likely to miss explicit comorbidities, while the GPT models more frequently inferred nonexplicit comorbidities, sometimes correctly, though this also resulted in more false positives.
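The consistency figure reported above (the share of reports with identical output across repeated analyses) can be sketched as follows. This is an illustrative reading, not the authors' procedure: the function names, the report identifiers, and the representation of each model output as a dictionary mapping a comorbidity name to a boolean are all assumptions.

```python
import json


def canonical(output):
    """Serialize one structured output so that key order does not matter."""
    return json.dumps(output, sort_keys=True)


def exact_replication_rate(runs_by_report):
    """Fraction of reports whose repeated outputs are all identical.

    runs_by_report: dict mapping a report id to the list of structured
    outputs obtained for that report across the repeated analyses.
    """
    if not runs_by_report:
        raise ValueError("no reports given")
    stable = sum(
        1
        for outputs in runs_by_report.values()
        if len({canonical(o) for o in outputs}) == 1
    )
    return stable / len(runs_by_report)


# Hypothetical example: 4 reports, 3 analyses each (the study used 10).
runs = {
    "r1": [{"HBP": True, "COPD": False}] * 3,
    "r2": [{"HBP": False, "COPD": False}] * 3,
    "r3": [{"HBP": True, "COPD": False}, {"COPD": False, "HBP": True},
           {"HBP": True, "COPD": False}],
    "r4": [{"HBP": True, "COPD": False}, {"HBP": True, "COPD": True},
           {"HBP": True, "COPD": False}],
}
rate = exact_replication_rate(runs)  # 3 of 4 reports are stable: 0.75
```

Serializing with sorted keys means two outputs that list the same comorbidities in a different order are counted as identical, which seems the natural reading of "exactly the same results".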

Conclusions: This study demonstrates that, with well-designed prompts, the large language models examined can match or even surpass medical specialists in extracting information from complex clinical reports. Their superior efficiency in time and costs, along with easy integration with databases, makes them a valuable tool for large-scale data mining and real-world evidence generation.

Keywords: ChatGPT; EHR; LLMs; data mining; electronic health record; large language models; oncology; radiotherapy.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. Representative diagram of the Web Oncological Information System (SIOW). It illustrates the integration of data from MOSAIQ and TPS into the MongoDB database and its subsequent management through SIOW, including the collection of administrative data from the Users Data Base (BDU) and clinical data from the electronic health record system DIRAYA. JSON: JavaScript object notation; RT: radiotherapy.
Figure 2. Flowchart of the study design. COPD: chronic obstructive pulmonary disease; HBP: high blood pressure.
Figure 3. Statistical metrics comparison between 3 evaluators (Physicians, GPT-3.5, and GPT-4) for individual comorbidities and overall totals. Asymmetric error bars indicate the 95% confidence interval. GPT: generative pretrained transformer; HBP: hypertension or high blood pressure; COPD: chronic obstructive pulmonary disease.
Figure 4. The number of reports for each model in which at least the number of differences indicated on the x-axis was obtained in the 10 analyses.
