Comparative Study
Epilepsia. 2025 Apr;66(4):1155-1164.
doi: 10.1111/epi.18272. Epub 2025 Feb 17.

Supervised machine learning compared to large language models for identifying functional seizures from medical records

Wesley T Kerr et al. Epilepsia. 2025 Apr.

Abstract

Objective: The Functional Seizures Likelihood Score (FSLS) is a supervised machine learning-based diagnostic score that was developed to differentiate functional seizures (FS) from epileptic seizures (ES). In contrast to this targeted approach, large language models (LLMs) can identify patterns in data for which they were not specifically trained. To evaluate the relative benefits of each approach, we compared the diagnostic performance of the FSLS to two LLMs: ChatGPT and GPT-4.

Methods: In total, 114 anonymized cases were constructed based on patients with documented FS, ES, mixed ES and FS, or physiologic seizure-like events (PSLEs). Text-based data were presented to the LLMs in three sequential prompts, showing the history of present illness (HPI), electroencephalography (EEG) results, and neuroimaging results. We compared the accuracy (number of correct predictions/number of cases) and areas under the receiver-operating characteristic (ROC) curve (AUCs) of the LLMs to those of the FSLS using mixed-effects logistic regression.
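
The accuracy defined above is a simple ratio of correct predictions to cases. The minimal Python sketch below computes it and attaches a Wilson score interval as one common way to report a 95% CI. The abstract does not state which interval method the authors used, and the label lists are hypothetical, not study data.

```python
import math

def count_correct(predicted, actual):
    """Number of cases where the predicted label equals the gold-standard label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual))

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion; z=1.96 gives roughly 95%."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Hypothetical gold-standard and predicted labels (not study data).
gold = ["FS", "ES", "ES", "FS", "PSLE", "ES"]
pred = ["FS", "ES", "FS", "FS", "ES", "ES"]

correct = count_correct(pred, gold)       # 4 of 6 labels match
acc = correct / len(gold)                 # accuracy = correct predictions / cases
low, high = wilson_interval(correct, len(gold))
print(f"accuracy {acc:.0%} (95% CI {low:.0%}-{high:.0%})")
```

With only six cases the interval is very wide; with the study's 114 cases it would be much narrower.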

Results: The accuracy of the FSLS was 74% (95% confidence interval [CI] 65%-82%) and its AUC was 85% (95% CI 77%-92%). GPT-4 was superior to both the FSLS and ChatGPT (p < .001), with an accuracy of 85% (95% CI 77%-91%) and an AUC of 87% (95% CI 79%-95%). Cohen's kappa between the FSLS and GPT-4 was 40% (fair). When the same note was provided on different days, the LLMs gave different predictions for 33% of patients, and the LLMs' self-rated certainty was moderately correlated with this observed variability (Spearman's ρ²: 30% [fair, ChatGPT] and 63% [substantial, GPT-4]).
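
Cohen's kappa, reported above for agreement between the FSLS and GPT-4, corrects the raw agreement between two methods for the agreement expected by chance. A minimal Python sketch follows. The qualitative labels assume the widely used Landis and Koch bands, which appear consistent with the "fair" and "substantial" wording above; the two prediction lists are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty label lists of equal length")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two methods' marginal counts per category.
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)

def landis_koch_label(value):
    """Qualitative band for an agreement statistic (assumed Landis and Koch cut-offs)."""
    if value < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if value <= upper:
            return label
    return "almost perfect"

# Hypothetical predictions from two methods for the same four patients.
score_based = ["FS", "FS", "ES", "ES"]
llm_based = ["FS", "ES", "ES", "ES"]
kappa = cohens_kappa(score_based, llm_based)
print(kappa, landis_koch_label(kappa))  # 0.5 moderate
```

Here the methods agree on 3 of 4 patients (75%), but chance alone would produce 50% agreement given their marginal label frequencies, so kappa is only 0.5.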

Significance: Both GPT-4 and the FSLS identified a substantial subset of patients with FS based on clinical history. The fair agreement in predictions highlights that the LLMs identified patients differently from the structured score. The inconsistency of the LLMs' predictions across days and their incomplete insight into their own consistency were concerning. This comparison highlights both benefits and cautions about how machine learning and artificial intelligence could identify patients with FS in clinical practice.

Keywords: electronic health record; informatics; physiologic seizure‐like events; psychogenic nonepileptic seizures (PNES); sensitivity.


Figures

FIGURE 1
Predictions of the FSLS, ChatGPT, and GPT‐4 in patients with each type of video‐EEG‐based diagnosis. The numbers within the bars reflect the proportion of each patient group given each predicted diagnosis. The FSLS never predicted mixed ES + FS (blue) or PSLEs (green), whereas ChatGPT and GPT‐4 commonly predicted mixed ES + FS (blue). (See Tables S2 and S3 for detailed performance statistics.) EEG, electroencephalography; ES, epileptic seizures; FS, functional seizures; FSLS, Functional Seizures Likelihood Score; GPT, Generative Pre‐trained Transformer; PSLE, physiologic seizure‐like event.
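
The stacked bars in Figure 1 amount to a row-normalized confusion table: within each gold-standard group, the share of patients assigned each predicted label. A minimal sketch with hypothetical labels follows; the "ES+FS" string for a mixed prediction is an illustrative choice, not the study's coding.

```python
from collections import Counter, defaultdict

def proportions_by_group(gold, predicted):
    """Row-normalized table: for each gold-standard diagnosis, the share of
    patients in that group given each predicted diagnosis."""
    counts = defaultdict(Counter)
    for true_label, predicted_label in zip(gold, predicted):
        counts[true_label][predicted_label] += 1
    return {
        true_label: {label: n / sum(row.values()) for label, n in row.items()}
        for true_label, row in counts.items()
    }

# Hypothetical labels; "ES+FS" stands in for a mixed ES + FS prediction.
gold = ["FS", "FS", "FS", "FS", "ES", "ES"]
predicted = ["FS", "FS", "ES", "ES+FS", "ES", "ES"]
table = proportions_by_group(gold, predicted)
print(table["FS"])  # {'FS': 0.5, 'ES': 0.25, 'ES+FS': 0.25}
```
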
FIGURE 2
The predictions of the FSLS were correlated with, but had only fair to moderate agreement with, the predictions of ChatGPT (A, Cohen's kappa 26%) and GPT‐4 (B, Cohen's kappa 42%). Each dot represents a patient, and colors reflect the ictal video‐EEG monitoring–based gold standard diagnosis. Correct predictions of functional seizures (FS) would be in the top right quadrant, and correct predictions of epileptic seizures (ES) in the bottom left. Disagreements between methods are in the top left and bottom right. ChatGPT and GPT‐4 often predicted epilepsy only, so the patients stacked on the left axis are those predicted to have epilepsy. EEG, electroencephalography; ES, epileptic seizures; FS, functional seizures; FSLS, Functional Seizures Likelihood Score; PSLE, physiologic seizure‐like events.
FIGURE 3
The LLMs provided different answers for the same patient on different days, and they had only poor insight into this uncertainty (Spearman's ρ²: 30% [ChatGPT, A] and 63% [GPT‐4, B]). As in Figure 2, each dot represents a patient, and colors reflect the ictal video‐EEG monitoring–based gold standard diagnosis. Perfect insight into this uncertainty would place patients along the diagonal line, and the distance from the diagonal reflects the difference between self‐reported and observed certainty. Because many patients had a high predicted probability of epilepsy, dots off the axis reflect an observed predicted probability of epilepsy of 0%. EEG, electroencephalography; LLM, large language model.
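
One way to read the "observed" certainty in Figure 3 is as the fraction of repeated runs (on different days) that returned a given diagnosis, compared against the model's self-rated probability using Spearman's ρ, which the caption reports squared. The sketch below assumes that reading; the run lists and self-rated probabilities are hypothetical, and tied values receive average ranks.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def observed_probability(daily_predictions, label="ES"):
    """Fraction of repeated runs (e.g. different days) that returned `label`."""
    return sum(p == label for p in daily_predictions) / len(daily_predictions)

# Hypothetical: three runs per patient, plus the model's self-rated P(epilepsy).
runs = [["ES", "ES", "ES"], ["ES", "FS", "ES"], ["FS", "FS", "FS"], ["FS", "ES", "FS"]]
self_rated = [0.95, 0.70, 0.10, 0.40]
observed = [observed_probability(r) for r in runs]
rho = spearman_rho(self_rated, observed)
print(f"Spearman rho^2 = {rho ** 2:.2f}")
```

In this toy example the self-ratings rank patients in exactly the same order as the observed run-to-run frequencies, so ρ² is 1; the study's values of 30% and 63% indicate a much weaker correspondence.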
FIGURE 4
ROC curves for the FSLS, ChatGPT, and GPT‐4. The non‐overlapping nature of these curves suggests that the algorithms made their predictions differently, rather than differing only in their sensitivity threshold. ES, epileptic seizures; FS, functional seizures; FSLS, Functional Seizures Likelihood Score; ROC, receiver‐operating characteristic.
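
An ROC curve traces sensitivity (true-positive rate) against 1 - specificity (false-positive rate) as the decision threshold on a continuous score is swept. The sketch below assumes FS as the positive class and a per-patient probability of FS as the score. Both are assumptions, since the abstract does not give the scoring details. The curve is integrated with the trapezoidal rule to obtain the AUC, and the scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs, lowering the threshold through each distinct score.
    labels: 1 for the positive class (assumed FS here), 0 otherwise."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def trapezoid_auc(points):
    """Area under a piecewise-linear curve through the (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical P(FS) scores and gold-standard labels (1 = FS, 0 = ES).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
auc = trapezoid_auc(curve)
print(f"AUC = {auc:.3f}")
```

For binary labels this trapezoidal AUC equals the fraction of (FS, ES) patient pairs in which the FS patient receives the higher score, with ties counted as half. Two methods with similar AUCs can still trace curves of different shapes, which is the point the caption makes.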
