NEJM AI. 2025 Jul;2(7):10.1056/aioa2401221.
doi: 10.1056/aioa2401221. Epub 2025 Jun 26.

Expert-Level Detection of Epilepsy Markers in EEG on Short and Long Timescales


J Li et al. NEJM AI. 2025 Jul.

Abstract

Background: Epileptiform discharges, or spikes, within electroencephalogram (EEG) recordings are essential for diagnosing epilepsy and localizing seizure origins. Artificial intelligence (AI) offers a promising approach to automating detection, but current models are often hindered by artifact-related false positives and typically target either event-level or EEG-level classification rather than both, which limits their clinical utility.

Methods: We developed SpikeNet2, a deep-learning model based on a residual network architecture, and enhanced it with hard-negative mining to reduce false positives. Our study analyzed 17,812 EEG recordings from 13,523 patients across multiple institutions, including Massachusetts General Brigham (MGB) hospitals. Data from the Human Epilepsy Project (HEP) and SCORE-AI (SAI) were also included. A total of 32,433 event-level samples, labeled by experts, were used for training and evaluation. Performance was assessed using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), calibration error, and a modified area under the curve (mAUC) metric. The model's generalizability was evaluated using external datasets.
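The evaluation metrics named in the Methods can be illustrated with a minimal sketch. The abstract does not say how the metrics were computed, so everything below is an assumption for illustration: the function names are hypothetical, AUPRC is approximated by step-wise average precision, and the Brier score (mentioned in the Figure 3 caption) stands in for calibration error.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise approximation of the area under the precision-recall curve.
    Tied scores are not treated specially (illustrative assumption)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Toy data, not taken from the study.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
print(auroc(labels, scores), average_precision(labels, scores), brier_score(labels, scores))
```

In practice a library implementation (for example, the metrics module of scikit-learn) would normally be preferred; the pure-Python version is shown only to make the definitions explicit.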

Results: SpikeNet2 demonstrated strong performance in event-level spike detection, achieving an AUROC of 0.973 and an AUPRC of 0.995, with 44% of experts surpassing the model on the MGB dataset. In external validation, the model achieved an AUROC of 0.942 and an AUPRC of 0.948 on the HEP dataset. For EEG-level classification, SpikeNet2 recorded an AUROC of 0.958 and an AUPRC of 0.959 on the MGB dataset, an AUROC of 0.888 and an AUPRC of 0.823 on the HEP dataset, and an AUROC of 0.995 and an AUPRC of 0.991 on the SAI dataset, with 32% of experts outperforming the model. The false-positive rate was reduced to an average of nine spikes per hour.
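A false-positive rate expressed in spikes per hour can be computed as sketched below. This is an illustration under stated assumptions, not the study's procedure: the function name, the detection threshold, and the premise that the scores come from a recording with no true spikes (so every detection counts as a false positive) are all hypothetical.

```python
def false_positives_per_hour(scores, threshold, duration_seconds):
    """Count detections at or above `threshold` in a spike-free recording
    and normalize by the recording length in hours."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    n_false_positives = sum(1 for s in scores if s >= threshold)
    return n_false_positives / (duration_seconds / 3600.0)


# Toy example: 2 detections in a 30-minute control recording.
print(false_positives_per_hour([0.1, 0.9, 0.6, 0.3], threshold=0.5, duration_seconds=1800))
```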

Conclusions: SpikeNet2 offers expert-level accuracy in both event-level spike detection and EEG-level classification, while significantly reducing false positives. Its dual functionality and robust performance across diverse datasets make it a promising tool for clinical and telemedicine applications, particularly in resource-limited settings. (Funded by the National Institutes of Health and others.)


Figures

Figure 1. Data Used in Model Development and Validation.
Note that patients and electroencephalograms may overlap in different training phases during model development, but there was strictly no intersection between training and test sets. EEG denotes electroencephalogram; HEP, Human Epilepsy Project; MGB, Massachusetts General Brigham; and SAI, SCORE–Artificial Intelligence.
Figure 2. The Pipeline of Hard-Negative Mining.
EEG denotes electroencephalogram; and IED, interictal epileptiform discharges.
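The general idea of hard-negative mining can be sketched as follows. This is a generic illustration of the technique, not the pipeline shown in the figure: the function name, the threshold, the `max_keep` cap, and the toy scoring function are all hypothetical. The sketch scores a pool of known-negative segments with the current model and keeps those the model wrongly rates as spike-like, so that they can be added as negatives in the next training round.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def mine_hard_negatives(
    score_fn: Callable[[T], float],
    negative_pool: Sequence[T],
    threshold: float = 0.5,
    max_keep: Optional[int] = None,
) -> List[T]:
    """Return known-negative samples that the current model scores at or
    above `threshold`, ordered from most to least confidently wrong."""
    scored = [(score_fn(x), x) for x in negative_pool]
    hard = [(s, x) for s, x in scored if s >= threshold]
    hard.sort(key=lambda t: t[0], reverse=True)  # hardest negatives first
    if max_keep is not None:
        hard = hard[:max_keep]
    return [x for _, x in hard]


def toy_score(segment):
    """Stand-in for a trained detector: larger amplitude looks more spike-like."""
    return min(1.0, max(abs(v) for v in segment) / 100.0)


# Toy pool of spike-free segments (amplitudes are made-up numbers).
pool = [[10, 20], [90, -95], [60, 30], [5, 5]]
print(mine_hard_negatives(toy_score, pool, threshold=0.5))
```

The mined segments would then be labeled as negatives and merged into the training set before retraining, and the cycle could be repeated until few new hard negatives are found.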
Figure 3. Event-Level Spike-Classification Performance of SpikeNet2 Compared with Benchmark Models.
Panel A shows the receiver operating characteristic (ROC) curve, Panel B the precision–recall (PR) curve, and Panel C the calibration curve for the Massachusetts General Brigham (MGB) test dataset, with 16 human raters’ operating points shown for comparison. SpikeNet2 (SN2b) performance is color-coded in green, SpikeNet1 (SN1) in blue, and SpikeNet2 before hard-negative mining (SN2a) in red for comparison. Panel D shows the ROC curve, Panel E the PR curve, and Panel F the calibration curve for SpikeNet2 and comparators on the Human Epilepsy Project external validation dataset. Panel G shows a modified ROC curve and Panel H a zoomed-in modified ROC curve on the MGB control test dataset. Figures in parentheses denote 95% confidence intervals. AUC denotes area under the curve; BS, Brier (calibration) score; EBSN2b, the percentage of experts who outperform SN2b; FP, false positive; FPR, false-positive rate; HEP, Human Epilepsy Project; mAUC, normalized area under the modified receiver operating characteristic curve; MGB, Massachusetts General Brigham; PPV, positive predictive value; SN1, SpikeNet1; SN2a, SpikeNet2 without hard-negative mining; SN2b, SpikeNet2 with hard-negative mining; and TPR, true-positive rate.
Figure 4. EEG-Level Spike-Classification Performance of SpikeNet2 Compared with Benchmark Models.
Panel A shows the receiver operating characteristic (ROC) curve and Panel B the precision–recall (PR) curve of SpikeNet2 on the Massachusetts General Brigham test set. Panel C shows the ROC curve, and Panel D shows the PR curve of SpikeNet2 on the Human Epilepsy Project external validation dataset. Panel E shows the ROC curve, and Panel F shows the PR curve of SpikeNet2 and the comparator model (SCORE-AI) on the SCORE-AI external validation dataset, with operating points of 14 human raters shown for comparison. Figures in parentheses denote 95% confidence intervals. AUC denotes area under the curve; EBSN2, the percentage of experts who outperform SN2-EEG; FPR, false-positive rate; HEP, Human Epilepsy Project; MGB, Massachusetts General Brigham; PPV, positive predictive value; SAI, SCORE–Artificial Intelligence; SN2-EEG, SpikeNet2 for electroencephalography-level task; and TPR, true-positive rate.

