EBioMedicine. 2021 Apr;66:103275. doi: 10.1016/j.ebiom.2021.103275. Epub 2021 Mar 18.

Evaluation of artificial intelligence systems for assisting neurologists with fast and accurate annotations of scalp electroencephalography data


Subhrajit Roy et al. EBioMedicine. 2021 Apr.

Abstract

Background: Assistive automatic seizure detection can empower human annotators to shorten patient monitoring data review times. We present a proof-of-concept for a seizure detection system that is sensitive, automated, patient-specific, and tunable to maximise sensitivity while minimising human annotation times. The system combines custom data preparation methods, deep learning analytics and electroencephalography (EEG) data.

Methods: Scalp EEG data of 365 patients containing 171,745 s of ictal and 2,185,864 s of interictal samples obtained from clinical monitoring systems were analysed as part of a crowdsourced artificial intelligence (AI) challenge. Participants were tasked to develop an ictal/interictal classifier with high sensitivity and low false alarm rates. We built a challenge platform that prevented participants from downloading or directly accessing the data while allowing crowdsourced model development.

Findings: The automatic detection system achieved tunable sensitivities between 75.00% and 91.60%, allowing the amount of raw EEG data to be reviewed by a human annotator to be reduced by factors of 142x and 22x, respectively. The algorithm enables instantaneous reviewer-managed optimization of the balance between sensitivity and the amount of raw EEG data to be reviewed.

Interpretation: This study demonstrates the utility of deep learning for patient-specific seizure detection in EEG data. Furthermore, deep learning in combination with a human reviewer can provide the basis for an assistive data labelling system lowering the time of manual review while maintaining human expert annotation performance.

Funding: IBM employed all IBM Research authors. Temple University employed all Temple University authors. The Icahn School of Medicine at Mount Sinai employed Eren Ahsen. The corresponding authors Stefan Harrer and Gustavo Stolovitzky declare that they had full access to all the data in the study and that they had final responsibility for the decision to submit for publication.

Keywords: Artificial intelligence; Automatic labelling; Crowdsourcing challenges; Deep neural networks; EEG; Epilepsy; Seizure detection.


Conflict of interest statement

Declaration of Competing Interest SR, IKK and SH are inventors on issued patent US 10,596,377. HY is an inventor on pending patent US 16/670,177. All other authors report no conflicts of interest.

Figures

Fig. 1
A block diagram of the high-level architecture of the custom-built challenge platform that depicts data and model flow during challenge operation. In this model-to-data paradigm, challenge participants at no point download or access the data directly. Instead they create and submit models to the platform (green solid arrows), which automatically organises training and testing and then provides feedback on model performance to participants (orange dashed arrows). This is fundamentally different from conventional crowdsourced challenge setups. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 2
All 5 valid final submissions were tested against validation and blind test sets. The plots show the results for (a) evaluation metric (E), (b) sensitivity (S), (c) false alarm rate (FA/24 h), and (d) sensitivity (S) plotted as a function of FA/24 h.
Fig. 3
In order to label 24 h of EEG recordings, an unassisted human annotator has to review all 24 h of raw EEG data (top). Using the systems developed in this challenge, the amount of data needing review is the sum of the seizure ground truth (correctly detected true positive seizure segments) plus the annotation overhead (incorrectly detected false positive segments). All 4 automatic systems operate at 75% detection sensitivity. A conservative upper bound approximation for the total seizure ground truth duration in a 24 h raw EEG recording is ~0.2%, or ~3 min. The best models achieve a minimum annotation overhead of 7 min, which therefore allows the total amount of raw EEG data to be reviewed by a human annotator to be reduced from 24 h down to 10 min or less. Note that the duration of seizure ground truth may fluctuate across patients, i.e. a patient might experience longer or more frequent seizure episodes on certain days, which impacts the total duration of raw EEG data to be reviewed for that day. The annotation overhead, however, remains unaffected and stays at the levels shown in the figure for all patients at all times.
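The review-time arithmetic in this caption can be sketched directly. The function below is illustrative only (the name and defaults are ours), using the caption's figures of ~0.2% seizure ground truth per 24 h and a 7 min annotation overhead:

```python
def assisted_review_minutes(total_hours=24.0,
                            ground_truth_fraction=0.002,
                            overhead_minutes=7.0):
    """Minutes of raw EEG a human must review with an assistive detector.

    ground_truth_fraction: conservative upper bound on the share of the
    recording that is actual seizure (~0.2% of 24 h, i.e. ~3 min).
    overhead_minutes: false-positive segments flagged for review.
    """
    ground_truth_minutes = total_hours * 60 * ground_truth_fraction
    return ground_truth_minutes + overhead_minutes

review = assisted_review_minutes()      # ~2.9 min ground truth + 7 min overhead
reduction = (24 * 60) / review          # roughly two orders of magnitude
```

Under these rounded assumptions the assisted review time comes out just under 10 min, consistent with the "10 min or less" stated in the caption.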
Fig. 4
An engineering step introducing a hyperparameter that allowed a trade-off between sensitivity and FA rate was included in the submissions of teams Otameshi and Ids_cpmp. This engineering step was applied to all 5 final submissions, 4 of which thereby reached sensitivities of 75% or higher. (a) shows false alarm rates at the 75% detection sensitivity mark for those 4 models. (b) shows the reduction factors of raw EEG data that has to be reviewed by human annotators for each system. Team EpiInsights achieves the highest reduction factor of 142x.
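A hyperparameter trading sensitivity against FA rate, as described for Fig. 4, typically corresponds to sweeping a decision threshold over a classifier's per-window seizure probabilities. The sketch below is a generic illustration of that idea, not the teams' actual code; it counts each flagged window as one alarm, a simplification of event-based scoring such as TAES:

```python
import numpy as np

def sensitivity_fa_curve(probs, labels, thresholds, hours):
    """Sweep a decision threshold to trade sensitivity against FA rate.

    probs: per-window seizure probabilities from any classifier.
    labels: 1 for ictal windows, 0 for interictal.
    hours: total recording duration covered by the windows.
    Returns a list of (threshold, sensitivity, false_alarms_per_24h).
    """
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    curve = []
    for t in thresholds:
        pred = probs >= t
        tp = np.sum(pred & (labels == 1))   # correctly flagged ictal windows
        fn = np.sum(~pred & (labels == 1))  # missed ictal windows
        fp = np.sum(pred & (labels == 0))   # false alarms
        sens = tp / (tp + fn) if (tp + fn) else 0.0
        fa_per_24h = fp * 24.0 / hours      # normalise alarms to a 24 h day
        curve.append((t, sens, fa_per_24h))
    return curve
```

Lowering the threshold raises sensitivity at the cost of more false alarms, which is exactly the reviewer-managed balance the abstract describes.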
Fig. 5
FAs per 24 h plotted against detection sensitivity going from 75% sensitivity level to the maximum achievable sensitivity for each algorithm. The TAES metric causes the maximum achievable sensitivity for the model of team Ids_cpmp to stay below 80%.
Fig. 6
Reduction factors of raw EEG data to be reviewed by a human annotator vs. detection sensitivity, going from 75% to the maximum achievable sensitivity for each system. The models from teams Otameshi, EpiInsights and Team SG achieve maximum detection sensitivities of 90.63%, 91.60%, and 91.57%, respectively, and data reduction factors of two orders of magnitude.
Fig. 7
(a) Applying the engineering step introduced by teams Otameshi and Ids_cpmp raises the maximum detection sensitivities to 90.63%, 91.60% and 91.57%, respectively. This comes at the cost of increased false alarm rates and decreased data reduction factors, shown in (b). Note that even at the maximum sensitivity level the lowest data reduction factor (22x, Team SG) still allows 24 h of raw EEG data to be compressed down to a ~1 h segment to be reviewed by a human annotator.

