Sleep. 2017 Oct 1;40(10):zsx139.
doi: 10.1093/sleep/zsx139.

Large-Scale Automated Sleep Staging

Haoqi Sun et al. Sleep. 2017.

Abstract

Study objectives: Automated sleep staging has previously been limited by a combination of clinical and physiological heterogeneity. In principle, both factors can be addressed with large data sets that enable robust calibration; however, the impact of sample size remains uncertain. The objectives were to investigate the extent to which machine learning methods can approximate the performance of human scorers when supplied with sufficient training cases, and to investigate how staging performance depends on the number of training patients, contextual information, model complexity, and imbalance between sleep stage proportions.

Methods: A total of 102 features were extracted from six electroencephalography (EEG) channels in routine polysomnography. Two thousand nights of recordings were partitioned into training and testing sets of equal size (n = 1000 each) for validation. We used epoch-by-epoch Cohen's kappa statistics to measure the agreement between the classifier output and human scoring performed according to American Academy of Sleep Medicine (AASM) criteria.
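Epoch-by-epoch Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from each rater's stage frequencies: κ = (p_o − p_e) / (1 − p_e). The minimal sketch below computes it from a confusion matrix. It assumes stages are integer-coded 0 to 4 (W, N1, N2, N3, R), an encoding chosen here for illustration; the function names are hypothetical. Pooling epochs across patients amounts to summing the per-patient confusion matrices before computing kappa.

```python
import numpy as np

# Assumed encoding for illustration: 0=W, 1=N1, 2=N2, 3=N3, 4=R.

def confusion_matrix(human, auto, n_classes=5):
    """Count epochs: rows = human-scored stage, columns = classifier stage."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for h, a in zip(human, auto):
        cm[h, a] += 1
    return cm

def cohens_kappa(cm):
    """Cohen's kappa from a square confusion matrix of epoch counts."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n
    # Chance agreement: dot product of the two raters' marginal stage counts.
    p_expected = (cm.sum(axis=1) @ cm.sum(axis=0)) / n**2
    return (p_observed - p_expected) / (1.0 - p_expected)
```

For example, `cohens_kappa(confusion_matrix([0, 0, 2, 2, 3, 4], [0, 1, 2, 2, 3, 4]))` gives 11/14 ≈ 0.786, since p_o = 5/6 and p_e = 2/9.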

Results: Epoch-by-epoch Cohen's kappa improved with an increasing number of training EEG recordings until it saturated at approximately n = 300. Kappa was further improved by accounting for contextual (temporal) information, increasing model complexity, and adjusting the training procedure to account for the imbalance of stage proportions. The final kappa on the testing set was 0.68. Testing on more EEG recordings yielded kappa estimates with lower variance.
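The abstract does not state how contextual (temporal) information was encoded. One common approach, assumed here, is to append the features of neighboring epochs to each epoch's feature vector. `add_context`, the window half-width `k`, and the edge padding are hypothetical choices for illustration.

```python
import numpy as np

def add_context(features, k=2):
    """Append the features of the k preceding and k following epochs.

    features: array of shape (n_epochs, n_features) for ONE recording.
    Edges are padded by repeating the first/last epoch (an assumption).
    Returns an array of shape (n_epochs, n_features * (2k + 1)).
    """
    features = np.asarray(features, dtype=float)
    n = features.shape[0]
    padded = np.pad(features, ((k, k), (0, 0)), mode="edge")
    # Column block j holds the features of epoch (t - k + j) for row t.
    return np.hstack([padded[j:j + n] for j in range(2 * k + 1)])
```

Applying it per recording keeps the window from spanning two different nights. With 102 features and k = 2, each epoch would carry 510 features.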

Conclusion: Training with a large data set enables automated sleep staging that compares favorably with human scorers. Because testing was performed on a large and heterogeneous data set, the performance estimate has low variance and is likely to generalize broadly.

Keywords: EEG; big data; machine learning; sleep stages.

Figures

Figure 1
(A) Cohen’s kappa on the fixed set of 1000 testing patients for different numbers of training patients. The testing Cohen’s kappa was computed from the confusion matrix of all epochs pooled across all 1000 testing patients. The mean and standard deviation of the kappa values over five repetitions are displayed. The ** and * markers indicate adjacent cases with a statistically significant improvement in the kappa statistic by the Mann-Whitney U test (p < .01 and p < .05, respectively). (B) Cohen’s kappa values after HMM smoothing, which is described in the main text. HMM = hidden Markov model.
Figure 2
The confusion matrix for the fixed 1000 testing patients when trained with (A) 10 patients and 2000 hidden nodes; (B) 300 patients and 2000 hidden nodes; (C) 1000 patients and 2000 hidden nodes; (D) 1000 patients and 20,000 hidden nodes; (E) 1000 patients, 2000 hidden nodes, and weighted training samples; and (F) 1000 patients, 20,000 hidden nodes, weighted training samples, and smoothing. For each confusion matrix, the repetition whose kappa value is closest to the mean kappa over five repetitions is shown. Values are percentages; each row (human-scored stage) sums to 100. The color scale runs from black (0%) to white (100%).
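The row normalization used for these matrices, and the choice of a representative repetition, can be sketched as follows. This is a minimal illustration; the function names are hypothetical.

```python
import numpy as np

def row_percentages(cm):
    """Convert epoch counts to percentages summing to 100 along each row
    (rows = human-scored stage)."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    # Rows with no human-scored epochs are left at 0 instead of dividing by 0.
    return 100.0 * np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)

def representative_repetition(kappas):
    """Index of the repetition whose kappa is closest to the mean kappa."""
    kappas = np.asarray(kappas, dtype=float)
    return int(np.argmin(np.abs(kappas - kappas.mean())))
```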
Figure 3
Hypnograms before and after HMM smoothing for the patient with the largest post-smoothing improvement in Cohen’s kappa. HMM = hidden Markov model.
Figure 4
Hypnograms before and after HMM smoothing for the patient with the largest post-smoothing decline in Cohen’s kappa. HMM = hidden Markov model.
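The source does not specify the HMM used for smoothing. A minimal sketch, assuming an HMM whose transition matrix is estimated from training hypnograms and whose emission scores are the classifier's per-epoch probabilities divided by the stage priors (the "scaled likelihood" convention), decodes the most likely stage sequence with the Viterbi algorithm. The function names and the pseudocount are hypothetical. As the hypnograms for these two patients illustrate, smoothing can raise agreement for some patients and lower it for others.

```python
import numpy as np

def estimate_transitions(hypnograms, n_stages=5, pseudocount=1.0):
    """Stage-to-stage transition matrix from integer-coded training hypnograms."""
    counts = np.full((n_stages, n_stages), pseudocount)
    for hyp in hypnograms:
        for a, b in zip(hyp[:-1], hyp[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def viterbi_smooth(probs, trans, prior):
    """Most likely stage sequence given per-epoch classifier probabilities.

    probs: (n_epochs, n_stages) classifier posteriors.
    Emission scores are posteriors divided by the stage prior (assumption).
    """
    probs = np.asarray(probs, dtype=float)
    log_prior = np.log(np.asarray(prior, dtype=float))
    log_emit = np.log(probs + 1e-12) - log_prior
    log_trans = np.log(trans)
    n, s = probs.shape
    score = log_prior + log_emit[0]
    back = np.zeros((n, s), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + log_trans  # cand[i, j]: from stage i to stage j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(s)] + log_emit[t]
    # Trace the best path backwards from the best final stage.
    path = np.empty(n, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

With a sticky two-state transition matrix `[[0.9, 0.1], [0.1, 0.9]]`, equal priors, and per-epoch probabilities `[[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]`, the isolated middle flip is smoothed away and the decoded path is `[0, 0, 0]`.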
Figure 5
(A) Training and testing performance for different values of the regularization parameter C; (B) training and testing performance for different numbers of hidden nodes L. Both panels show the mean and standard deviation of the kappa values over five repetitions.
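The captions name only a regularization parameter C and a number of hidden nodes L. That notation matches a regularized extreme learning machine (ELM): a random, untrained hidden layer followed by output weights solved in closed form as β = (HᵀWH + I/C)⁻¹ HᵀWT, where H holds the hidden-layer outputs, T the one-hot targets, and W a diagonal matrix of sample weights. The sketch below assumes that model family, and assumes inverse stage frequency as the sample weighting behind the "weighted training samples" in Figure 2 (E, F). `WeightedELM` is a hypothetical class, not the authors' code.

```python
import numpy as np

class WeightedELM:
    """Regularized extreme learning machine (assumed model family) with
    optional per-sample weights to counter stage imbalance."""

    def __init__(self, n_hidden=2000, C=1.0, seed=0):
        self.L, self.C = n_hidden, C
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random, untrained hidden layer with sigmoid activation.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y, n_classes, balance=True):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.W = self.rng.standard_normal((X.shape[1], self.L))
        self.b = self.rng.standard_normal(self.L)
        H = self._hidden(X)
        T = np.eye(n_classes)[y]  # one-hot targets
        if balance:
            # Weight each epoch by the inverse frequency of its stage (assumption).
            counts = np.bincount(y, minlength=n_classes)
            w = 1.0 / counts[y]
        else:
            w = np.ones(len(y))
        Hw = H * w[:, None]  # equals diag(w) @ H
        # beta = (H^T W H + I/C)^{-1} H^T W T, solved without an explicit inverse.
        A = H.T @ Hw + np.eye(self.L) / self.C
        self.beta = np.linalg.solve(A, Hw.T @ T)
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return np.argmax(scores, axis=1)
```

This form solves an L × L system, so memory grows with L²; at L = 20,000 the matrix alone needs about 3.2 GB in double precision, which may favor a dual formulation or a more memory-efficient solver at that scale.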
Figure 6
Histogram of the per-patient Cohen’s kappa for the testing patients. The dashed line at 0.684 marks the overall Cohen’s kappa computed from all epochs pooled across all testing patients.
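This figure contrasts two summaries: the distribution of per-patient kappa values and a single kappa over all pooled epochs. They generally differ, because pooling weights each patient by their number of epochs and computes chance agreement from pooled stage frequencies. A small sketch with hypothetical function names:

```python
import numpy as np

def kappa(cm):
    """Cohen's kappa from a square confusion matrix of epoch counts."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n
    p_expected = (cm.sum(axis=1) @ cm.sum(axis=0)) / n**2
    return (p_observed - p_expected) / (1.0 - p_expected)

def per_patient_and_pooled(cms):
    """cms: per-patient confusion matrices (epoch counts).
    Returns (array of per-patient kappas, pooled kappa over all epochs)."""
    per_patient = np.array([kappa(cm) for cm in cms])
    pooled = kappa(np.sum(cms, axis=0))  # pooling = summing confusion matrices
    return per_patient, pooled
```

For two toy patients with matrices `[[5, 1], [1, 3]]` and `[[2, 2], [0, 6]]`, the per-patient kappas are 7/12 ≈ 0.583 and 6/11 ≈ 0.545, while the pooled kappa is 0.6, higher than either. Per-patient kappa is undefined (0/0) when both scorings of a recording contain only one stage.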
Figure 7
Testing Cohen’s kappa for different scorers. The number in each x-axis label is the number of patients scored by that scorer. A Kruskal-Wallis H test followed by post hoc Dunn’s tests suggests that the kappa values for S2 are higher than those for S1 and S3; the other scorers have similar kappa values. ** p < .01.
Figure 8
Precision of the Cohen’s kappa estimate for subsets of different sizes drawn from the 1000 testing patients. The red dashed line is the training Cohen’s kappa. The blue solid line at the center of the shading is the mean testing Cohen’s kappa over four randomly selected, nonoverlapping patient subsets; the shading spans ±1 standard deviation.
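The subset procedure in this figure can be sketched as drawing four nonoverlapping random patient subsets of a given size, computing the pooled kappa for each, and reporting the mean and standard deviation. `kappa_precision` is hypothetical, and using the population standard deviation (NumPy's default `ddof=0`) is an assumption. With 1000 testing patients, four nonoverlapping subsets allow sizes up to 250.

```python
import numpy as np

def kappa(cm):
    """Cohen's kappa from a square confusion matrix of epoch counts."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n
    p_expected = (cm.sum(axis=1) @ cm.sum(axis=0)) / n**2
    return (p_observed - p_expected) / (1.0 - p_expected)

def kappa_precision(cms, subset_size, n_subsets=4, seed=0):
    """Mean and std of pooled kappa over nonoverlapping random patient subsets.

    cms: array (n_patients, n_stages, n_stages) of per-patient confusion matrices.
    """
    cms = np.asarray(cms)
    if n_subsets * subset_size > len(cms):
        raise ValueError("not enough patients for nonoverlapping subsets")
    order = np.random.default_rng(seed).permutation(len(cms))
    kappas = [
        # Pool each subset by summing its patients' confusion matrices.
        kappa(cms[order[i * subset_size:(i + 1) * subset_size]].sum(axis=0))
        for i in range(n_subsets)
    ]
    return float(np.mean(kappas)), float(np.std(kappas))
```

Sweeping `subset_size` and plotting mean ± std against it reproduces the shape of such a precision curve: the spread shrinks as subsets grow.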
