Comparative Study
Sleep Breath. 2019 Jun;23(2):719-728.
doi: 10.1007/s11325-019-01801-x. Epub 2019 Feb 19.

Interrater agreement between American and Chinese sleep centers according to the 2014 AASM standard

Shujian Deng et al. Sleep Breath. 2019 Jun.

Abstract

Objectives: To determine inter-lab reliability in sleep stage scoring using the 2014 American Academy of Sleep Medicine (AASM) manual, to identify the underlying reasons for disagreement, and to suggest improvements.

Methods: This study included 40 all-night polysomnograms (PSGs) from different samples, segmented into 37,642 30-s epochs. Five doctors from China and two doctors from America scored the epochs following the 2014 AASM standard. Scoring disagreement between the two centers was evaluated using Cohen's kappa (κ). After visual inspection of the PSG epochs with deviating scores, potential reasons for disagreement were analyzed.
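The agreement statistic named in the Methods can be made concrete with a short sketch. The Python below computes Cohen's kappa for two scorers' epoch-by-epoch stage labels as κ = (p_o - p_e) / (1 - p_e), where p_o is the observed fraction of epochs with identical stages and p_e is the agreement expected by chance from each scorer's stage frequencies. This is an illustration, not the authors' analysis code; the function name `cohens_kappa` and the representation of each hypnogram as a list of stage strings are assumptions.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of epoch stage labels.

    Hypothetical helper: each sequence is one scorer's hypnogram, e.g.
    ["W", "N1", "N2", ...], with one label per 30-s epoch.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty, equal-length label sequences")
    n = len(a)
    # Observed agreement: fraction of epochs given the same stage by both scorers.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the two scorers' marginal frequencies per stage.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[s] * cb[s] for s in set(ca) | set(cb)) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both scorers used one identical stage throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For two 8-epoch hypnograms that agree on 6 epochs (p_o = 0.75) with p_e = 0.1875, the function returns 9/13 ≈ 0.69, which would fall in the "substantial" band often used for κ between 0.61 and 0.80.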

Results: Inter-lab reliability was substantial (κ = 0.75 ± 0.01). Scoring of stages W (κ = 0.89) and R (κ = 0.87) achieved the highest agreement, while stage N1 (κ = 0.45) showed the lowest. In terms of the relative disagreement ratio, N2-N3 (22.09%), W-N1 (19.68%), and N1-N2 (18.75%) were the most frequent discrepancy combinations. American and Chinese doctors showed distinct scoring tendencies within the discrepancy combinations W-N1, N1-N2, and N2-N3. Seven reasons for disagreement were identified: "on-threshold characteristic" (29.21%), "context influence" (18.06%), "characteristic identification difficulty" (8.81%), "arousal-wake confusion" (7.57%), "derivation inconsistence" (2.15%), "on-borderline characteristic" (0.92%), and "misrecognition" (33.27%).
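The abstract does not state how the per-stage κ values or the relative disagreement ratio were computed. A minimal sketch, assuming per-stage κ is a one-vs-rest kappa (the given stage vs. all other stages) and that the relative disagreement ratio is each unordered stage pair's percentage share of all disagreeing epochs, could look like the Python below. The names `stage_kappa`, `disagreement_ratios`, and `STAGES` are hypothetical.

```python
from collections import Counter

# Assumed AASM stage labels, in the order used to name stage pairs.
STAGES = ("W", "N1", "N2", "N3", "R")


def stage_kappa(a, b, stage):
    """One-vs-rest Cohen's kappa for a single stage (assumed definition).

    a and b are non-empty, equal-length lists of epoch labels.
    """
    n = len(a)
    # Epochs where both scorers agree the stage is, or is not, `stage`.
    agree = sum((x == stage) == (y == stage) for x, y in zip(a, b))
    p_o = agree / n
    # Each scorer's rate of assigning `stage`.
    pa = sum(x == stage for x in a) / n
    pb = sum(y == stage for y in b) / n
    # Chance agreement on "stage" plus chance agreement on "not stage".
    p_e = pa * pb + (1 - pa) * (1 - pb)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0


def disagreement_ratios(a, b):
    """Percentage of disagreeing epochs falling in each unordered stage pair."""
    # Order each pair by STAGES so (N3, N2) and (N2, N3) both count as "N2-N3".
    pairs = Counter(
        tuple(sorted((x, y), key=STAGES.index))
        for x, y in zip(a, b)
        if x != y
    )
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {f"{p}-{q}": 100 * c / total for (p, q), c in pairs.items()}
```

Under these assumed definitions, an epoch counts toward N2-N3 whenever one center scores it N2 and the other N3, regardless of which center chose which; the paper's direction-specific findings (the characteristic tendencies of American and Chinese doctors) would instead require keeping ordered pairs.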

Conclusions: This study quantified inter-lab agreement in sleep stage scoring under the 2014 AASM manual and explored potential sources of labeling ambiguity. Improvement measures are suggested accordingly to help remove ambiguity for scorers and improve scoring reliability at the international level.

Keywords: AASM manual; Discrepancy; Interrater reliability (IRR); Polysomnography (PSG); Sleep stage scoring.