PLoS One. 2023 Apr 12;18(4):e0282622.
doi: 10.1371/journal.pone.0282622. eCollection 2023.

Use of machine learning to identify risk factors for insomnia


Alexander A Huang et al. PLoS One.

Abstract

Importance: Sleep is critical to a person's physical and mental health, but there are few studies systematically assessing risk factors for sleep disorders.

Objective: The objective of this study was to identify risk factors for a sleep disorder using machine learning and to assess this methodology.

Design, setting, and participants: A retrospective, cross-sectional cohort study was conducted using the publicly available National Health and Nutrition Examination Survey (NHANES), including patients who completed the demographic, dietary, exercise, and mental health questionnaires and had laboratory and physical examination data.

Methods: A physician diagnosis of insomnia was the outcome of this study. Univariate logistic regression models, with insomnia as the outcome, were used to identify covariates associated with insomnia. Covariates with p<0.0001 on univariate analysis were included in the final machine learning model. The XGBoost model was chosen because of its prevalence in the literature and its improved predictive accuracy in healthcare prediction tasks. Model covariates were ranked by the cover statistic to identify risk factors for insomnia. SHapley Additive exPlanations (SHAP) were used to visualize the relationship between these potential risk factors and insomnia.
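The univariate screening step can be illustrated with a small, dependency-free Python sketch. It is not the authors' code: it assumes each covariate is a list of numbers, the outcome is coded 0/1, and the data are not perfectly separable (otherwise the maximum-likelihood slope diverges). Each one-covariate logistic model is fitted by Newton-Raphson and tested with a two-sided Wald test. The helper names `univariate_logit` and `screen_features` are hypothetical; the default `alpha=1e-4` mirrors the p<0.0001 threshold stated above.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def _score_and_info(x, y, b0, b1):
    # Score vector and information matrix of the log-likelihood
    # for the model logit P(y = 1) = b0 + b1 * x.
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = _sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11

def univariate_logit(x, y, max_iter=100, tol=1e-10):
    """Return (slope, two-sided Wald p-value) for a single covariate."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = _score_and_info(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton-Raphson step: beta += I^{-1} * score (2x2 inverse written out).
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Standard error of the slope from [I^{-1}]_{11} at the final estimate.
    _, _, h00, h01, h11 = _score_and_info(x, y, b0, b1)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    z = b1 / se_b1
    return b1, math.erfc(abs(z) / math.sqrt(2.0))

def screen_features(features, y, alpha=1e-4):
    """Keep covariates (dict of name -> values) whose univariate p-value < alpha."""
    return [name for name, x in features.items()
            if univariate_logit(x, y)[1] < alpha]
```

In practice a statistics package (for example statsmodels' `Logit` or R's `glm`) would do the fitting; the hand-written version only makes the mechanics of the screen explicit.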

Results: Of the 7,929 patients who met the inclusion criteria, 4,055 (51%) were female and 3,874 (49%) were male. The mean age was 49.2 years (SD = 18.4); 2,885 (36%) patients were White, 2,144 (27%) Black, 1,639 (21%) Hispanic, and 1,261 (16%) of another race. Of the 684 candidate features, 64 were significant on univariate analysis (P<0.0001). These were fitted into the XGBoost model, which achieved an AUROC of 0.87, a sensitivity of 0.77, and a specificity of 0.77. The five highest-ranked features by cover, a measure of the percentage contribution of the covariate to the overall model prediction, were the Patient Health Questionnaire depression survey (PHQ-9) (Cover = 31.1%), age (Cover = 7.54%), physician recommendation of exercise (Cover = 3.86%), weight (Cover = 2.99%), and waist circumference (Cover = 2.70%).

Conclusion: Machine learning models can effectively predict the risk of a sleep disorder from demographic, laboratory, physical exam, and lifestyle covariates and can identify key risk factors.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Demographic information and disease characteristics.
Descriptive statistics for demographic characteristics and all covariates in the machine learning model, stratified by whether patients had a sleep disorder. Covariates prefixed with SMQ or MCQ correspond to the questionnaire item as written; responses were numeric (integer) for SMQ items and binary (yes/no) for MCQ items. Abbreviations: DR = Doctor.
Fig 2. Comparison of different machine learning models.
Comparison of four machine learning models (XGBoost, Random Forest, Artificial Neural Network, Adaptive Boosting) using the model statistics computed from the 20% test set: Accuracy, F1, Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, and AUROC.
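The test-set statistics listed for Fig 2 all derive from the four cells of a confusion matrix. The sketch below assumes binary (0/1) labels and binary predictions, that is, predicted probabilities already thresholded at some cutoff (the caption does not state which), and that both classes and both predicted labels occur. `classification_metrics` is a hypothetical helper, not part of the study's code.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix statistics for binary (0/1) labels and predictions.

    Assumes both classes occur in y_true and both labels occur in y_pred;
    otherwise some of the ratios below divide by zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "accuracy": (tp + tn) / len(pairs),
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
    }
```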
Fig 3. Receiver operating characteristic curve and model statistics.
Receiver operating characteristic curve for the machine learning model predicting a sleep disorder. AUROC = 0.87.
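AUROC equals the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it (the Mann-Whitney interpretation). A minimal sketch, with ties counted as one half; `auroc` is a hypothetical name, and the pairwise loop is quadratic in the number of patients, so it suits illustration rather than large test sets.

```python
def auroc(y_true, scores):
    """AUROC as P(score of a random positive > score of a random negative).

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg;
    tied scores contribute one half.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```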
Fig 4. Model gain statistics.
The Gain, Cover, and Frequency of all covariates in the XGBoost model. Gain represents the relative contribution of a feature to the model and is the primary measure of feature importance in this study. Covariates are ordered by Gain.
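In XGBoost, each split node records a gain (the loss reduction achieved by the split) and a cover (the sum of second-order gradients of the training rows reaching that node), while Frequency counts how often a feature is used to split. R's `xgb.importance` reports these per feature as fractions that sum to one across features. The sketch below reproduces that kind of normalization from a hypothetical list of `(feature, gain, cover)` split records; it is an assumption about how a table like Fig 4 is assembled, not the authors' pipeline.

```python
from collections import defaultdict

def importance_table(splits):
    """Aggregate per-split records into normalized Gain, Cover, Frequency.

    `splits` is an iterable of (feature, gain, cover) tuples, one per split
    node across all trees (hypothetical input format). Returns a list of
    (feature, gain_share, cover_share, frequency_share), ordered by Gain.
    """
    gain = defaultdict(float)
    cover = defaultdict(float)
    freq = defaultdict(int)
    for feature, g, c in splits:
        gain[feature] += g
        cover[feature] += c
        freq[feature] += 1
    total_gain = sum(gain.values())
    total_cover = sum(cover.values())
    total_freq = sum(freq.values())
    rows = [(f, gain[f] / total_gain, cover[f] / total_cover, freq[f] / total_freq)
            for f in gain]
    rows.sort(key=lambda r: r[1], reverse=True)  # order by Gain, as in Fig 4
    return rows
```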
Fig 5. Overall SHAP explanations.
SHAP explanations; purple represents higher values of a covariate and yellow represents lower values. The x-axis is the change in log-odds of a sleep disorder. Covariates are ordered by Gain. Covariates prefixed with SMQ or MCQ correspond to the questionnaire item as written; responses were numeric (integer) for SMQ items and binary (yes/no) for MCQ items. Abbreviations: DR = Doctor.
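SHAP values are Shapley values from cooperative game theory: each covariate's attribution is its average marginal contribution over all orderings of the covariates, and the attributions sum to the difference between the prediction and a baseline. For an XGBoost classifier they are typically computed on the log-odds (margin) scale, which is why the x-axis is a change in log-odds. The brute-force sketch below, with a hypothetical function `exact_shapley`, uses a single baseline vector to stand in for "absent" covariates; the SHAP library's tree explainer uses a faster, tree-specific algorithm, so this is a conceptual illustration only.

```python
import math
from itertools import combinations

def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, filling absent features from baseline.

    Enumerates every coalition, so it is only feasible for a few features.
    """
    n = len(x)

    def value(coalition):
        # Model output with features in `coalition` taken from x, others from baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For a model that is linear in the log-odds, f(z) = b0 + sum of w_i * z_i, each attribution reduces to w_i * (x_i - baseline_i); interactions between covariates are split between them.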
Fig 6. SHAP explanations for the top four continuous covariates.
SHAP explanations with the covariate value on the x-axis and the change in log-odds on the y-axis; the red line represents the relationship between the covariate and the log-odds of a sleep disorder, and each black dot represents an observation. Covariates: top left, PHQ-9; top right, body weight; bottom left, patient age; bottom right, waist circumference.
Fig 7. Covariates of interest to evaluate the sensibility of the model.
(a) SHAP explanations for the relationship between alcohol and the odds of a sleep disorder. (b) SHAP explanations for the relationship between caffeine intake and the odds of a sleep disorder. In both panels, the covariate value is on the x-axis and the change in log-odds on the y-axis; the red line represents the relationship between the covariate and the log-odds of a sleep disorder, and each black dot represents an observation.
