Development and validation of questionnaire-based machine learning models for predicting all-cause mortality in a representative population of China
- PMID: 36778549
- PMCID: PMC9911458
- DOI: 10.3389/fpubh.2023.1033070
Abstract
Background: Because previously developed mortality prediction models have limited applicability to the Chinese population, a questionnaire-based prediction model is of great importance for its accuracy and convenience in clinical practice.
Methods: Two national cohorts, the China Health and Nutrition Survey (8,355 individuals older than 18 years) and the China Health and Retirement Longitudinal Study (12,711 individuals older than 45 years), were used for model development and validation. One hundred and fifty-nine variables were compiled to generate predictions. The Cox regression model and six machine learning (ML) models were used to predict all-cause mortality. Finally, a simple questionnaire-based ML prediction model was developed using the best algorithm and validated.
Results: In the internal validation set, all the ML models performed better than the traditional Cox model in predicting 6-year mortality, and the random survival forest (RSF) model performed best. The questionnaire-based ML model, which included only 20 variables, achieved a C-index of 0.86 (95% CI: 0.80-0.92). On external validation, the simple questionnaire-based model achieved C-indices of 0.82 (95% CI: 0.77-0.87), 0.77 (95% CI: 0.75-0.79), and 0.79 (95% CI: 0.77-0.81) in predicting 2-, 9-, and 11-year mortality, respectively.
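For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is an illustration only and not the authors' code: the function name, the toy data, and the handling of ties (tied risk scores count as half-concordant, tied survival times are not compared) are assumptions made for this example.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C-index (illustrative sketch, O(n^2)).

    times  : follow-up time for each individual
    events : 1 if death was observed, 0 if censored
    risks  : predicted risk score (higher = expected to die sooner)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A pair is comparable only if the individual with the shorter
        # follow-up time actually experienced the event.
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # correctly ranked pair
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied prediction: half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four individuals, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.5, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported values of 0.77 to 0.86 mean that in roughly four out of five comparable pairs the model assigned the higher risk to the individual who died earlier.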
Conclusions: In this prospective population-based study, a model based on the RSF analysis performed best among all models. Furthermore, there was no significant difference between the prediction performance of the questionnaire-based ML model, which only included 20 variables, and that of the model with all variables (including laboratory variables). The simple questionnaire-based ML prediction model, which needs to be further explored, is of great importance for its accuracy and suitability to the Chinese general population.
Keywords: machine learning; mortality; personalized prediction; prediction model; questionnaire-based.
Copyright © 2023 Li, Yang, He, Wang, Ping, Li, Xu, Zhang and Li.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.