Front Public Health. 2023 Jan 27:11:1033070.
doi: 10.3389/fpubh.2023.1033070. eCollection 2023.

Development and validation of questionnaire-based machine learning models for predicting all-cause mortality in a representative population of China

Ziyi Li et al. Front Public Health.

Abstract

Background: Because previously developed mortality prediction models have limited applicability to the Chinese population, a questionnaire-based prediction model is of great importance for its accuracy and convenience in clinical practice.

Methods: Two national cohorts, the China Health and Nutrition Survey (8,355 individuals older than 18) and the China Health and Retirement Longitudinal Study (12,711 individuals older than 45), were used for model development and validation. One hundred and fifty-nine variables were compiled to generate predictions. The Cox regression model and six machine learning (ML) models were used to predict all-cause mortality. Finally, a simple questionnaire-based ML prediction model was developed using the best algorithm and validated.

Results: In the internal validation set, all the ML models performed better than the traditional Cox model in predicting 6-year mortality and the random survival forest (RSF) model performed best. The questionnaire-based ML model, which only included 20 variables, achieved a C-index of 0.86 (95%CI: 0.80-0.92). On external validation, the simple questionnaire-based model achieved a C-index of 0.82 (95%CI: 0.77-0.87), 0.77 (95%CI: 0.75-0.79), and 0.79 (95%CI: 0.77-0.81), respectively, in predicting 2-, 9-, and 11-year mortality.
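The C-index values reported above measure how well a model ranks individuals by risk. As a minimal sketch, assuming Harrell's concordance index for right-censored data (the abstract does not state which estimator was used), the calculation can be illustrated as follows. The function name `harrell_c_index` and the toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i died and did so strictly
    before subject j's last observed time. The pair is concordant when
    the model gave subject i the higher risk score. Tied risk scores
    count as half. Pairs with tied times are skipped (a simplification).
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in years, death indicator, predicted risk.
toy_times = [2, 4, 6, 8, 10]
toy_events = [True, False, True, True, False]
toy_risk = [0.9, 0.3, 0.6, 0.7, 0.1]

c_index = harrell_c_index(toy_times, toy_events, toy_risk)
print(f"C-index: {c_index:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. In the toy data, six of the seven comparable pairs are ordered correctly.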

Conclusions: In this prospective population-based study, a model based on the RSF analysis performed best among all models. Furthermore, there was no significant difference between the prediction performance of the questionnaire-based ML model, which only included 20 variables, and that of the model with all variables (including laboratory variables). The simple questionnaire-based ML prediction model, which needs to be further explored, is of great importance for its accuracy and suitability to the Chinese general population.

Keywords: machine learning; mortality; personalized prediction; prediction model; questionnaire-based.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Receiver operating characteristic curves of different models for predicting 6-year all-cause mortality in the internal validation cohort. (A) Models with all variables. (B) Models with variables excluding laboratory variables. (C) Models with only laboratory variables. COX, Cox proportional hazards regression model; Lasso, least absolute shrinkage and selection operator regression model; GLMBoost, boosted generalized linear model; ST, survival tree model; CIF, conditional inference forest model; RSF, random forest survival analysis model; GBM, gradient boosting model.
