J Am Med Inform Assoc. 2021 Jun 12;28(6):1098-1107.
doi: 10.1093/jamia/ocaa277.

Machine-learning model to predict the cause of death using a stacking ensemble method for observational data

Chungsoo Kim et al. J Am Med Inform Assoc.

Abstract

Objective: Cause of death is used as an important outcome of clinical research; however, access to cause-of-death data is limited. This study aimed to develop and validate a machine-learning model that predicts the cause of death from the patient's last medical checkup.

Materials and methods: To classify the mortality status and each individual cause of death, we used a stacking ensemble method. The prediction outcomes were all-cause mortality, 8 leading causes of death in South Korea, and other causes. The clinical data of study populations were extracted from the national claims (n = 174 747) and electronic health records (n = 729 065) and were used for model development and external validation. Moreover, we imputed the cause of death from the data of 3 US claims databases (n = 994 518, 995 372, and 407 604, respectively). All databases were formatted to the Observational Medical Outcomes Partnership Common Data Model.

Results: The generalized area under the receiver operating characteristic curve (AUROC) of the model predicting the cause of death within 60 days was 0.9511. Moreover, the AUROC of the external validation was 0.8887. Among the causes of death imputed in the Medicare Supplemental database, 11.32% of deaths were due to malignant neoplastic disease.

Discussion: This study showed the potential of machine-learning models as a new alternative to address the lack of access to cause-of-death data. All processes were disclosed to maintain transparency, and the model was easily applicable to other institutions.

Conclusion: A machine-learning model with competent performance was developed to predict cause of death.

Keywords: cause of death; classification; clinical; decision support systems; machine learning; mortality.

Figures

Figure 1.
Target population criteria and feature extraction for base learner development. Each patient’s index date was set to the date of the last visit to a healthcare provider, and only patients whose index date fell ≥ 1 year after their first visit were included. Patients whose last visit fell within the final year of the database were excluded from the target population to prevent bias from censored records. The outcome was considered to have “occurred” when it fell within a specified time-at-risk interval after the index date. Patient features were collected from before the index date, and features from long-term and short-term windows before the index date were also collected to capture temporality. Abbreviation: NHIS–NSC, National Health Insurance Service–National Sample Cohort.
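The selection rules in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Patient` record, the function names, and the 365-day thresholds are hypothetical, and whether a death on the index date itself counts as inside the time-at-risk window is an assumption made here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Patient:
    """Hypothetical record holding only the dates the caption refers to."""
    first_visit: date
    last_visit: date                  # used as the index date
    death_date: Optional[date] = None


def in_target_population(p: Patient, db_end: date,
                         min_history_days: int = 365,
                         final_window_days: int = 365) -> bool:
    """Apply the two inclusion rules described in the caption."""
    # Rule 1: at least one year between the first visit and the index date.
    has_history = (p.last_visit - p.first_visit).days >= min_history_days
    # Rule 2: the last visit must not fall in the final year of the database.
    before_final_year = p.last_visit <= db_end - timedelta(days=final_window_days)
    return has_history and before_final_year


def outcome_occurred(p: Patient, time_at_risk_days: int = 60) -> bool:
    """True when death falls inside the time-at-risk window after the index date.

    Assumption: day 0 (the index date) is treated as inside the window.
    """
    if p.death_date is None:
        return False
    days_after_index = (p.death_date - p.last_visit).days
    return 0 <= days_after_index <= time_at_risk_days
```

With a database ending on 2013-12-31, a patient first seen on 2010-01-01 and last seen on 2012-06-01 is included, and a death on 2012-07-01 (30 days after the index date) counts as an outcome under a 60-day window.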
Figure 2.
Schematic view of the stacking ensemble model architecture. A 2-level stacking ensemble method was used to predict the patient’s cause of death. The stacking model consists of base learners and a meta-learner; the meta-learner uses the base learners’ predictions as its input variables. Base learners are developed first: for each outcome (survival and each of the 8 causes of death), 2 algorithms are applied, lasso logistic regression and gradient boosting machine. The resulting 18 base-learner outputs (9 outcomes × 2 algorithms) serve as the meta-learner’s input variables for the final prediction. Abbreviations: GBM, gradient boosting machine; LLR, lasso logistic regression.
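To make the count of 18 meta-learner inputs concrete, the sketch below assembles one prediction per (outcome, algorithm) pair into a single feature vector. It is illustrative only: the outcome labels `cause_1` to `cause_8` are placeholders (the individual causes are not listed here), and the base learners are dummy functions standing in for trained LLR and GBM models.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# Placeholder labels: survival plus 8 unnamed causes of death.
OUTCOMES = ["survival"] + [f"cause_{i}" for i in range(1, 9)]
ALGORITHMS = ["LLR", "GBM"]

# A base learner maps a patient's feature vector to a predicted probability.
BaseLearner = Callable[[Sequence[float]], float]


def meta_features(x: Sequence[float],
                  base_learners: Dict[Tuple[str, str], BaseLearner]) -> List[float]:
    """Level 1 of the stack: collect one prediction per (outcome, algorithm)."""
    return [base_learners[(outcome, algo)](x)
            for outcome, algo in product(OUTCOMES, ALGORITHMS)]


def make_dummy_learner(weight: float) -> BaseLearner:
    """Hypothetical stand-in for a trained model; returns a value in [0, 1]."""
    def predict(x: Sequence[float]) -> float:
        score = weight * sum(x) / len(x)
        return min(1.0, max(0.0, score))
    return predict


# One dummy learner per (outcome, algorithm) pair: 9 x 2 = 18 in total.
dummy_learners = {
    key: make_dummy_learner(0.05 * (i + 1))
    for i, key in enumerate(product(OUTCOMES, ALGORITHMS))
}

features = meta_features([1.0, 1.0], dummy_learners)
print(len(features))
```

In a real pipeline the 18-element vector would be passed to the level-2 meta-learner, which makes the final prediction of survival or a specific cause of death.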
Figure 3.
Receiver operating characteristic curves of the final model on the development and validation datasets. The receiver operating characteristic (ROC) curves were plotted from the cause-of-death prediction model, which predicted whether death occurred within 60 days of the last visit date and, if so, its cause. XGBoost was used as the meta-learner. A ROC curve is shown for each cause of death, for both the NHIS–NSC test set and the AUSOM dataset. Abbreviations: AUSOM, Ajou University School of Medicine; NHIS–NSC, National Health Insurance Service–National Sample Cohort.
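A per-cause ROC curve implies treating each cause as a one-vs-rest binary problem. The sketch below computes the area under such a curve with the rank-based (Mann-Whitney) formulation, using only the standard library. It is an illustration under assumptions: the function names and toy scores are hypothetical, and it does not reproduce the generalized AUROC reported in the abstract, whose exact definition is not given here.

```python
from typing import Dict, List, Sequence


def binary_auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auroc(true_classes: Sequence[str],
                      class_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Compute one AUROC per class, treating that class as the positive label."""
    return {
        cls: binary_auroc([1 if t == cls else 0 for t in true_classes], scores)
        for cls, scores in class_scores.items()
    }


# Toy example with hypothetical scores for three patients.
truth = ["survival", "cancer", "survival"]
scores = {"survival": [0.9, 0.2, 0.8], "cancer": [0.1, 0.7, 0.3]}
per_class = one_vs_rest_auroc(truth, scores)
```

The pairwise loop is quadratic in the number of patients, which is fine for a demonstration; a sort-based rank computation would be preferable at the cohort sizes reported in the abstract.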
Figure 4.
Cause-of-death temporal trend and demographic distribution in the NHIS–NSC and US databases, imputed by the prediction model. Distribution of causes of death according to age group and year. The top graph shows that death from malignant cancer accounted for the largest proportion in the NHIS–NSC, and that this trend was independent of year and age group. The bottom graph shows the distribution of causes of death imputed from the US databases using the developed model. Because the databases cover different years, the graph is limited to a specific year and age group.
