Machine-learning model to predict the cause of death using a stacking ensemble method for observational data
- PMID: 33211841
- PMCID: PMC8200274
- DOI: 10.1093/jamia/ocaa277
Abstract
Objective: Cause of death is an important outcome in clinical research; however, access to cause-of-death data is limited. This study aimed to develop and validate a machine-learning model that predicts the cause of death from a patient's last medical checkup.
Materials and methods: To classify the mortality status and each individual cause of death, we used a stacking ensemble method. The prediction outcomes were all-cause mortality, 8 leading causes of death in South Korea, and other causes. The clinical data of study populations were extracted from the national claims (n = 174 747) and electronic health records (n = 729 065) and were used for model development and external validation. Moreover, we imputed the cause of death from the data of 3 US claims databases (n = 994 518, 995 372, and 407 604, respectively). All databases were formatted to the Observational Medical Outcomes Partnership Common Data Model.
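Illustrative sketch (not the authors' implementation): the abstract does not name the base learners or the meta-learner, so the code below assumes two toy base learners (`NearestCentroid`, `KNearest`) and a logistic-regression meta-learner, all hypothetical, and reduces the task to a binary outcome such as died versus survived. It shows the core idea of stacking: the meta-learner is fitted on out-of-fold predictions from the base learners, so it never sees a prediction made on a row that the base learner was trained on. The data are synthetic.

```python
import math
import random

class NearestCentroid:
    """Hypothetical base learner 1: relative closeness to the class-1 centroid."""
    def fit(self, X, y):
        self.c = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.c[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, X):
        out = []
        for x in X:
            d0, d1 = math.dist(x, self.c[0]), math.dist(x, self.c[1])
            out.append(d0 / (d0 + d1) if d0 + d1 > 0 else 0.5)
        return out

class KNearest:
    """Hypothetical base learner 2: share of positives among the k nearest rows."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict_proba(self, X):
        out = []
        for x in X:
            order = sorted(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))
            out.append(sum(self.y[i] for i in order[:self.k]) / self.k)
        return out

class Logistic:
    """Assumed meta-learner: logistic regression fitted by batch gradient descent."""
    def __init__(self, lr=0.5, epochs=500):
        self.lr, self.epochs = lr, epochs

    def _p(self, x):
        z = self.b + sum(w * v for w, v in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        n = len(X)
        for _ in range(self.epochs):
            errs = [self._p(x) - t for x, t in zip(X, y)]
            grad_w = [sum(e * x[j] for e, x in zip(errs, X)) / n
                      for j in range(len(self.w))]
            self.w = [w - self.lr * g for w, g in zip(self.w, grad_w)]
            self.b -= self.lr * sum(errs) / n
        return self

    def predict_proba(self, X):
        return [self._p(x) for x in X]

def fit_stack(X, y, base_factories, n_folds=3):
    """Fit the meta-learner on out-of-fold base predictions, then refit bases on all rows."""
    n = len(X)
    oof = [[0.0] * len(base_factories) for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held = [i for i in range(n) if i % n_folds == fold]
        for m, make in enumerate(base_factories):
            model = make().fit([X[i] for i in train], [y[i] for i in train])
            for i, p in zip(held, model.predict_proba([X[i] for i in held])):
                oof[i][m] = p  # each row is scored by a model that never saw it
    meta = Logistic().fit(oof, y)
    bases = [make().fit(X, y) for make in base_factories]
    return bases, meta

def predict_stack(bases, meta, X):
    """Feed the base learners' probabilities to the meta-learner."""
    level1 = list(zip(*[b.predict_proba(X) for b in bases]))
    return meta.predict_proba(level1)

# Synthetic two-feature data (not from the study): class 1 is shifted by +2.
random.seed(0)
y = [0, 1] * 15
X = [[random.gauss(2.0 * t, 1.0), random.gauss(2.0 * t, 1.0)] for t in y]
bases, meta = fit_stack(X, y, [NearestCentroid, KNearest])
p_low, p_high = predict_stack(bases, meta, [[-1.0, -1.0], [3.0, 3.0]])
```

A multiclass version, as the cause-of-death task requires, would have each base learner emit one probability per class and the meta-learner combine those columns; the fold logic stays the same.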
Results: The generalized area under the receiver operating characteristic curve (AUROC) of the model predicting the cause of death within 60 days was 0.9511. Moreover, the AUROC of the external validation was 0.8887. Among the causes of death imputed in the Medicare Supplemental database, 11.32% of deaths were due to malignant neoplastic disease.
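Illustrative sketch: the abstract does not define "generalized AUROC". One common multiclass generalization is the Hand and Till measure, which averages pairwise AUROCs over every pair of classes; the code below assumes that definition, and the function names, class labels, and probabilities are made up for illustration.

```python
from itertools import combinations

def pairwise_auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def generalized_auroc(y_true, y_prob, classes):
    """Hand-and-Till style multiclass AUROC: mean pairwise AUROC over all class pairs."""
    pairs = list(combinations(range(len(classes)), 2))
    total = 0.0
    for i, j in pairs:
        # How well does the class-i probability separate true i from true j?
        pi_i = [p[i] for t, p in zip(y_true, y_prob) if t == classes[i]]
        pi_j = [p[i] for t, p in zip(y_true, y_prob) if t == classes[j]]
        # And the class-j probability, true j from true i?
        pj_j = [p[j] for t, p in zip(y_true, y_prob) if t == classes[j]]
        pj_i = [p[j] for t, p in zip(y_true, y_prob) if t == classes[i]]
        total += (pairwise_auc(pi_i, pi_j) + pairwise_auc(pj_j, pj_i)) / 2
    return total / len(pairs)

# Made-up example: three outcomes, six patients, one cancer death scored as "alive".
classes = ["alive", "cancer", "other"]
y_true = ["alive", "alive", "cancer", "cancer", "other", "other"]
y_prob = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.7, 0.2, 0.1],  # misranked cancer death
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]
score = generalized_auroc(y_true, y_prob, classes)  # 0.875
```

Because the measure averages over class pairs rather than over patients, it is not dominated by the most frequent outcome.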
Discussion: This study showed the potential of machine-learning models as a new alternative to address the lack of access to cause-of-death data. All processes were disclosed to maintain transparency, and the model was easily applicable to other institutions.
Conclusion: A machine-learning model with competent performance was developed to predict cause of death.
Keywords: cause of death; classification; clinical; decision support systems; machine learning; mortality.
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.