Comparison of machine learning algorithms applied to symptoms to determine infectious causes of death in children: national survey of 18,000 verbal autopsies in the Million Death Study in India
- PMID: 34607591
- PMCID: PMC8488544
- DOI: 10.1186/s12889-021-11829-y
Abstract
Background: Machine learning (ML) algorithms have been successfully employed for prediction of outcomes in clinical research. In this study, we have explored the application of ML-based algorithms to predict cause of death (CoD) from verbal autopsy records available through the Million Death Study (MDS).
Methods: From the MDS, 18,826 unique childhood deaths at ages 1-59 months during 2004-13 were selected for generating the prediction models; over 70% of these deaths were caused by six infectious diseases (pneumonia, diarrhoeal diseases, malaria, fever of unknown origin, meningitis/encephalitis, and measles). Six popular ML algorithms (support vector machine (SVM), gradient boosting modeling, C5.0, artificial neural network, k-nearest neighbor, and classification and regression tree) were used to build the CoD prediction models.
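To make the classification task concrete, the following is a minimal sketch of one of the six listed approaches, k-nearest neighbor, applied to binary sign/symptom indicators. It is an illustration under stated assumptions, not the study's implementation: the symptom columns, the toy records, the Hamming distance, and the choice of k = 3 are all hypothetical and are not taken from the MDS.

```python
from collections import Counter


def hamming(a, b):
    """Count the symptom indicators on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Return the majority cause among the k training records nearest to query."""
    nearest = sorted(train, key=lambda rec: hamming(rec[0], query))[:k]
    votes = Counter(cause for _, cause in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy records, NOT MDS data.
# Assumed columns: (fever, cough, fast_breathing, loose_stools, rash)
TRAIN = [
    ((1, 1, 1, 0, 0), "pneumonia"),
    ((0, 1, 1, 0, 0), "pneumonia"),
    ((1, 1, 0, 0, 0), "pneumonia"),
    ((0, 0, 0, 1, 0), "diarrhoeal diseases"),
    ((1, 0, 0, 1, 0), "diarrhoeal diseases"),
    ((0, 0, 0, 1, 1), "diarrhoeal diseases"),
    ((1, 0, 0, 0, 1), "measles"),
    ((1, 1, 0, 0, 1), "measles"),
    ((0, 0, 0, 0, 1), "measles"),
]

if __name__ == "__main__":
    # A record with fever, cough and fast breathing
    print(knn_predict(TRAIN, (1, 1, 1, 0, 0), k=3))
```

Real verbal autopsy records carry far more indicators than five, so the distance measure and the value of k would need to be tuned on held-out data.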
Results: The SVM algorithm was the best performer, with a prediction accuracy of over 0.8. Among the individual causes, accuracy was highest for diarrhoeal diseases (0.97) and lowest for meningitis/encephalitis (0.80). The top signs/symptoms for classifying each of these CoDs were also extracted. A combination of signs/symptoms presented by the deceased individual can effectively lead to the CoD diagnosis.
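One common way to obtain an accuracy figure for each cause is to score every cause one-vs-rest: a death counts as correct for a given cause when the prediction and the reference label agree on whether it was that cause. The abstract does not state how the per-cause accuracies were computed, so the sketch below is an assumption about the metric, and the labels in it are hypothetical rather than MDS data.

```python
def one_vs_rest_accuracy(actual, predicted, cause):
    """Share of deaths where prediction and reference agree on 'is it this cause?'."""
    agree = sum((a == cause) == (p == cause) for a, p in zip(actual, predicted))
    return agree / len(actual)


def overall_accuracy(actual, predicted):
    """Share of deaths whose predicted cause equals the reference cause."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical reference and predicted causes for ten deaths (NOT MDS data)
ACTUAL = [
    "pneumonia", "pneumonia", "pneumonia",
    "diarrhoea", "diarrhoea", "diarrhoea",
    "malaria", "malaria",
    "meningitis", "meningitis",
]
PREDICTED = [
    "pneumonia", "pneumonia", "malaria",
    "diarrhoea", "diarrhoea", "diarrhoea",
    "malaria", "meningitis",
    "meningitis", "pneumonia",
]

if __name__ == "__main__":
    for cause in sorted(set(ACTUAL)):
        print(cause, one_vs_rest_accuracy(ACTUAL, PREDICTED, cause))
    print("overall", overall_accuracy(ACTUAL, PREDICTED))
```

Under this definition, per-cause accuracy can exceed overall accuracy because each correctly excluded death counts as a success for that cause, which is one reason sensitivity and specificity are often reported alongside it.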
Conclusions: Overall, this study affirms that verbal autopsy tools are efficient in CoD diagnosis and that automated classification parameters captured through ML could be added to verbal autopsies to improve classification of causes of death.
Keywords: Cause of death; Child mortality; Infectious disease; Machine learning; Million Death Study; Prediction model; Verbal autopsy.
© 2021. The Author(s).
Conflict of interest statement
The authors declare that they have no competing interests.
Similar articles
- Performance evaluation of machine learning and Computer Coded Verbal Autopsy (CCVA) algorithms for cause of death determination: A comparative analysis of data from rural South Africa. Front Public Health. 2022 Sep 27;10:990838. doi: 10.3389/fpubh.2022.990838. eCollection 2022. PMID: 36238252 Free PMC article.
- Performance criteria for verbal autopsy-based systems to estimate national causes of death: development and application to the Indian Million Death Study. BMC Med. 2014 Feb 4;12:21. doi: 10.1186/1741-7015-12-21. PMID: 24495287 Free PMC article. Clinical Trial.
- Automatically determining cause of death from verbal autopsy narratives. BMC Med Inform Decis Mak. 2019 Jul 9;19(1):127. doi: 10.1186/s12911-019-0841-9. PMID: 31288814 Free PMC article.
- Verbal autopsies for adult deaths: issues in their development and validation. Int J Epidemiol. 1994 Apr;23(2):213-22. doi: 10.1093/ije/23.2.213. PMID: 8082945 Review.
- Correcting for Verbal Autopsy Misclassification Bias in Cause-Specific Mortality Estimates. Am J Trop Med Hyg. 2023 Apr 10;108(5_Suppl):66-77. doi: 10.4269/ajtmh.22-0318. Print 2023 May 2. PMID: 37037438 Free PMC article. Review.
Cited by
- Performance evaluation of machine learning and Computer Coded Verbal Autopsy (CCVA) algorithms for cause of death determination: A comparative analysis of data from rural South Africa. Front Public Health. 2022 Sep 27;10:990838. doi: 10.3389/fpubh.2022.990838. eCollection 2022. PMID: 36238252 Free PMC article.