BMC Public Health. 2021 Oct 4;21(1):1787.

doi: 10.1186/s12889-021-11829-y.

Comparison of machine learning algorithms applied to symptoms to determine infectious causes of death in children: national survey of 18,000 verbal autopsies in the Million Death Study in India

Susan Idicula-Thomas¹ ², Ulka Gawde³, Prabhat Jha⁴

Affiliations

¹ Biomedical Informatics Centre, Indian Council of Medical Research-National Institute for Research in Reproductive Health, Mumbai, 400012, India. thomass@nirrh.res.in.
² Centre for Global Health Research, St. Michael's Hospital, Unity Health Toronto, and Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada. thomass@nirrh.res.in.
³ Biomedical Informatics Centre, Indian Council of Medical Research-National Institute for Research in Reproductive Health, Mumbai, 400012, India.
⁴ Centre for Global Health Research, St. Michael's Hospital, Unity Health Toronto, and Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada. prabhat.jha@utoronto.ca.

PMID: 34607591
PMCID: PMC8488544
DOI: 10.1186/s12889-021-11829-y

Susan Idicula-Thomas et al. BMC Public Health. 2021.


Abstract

Background: Machine learning (ML) algorithms have been successfully employed to predict outcomes in clinical research. In this study, we explored the application of ML-based algorithms to predict cause of death (CoD) from verbal autopsy records available through the Million Death Study (MDS).

Methods: From the MDS, 18,826 unique childhood deaths at ages 1-59 months during 2004-13 were selected for generating the prediction models; over 70% of these deaths were caused by six infectious diseases (pneumonia, diarrhoeal diseases, malaria, fever of unknown origin, meningitis/encephalitis, and measles). Six popular ML algorithms (support vector machine, gradient boosting modeling, C5.0, artificial neural network, k-nearest neighbor, and classification and regression tree) were used to build the CoD prediction models.
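As a rough illustration of what symptom-based CoD prediction involves, the sketch below implements one of the six algorithm families named above, k-nearest neighbor, in plain Python. It is not the authors' implementation: the four symptom indicators, the toy records, the Hamming distance, and the function names (`hamming`, `knn_predict`, `holdout_accuracy`) are all assumptions made for the example.

```python
from collections import Counter


def hamming(a, b):
    """Count the symptom indicators on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Majority vote among the k training records closest to the query."""
    neighbours = sorted(train, key=lambda rec: hamming(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def holdout_accuracy(train, test, k=3):
    """Fraction of held-out records whose cause is predicted correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


# Hypothetical toy records, NOT MDS data.
# Each tuple is (fever, cough, loose_stools, rash) coded 1 = present, 0 = absent.
TRAIN = [
    ((1, 1, 0, 0), "pneumonia"),
    ((0, 1, 0, 0), "pneumonia"),
    ((1, 1, 0, 0), "pneumonia"),
    ((0, 0, 1, 0), "diarrhoeal diseases"),
    ((1, 0, 1, 0), "diarrhoeal diseases"),
    ((0, 0, 1, 0), "diarrhoeal diseases"),
    ((1, 0, 0, 1), "measles"),
    ((1, 1, 0, 1), "measles"),
    ((0, 0, 0, 1), "measles"),
]
TEST = [
    ((1, 1, 0, 0), "pneumonia"),
    ((0, 0, 1, 0), "diarrhoeal diseases"),
    ((1, 0, 0, 1), "measles"),
]

if __name__ == "__main__":
    print(holdout_accuracy(TRAIN, TEST))
```

Comparing several algorithms, as the study does, would mean fitting each one on the same training records and scoring each on the same held-out records.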

Results: The support vector machine (SVM) algorithm was the best performer, with a prediction accuracy of over 0.8. Among the individual diseases, accuracy was highest for diarrhoeal diseases (0.97) and lowest for meningitis/encephalitis (0.80). The top signs/symptoms for classifying each of these CoDs were also extracted. A combination of signs/symptoms presented by the deceased individual can effectively lead to the CoD diagnosis.
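The abstract does not state how the per-disease accuracies were computed. One common convention, assumed here, is one-vs-rest accuracy: for a given disease, a record counts as correct when the model and the reference agree on whether it belongs to that disease. The sketch below uses made-up labels (not study data) and hypothetical function names to show how this per-class figure can sit above the overall multi-class accuracy.

```python
def overall_accuracy(y_true, y_pred):
    """Share of records whose predicted cause equals the reference cause."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred, cause):
    """One-vs-rest accuracy for one cause: (TP + TN) / N."""
    agree = sum((t == cause) == (p == cause) for t, p in zip(y_true, y_pred))
    return agree / len(y_true)


# Made-up labels for ten records (illustrative only).
P, D, M, F = ("pneumonia", "diarrhoeal diseases",
              "meningitis/encephalitis", "fever of unknown origin")
y_true = [P, P, P, D, D, D, M, M, F, F]
y_pred = [P, P, F, D, D, D, M, F, F, P]

if __name__ == "__main__":
    print("overall", overall_accuracy(y_true, y_pred))
    for cause in (P, D, M, F):
        print(cause, per_class_accuracy(y_true, y_pred, cause))
```

Because true negatives count as correct under this convention, a disease can score well per class even when the overall accuracy is lower.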

Conclusions: Overall, this study affirms that verbal autopsy tools are efficient in CoD diagnosis and that automated classification parameters captured through ML could be added to verbal autopsies to improve classification of causes of death.

Keywords: Cause of death; Child mortality; Infectious disease; Machine learning; Million Death Study; Prediction model; Verbal autopsy.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
Bubble plot depicting distribution of symptoms across six infectious diseases. X-axis represents *disease* class and y-axis represents *symptoms* coded by rule-based method. The bubble size is proportional to percentage of records positive for the symptom in the disease class. The plot was generated using *ggplot2* R package [40]

**Fig. 2**
Tree-based clustering of symptoms for six clusters. The vertical axis represents distance between clusters

**Fig. 3**
Disease-symptom network of top 10 features obtained from SVM model. Green nodes represent symptoms and blue nodes represent diseases. The size of disease node is proportional to number of records corresponding to the disease in the dataset. Edge represents association between disease and symptom and its width is proportional to percentage of records positive for the symptom. The network was created using *igraph* R package [48]

Cited by

Performance evaluation of machine learning and Computer Coded Verbal Autopsy (CCVA) algorithms for cause of death determination: A comparative analysis of data from rural South Africa.
Mapundu MT, Kabudula CW, Musenge E, Olago V, Celik T. Front Public Health. 2022 Sep 27;10:990838. doi: 10.3389/fpubh.2022.990838. PMID: 36238252.

References

1. Soleman N, Chandramohan D, Shibuya K. Verbal autopsy: current practices and challenges. 2006.
2. Hsiao M, Morris SK, Bassani DG, Montgomery AL, Thakur JS, Jha P. Factors associated with physician agreement on verbal autopsy of over 11500 injury deaths in India. PLoS One. 2012;7(1):e30336. doi: 10.1371/journal.pone.0030336.
3. Byass P, Hussain-Alkhateeb L, D’Ambruoso L, Clark S, Davies J, Fottrell E, et al. An integrated approach to processing WHO-2016 verbal autopsy data: The InterVA-5 model. BMC Med. 2019;17. doi: 10.1186/s12916-019-1333-6.
4. Nichols EK, Byass P, Chandramohan D, Clark SJ, Flaxman AD, Jakob R, et al. The WHO 2016 verbal autopsy instrument: An international standard suitable for automated analysis by InterVA, InSilicoVA, and Tariff 2.0. PLoS Med. 2018;15. doi: 10.1371/journal.pmed.1002486.
5. McCormick TH, Li ZR, Calvert C, Crampin AC, Kahn K, Clark SJ. Probabilistic cause-of-death assignment using verbal autopsies. J Am Stat Assoc. 2016;111(515):1036–1049. doi: 10.1080/01621459.2016.1152191.
