Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards

Abraham D Flaxman¹, Alireza Vahdatpour, Sean Green, Spencer L James, Christopher Jl Murray; Population Health Metrics Research Consortium (PHMRC)

PMID: 21816105
PMCID: PMC3160922
DOI: 10.1186/1478-7954-9-29

Abraham D Flaxman et al. Popul Health Metr. 2011 Aug 4;9:29.

Affiliation

¹ Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave, Suite 600, Seattle, WA 98121, USA. abie@uw.edu.

Abstract

Background: Computer-coded verbal autopsy (CCVA) is a promising alternative to the standard approach of physician-certified verbal autopsy (PCVA), because of its high speed, low cost, and reliability. This study introduces a new CCVA technique and validates its performance using defined clinical diagnostic criteria as a gold standard for a multisite sample of 12,542 verbal autopsies (VAs).

Methods: The Random Forest (RF) Method from machine learning (ML) was adapted to predict cause of death by training random forests to distinguish between each pair of causes and then combining the results through a novel ranking technique. We assessed the quality of the new method at the individual level using chance-corrected concordance and at the population level using cause-specific mortality fraction (CSMF) accuracy as well as linear regression. We also compared RF with PCVA on all of these metrics. We performed this analysis separately for adult, child, and neonatal VAs, and assessed how performance varied with and without household recall of health care experience (HCE).
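The one-versus-one training step can be sketched with scikit-learn's `RandomForestClassifier`. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `fit_pairwise_forests` and `cause_scores`, the default of 100 trees, and the choice to add up each pairwise forest's predicted probabilities into one score per cause are all hypothetical.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def fit_pairwise_forests(X, y, n_trees=100, seed=0):
    """Train one forest per unordered pair of causes (one-versus-one).

    Hypothetical helper: forest size and seed are illustrative assumptions.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    forests = {}
    for a, b in combinations(np.unique(y), 2):
        # Keep only the deaths whose gold-standard cause is a or b.
        mask = (y == a) | (y == b)
        rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        rf.fit(X[mask], y[mask])
        forests[(a, b)] = rf
    return forests


def cause_scores(forests, X, causes):
    """Add up, per cause, each pairwise forest's predicted probability for it.

    Assumed scoring rule. For fully grown trees (the scikit-learn default)
    this probability is essentially the share of trees voting for the cause.
    """
    index = {c: i for i, c in enumerate(causes)}
    scores = np.zeros((len(X), len(causes)))
    for rf in forests.values():
        proba = rf.predict_proba(X)  # columns are ordered as rf.classes_
        for col, cause in enumerate(rf.classes_):
            scores[:, index[cause]] += proba[:, col]
    return scores
```

With N causes this trains N(N-1)/2 forests, and every death ends up with N scores, one per cause, which a subsequent ranking step can compare.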

Results: For all metrics and all settings, RF was as good as or better than PCVA, with the exception of a nonsignificantly lower CSMF accuracy for neonates with HCE information. With HCE, the chance-corrected concordance of RF was higher than that of PCVA by 3.4 percentage points for adults, 3.2 percentage points for children, and 1.6 percentage points for neonates, and the CSMF accuracy of RF was 0.097 higher for adults, 0.097 higher for children, and 0.007 lower for neonates. Without HCE, the chance-corrected concordance of RF was higher than that of PCVA by 8.1 percentage points for adults, 10.2 percentage points for children, and 5.9 percentage points for neonates, and the CSMF accuracy of RF was higher by 0.102 for adults, 0.131 for children, and 0.025 for neonates.
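The two headline metrics reported above can be computed as follows. The abstract does not give their formulas, so this sketch assumes the definitions used in the PHMRC validation work: chance-corrected concordance for cause j is (TP_j / (TP_j + FN_j) - 1/N) / (1 - 1/N) for N causes, and CSMF accuracy is 1 - Σ_j |true_j - predicted_j| / (2(1 - min_j true_j)). The function names are hypothetical, and concordance is returned as a fraction rather than in percent.

```python
import numpy as np


def chance_corrected_concordance(y_true, y_pred, cause, n_causes):
    """Sensitivity for one cause, rescaled so random assignment scores 0.

    Assumed PHMRC definition: (TP / (TP + FN) - 1/N) / (1 - 1/N).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    is_cause = y_true == cause
    if not is_cause.any():
        raise ValueError("no deaths with this true cause")
    sensitivity = np.mean(y_pred[is_cause] == cause)  # TP / (TP + FN)
    chance = 1.0 / n_causes
    return (sensitivity - chance) / (1.0 - chance)


def csmf_accuracy(csmf_true, csmf_pred):
    """One minus total absolute CSMF error over its maximum possible value.

    Assumed PHMRC definition: 1 - sum|true - pred| / (2 * (1 - min(true))).
    """
    csmf_true = np.asarray(csmf_true, dtype=float)
    csmf_pred = np.asarray(csmf_pred, dtype=float)
    total_abs_error = np.abs(csmf_true - csmf_pred).sum()
    return 1.0 - total_abs_error / (2.0 * (1.0 - csmf_true.min()))
```

Both metrics lie in a bounded range: CSMF accuracy runs from 0 (worst possible estimate) to 1 (perfect), and chance-corrected concordance is 0 for chance-level assignment and 1 for perfect assignment.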

Conclusions: We found that our RF Method outperformed the PCVA method in terms of chance-corrected concordance and CSMF accuracy for adult and child VA with and without HCE and for neonatal VA without HCE. It is also preferable to PCVA in terms of time and cost. Therefore, we recommend it as the technique of choice for analyzing past and current verbal autopsies.

Figures

**Figure 1**
**Expert algorithm and RF decision trees**. A right branch from a node represents "yes" and a left branch represents "no." a) Decision tree representation of expert algorithm to identify malaria deaths in child VAs (one-versus-all approach); b) Two random decision trees generated by RF to distinguish AIDS deaths from maternal sepsis deaths (one-versus-one approach).

**Figure 2**
**Schematic representation of RF**.

**Figure 3**
**Schematic representation of "ranking" technique for cause prediction from random forest scores**.
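Figure 3 names a ranking technique without detailing it. One plausible reading, offered here only as an assumption, is that a death's score for each cause is compared with the scores that a reference set of training deaths received for that same cause, and the death is assigned the cause on which it ranks highest. This corrects for causes whose raw scores run systematically high. `rank_predict` and its strict-inequality rank are hypothetical choices.

```python
import numpy as np


def rank_predict(test_scores, reference_scores):
    """Assign each test death the cause on which it ranks highest.

    Hypothetical rule: for cause j, a death's rank is the fraction of
    reference deaths whose score for j is strictly higher than its own;
    the predicted cause has the smallest fraction (ties go to the lower index).
    test_scores: (n_test, n_causes); reference_scores: (n_ref, n_causes).
    """
    test_scores = np.asarray(test_scores, dtype=float)
    reference_scores = np.asarray(reference_scores, dtype=float)
    # higher[i, j]: share of reference deaths scoring above test death i on cause j
    higher = (reference_scores[None, :, :] > test_scores[:, None, :]).mean(axis=1)
    return higher.argmin(axis=1)
```

In the test case, reference scores for cause 0 are uniformly high, so a raw argmax would pick cause 0 for the scores [5, 2], while the rank-based rule picks cause 1 because 2 exceeds every reference score for that cause.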

**Figure 4**
**Partial-cause assignment increases partial chance-corrected concordance for adult, child, and neonate VAs with and without HCE**. The slope of the increase is greatest between one and two cause assignments.
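Partial chance-corrected concordance credits a method when the true cause appears among its top k assigned causes. The sketch assumes the PHMRC definition PCCC(k) = (C - k/N) / (1 - k/N), where C is the fraction of deaths whose true cause is among the k highest-scoring causes and N is the number of causes; the function name `partial_ccc` and its score-matrix input are hypothetical.

```python
import numpy as np


def partial_ccc(scores, y_true, k):
    """Partial chance-corrected concordance for top-k cause assignment.

    Assumed definition: (C - k/N) / (1 - k/N).
    scores: (n_deaths, N) per-cause scores; y_true: true cause column indices.
    """
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true)
    n_causes = scores.shape[1]
    if not 1 <= k < n_causes:
        raise ValueError("k must satisfy 1 <= k < number of causes")
    # Column indices of the k highest-scoring causes for each death.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hit = (top_k == y_true[:, None]).any(axis=1)
    c = hit.mean()
    chance = k / n_causes
    return (c - chance) / (1.0 - chance)
```

Subtracting k/N removes the credit a random top-k list would earn, which is why partial concordance can stay comparable as k grows.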

**Figure 5**
**Median chance-corrected concordance (%) for RF across 500 splits, by cause, for adult VA, with and without HCE**.

**Figure 6**
**Median chance-corrected concordance (%) for RF across 500 splits, by cause, for child VA, with and without HCE**.

**Figure 7**
**Median chance-corrected concordance (%) for RF across 500 splits, by cause, for neonatal VA, with and without HCE**.

**Figure 8**
**Scatter of median chance-corrected concordance of RF versus PCVA, for adult module**.

**Figure 9**
**Scatter of median chance-corrected concordance of RF versus PCVA, for child module**.

**Figure 10**
**Estimated versus true CSMFs for 500 Dirichlet splits, showing that for selected causes of adult mortality (AIDS, colorectal cancer, maternal, and ischemic heart disease [IHD]), the performance of RF varies**. For AIDS and IHD, RF tends to overestimate the cause fraction when the true CSMF is small and to underestimate it otherwise. For colorectal cancer, RF mostly assigns the same CSMF regardless of the true CSMF, and for maternal causes, RF is more accurate.
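The caption refers to 500 Dirichlet splits without describing the procedure, so the following is an assumption: draw a cause composition from a flat Dirichlet distribution, then resample the test deaths with replacement to match it, so that each split evaluates the method at a different true CSMF. `dirichlet_resample` is a hypothetical helper.

```python
import numpy as np


def dirichlet_resample(y_test, rng):
    """Resample test deaths so cause fractions follow a flat-Dirichlet draw.

    Assumed procedure. Returns the resampled row indices and the realized
    true CSMF of the resampled test set, keyed by cause.
    """
    y_test = np.asarray(y_test)
    causes = np.unique(y_test)
    csmf = rng.dirichlet(np.ones(len(causes)))  # Dirichlet(1, ..., 1)
    counts = rng.multinomial(len(y_test), csmf)  # deaths per cause
    idx = np.concatenate([
        # Sample with replacement from the deaths that have cause c.
        rng.choice(np.flatnonzero(y_test == c), size=n, replace=True)
        for c, n in zip(causes, counts)
    ])
    return idx, dict(zip(causes, counts / len(y_test)))
```

Repeating this with a fresh draw per split yields a spread of true CSMFs, which is what lets a plot of estimated versus true fractions reveal over- or underestimation patterns like those described for AIDS and IHD.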
