Using methods from the data-mining and machine-learning literature for disease classification and prediction: a case study examining classification of heart failure subtypes

Peter C Austin¹, Jack V Tu, Jennifer E Ho, Daniel Levy, Douglas S Lee

PMID: 23384592
PMCID: PMC4322906
DOI: 10.1016/j.jclinepi.2012.11.008

Peter C Austin et al. J Clin Epidemiol. 2013 Apr;66(4):398-407. doi: 10.1016/j.jclinepi.2012.11.008. Epub 2013 Feb 4.

Affiliation

¹ Institute for Clinical Evaluative Sciences, G105, 2075 Bayview Ave, Toronto, Ontario, Canada. peter.austin@ices.on.ca

Abstract

Objective: Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine-learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines.

Study design and setting: We compared the performance of these classification methods with that of conventional classification trees to classify patients with heart failure (HF) according to the following subtypes: HF with preserved ejection fraction (HFPEF) and HF with reduced ejection fraction. We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression.

Results: We found that modern, flexible tree-based methods from the data-mining literature offer substantial improvement in prediction and classification of HF subtype compared with conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared with the methods proposed in the data-mining literature.

Conclusion: The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying HF subtypes in a population-based sample of patients from Ontario, Canada. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF.
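As a rough illustration of one of the ensemble methods named in the abstract, bootstrap aggregation (bagging), the sketch below fits many simple classifiers to bootstrap resamples of a data set and averages their votes. It is a minimal sketch under stated assumptions, not the study's implementation: the base learner is a one-split decision stump rather than a full classification tree, the data are synthetic, and every function name (`fit_stump`, `fit_bagged`, `predict_bagged`) is hypothetical.

```python
import random


def fit_stump(rows, labels):
    """Return the single-feature threshold split with the fewest training errors."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            for polarity in (0, 1):
                # Predict `polarity` when feature j <= t, otherwise the other class.
                errors = sum(
                    (polarity if r[j] <= t else 1 - polarity) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    _, j, t, polarity = best
    return j, t, polarity


def predict_stump(stump, row):
    j, t, polarity = stump
    return polarity if row[j] <= t else 1 - polarity


def fit_bagged(rows, labels, n_models=25, seed=0):
    """Fit one stump per bootstrap resample (drawn with replacement, same size)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def predict_bagged(models, row):
    """Fraction of stumps voting for class 1; usable as a crude probability."""
    return sum(predict_stump(m, row) for m in models) / len(models)


# Synthetic toy data (not patient data): class 1 when feature 0 >= 0.5,
# feature 1 is pure noise.
rng = random.Random(1)
rows = [(i / 40, rng.random()) for i in range(40)]
labels = [1 if r[0] >= 0.5 else 0 for r in rows]

models = fit_bagged(rows, labels)
print(predict_bagged(models, (0.05, 0.5)), predict_bagged(models, (0.95, 0.5)))
```

The averaged vote can be thresholded at 0.5 for classification or read as an estimated probability, which mirrors the two tasks the abstract distinguishes (classifying HF subtype and predicting the probability of HFPEF).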

Conflict of interest statement

Declaration of conflicting interests: The authors declare that there is no conflict of interest.

Figures

**Figure 1. Calibration of prediction methods in EFFECT Follow-up sample**
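Calibration describes how closely predicted probabilities match observed outcome rates. One common way to examine it, shown here as a hedged sketch and not necessarily the procedure behind this figure, is to sort subjects by predicted probability, split them into equal-sized groups, and compare the mean prediction with the observed event rate in each group. The function name `calibration_table` and the toy inputs are assumptions made for illustration.

```python
def calibration_table(probs, outcomes, n_bins=10):
    """Return (mean predicted probability, observed rate, group size) per group."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    table = []
    for b in range(n_bins):
        # Integer slicing gives groups whose sizes differ by at most one.
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_pred, obs_rate, len(chunk)))
    return table


# Toy inputs (not study data): predictions of 0.1 and 0.9 whose observed
# event rates are also 0.1 and 0.9, i.e. well calibrated.
probs = [0.1] * 10 + [0.9] * 10
outcomes = [0] * 9 + [1] + [1] * 9 + [0]

for mean_pred, obs_rate, size in calibration_table(probs, outcomes, n_bins=2):
    print(f"predicted={mean_pred:.2f} observed={obs_rate:.2f} n={size}")
```

For a well-calibrated method the two columns track each other closely; systematic gaps indicate over- or under-prediction.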
