Addressing Measurement Error in Random Forests Using Quantitative Bias Analysis
- PMID: 33517416
- PMCID: PMC8408353
- DOI: 10.1093/aje/kwab010
Abstract
Although variables are often measured with error, the impact of measurement error on machine-learning predictions is seldom quantified. The purpose of this study was to assess the impact of measurement error on the performance of random-forest models and variable importance. First, we assessed the impact of misclassification (i.e., measurement error of categorical variables) of predictors on random-forest model performance (e.g., accuracy, sensitivity) and variable importance (mean decrease in accuracy) using data from the National Comorbidity Survey Replication (2001-2003). Second, we created simulated data sets in which we knew the true model performance and variable importance measures and could verify that quantitative bias analysis was recovering the truth in misclassified versions of the data sets. Our findings showed that measurement error in the data used to construct random forests can distort model performance and variable importance measures and that bias analysis can recover the correct results. This study highlights the utility of applying quantitative bias analysis in machine learning to quantify the impact of measurement error on study results.
Keywords: machine learning; measurement error; misclassification; noise; quantitative bias analysis; random forests.
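The abstract's two central ideas, misclassification of a categorical variable and its correction by quantitative bias analysis, can be illustrated with a minimal sketch. It is not the authors' procedure and involves no random forest. It assumes a single binary predictor, a known sensitivity and specificity (the values 0.85 and 0.95 are hypothetical), and nondifferential misclassification, and it applies the standard back-calculation of true prevalence from observed prevalence. The function names `misclassify` and `corrected_prevalence` are illustrative only.

```python
import random


def misclassify(values, sensitivity, specificity, rng):
    """Return a noisy copy of a list of 0/1 values.

    A true 1 is recorded as 1 with probability `sensitivity`;
    a true 0 is recorded as 0 with probability `specificity`.
    """
    noisy = []
    for v in values:
        if v == 1:
            noisy.append(1 if rng.random() < sensitivity else 0)
        else:
            noisy.append(0 if rng.random() < specificity else 1)
    return noisy


def corrected_prevalence(observed_prev, sensitivity, specificity):
    """Back-calculate the true prevalence from the observed prevalence.

    Uses observed = true * Se + (1 - true) * (1 - Sp), solved for `true`.
    Assumes Se + Sp > 1 and that both are known (hypothetical here).
    """
    return (observed_prev + specificity - 1.0) / (sensitivity + specificity - 1.0)


# Simulated data in which the truth is known (assumed true prevalence of 0.30).
rng = random.Random(0)
true_x = [1 if rng.random() < 0.30 else 0 for _ in range(100_000)]

# Misclassified version of the same variable (hypothetical Se and Sp).
SENSITIVITY, SPECIFICITY = 0.85, 0.95
noisy_x = misclassify(true_x, SENSITIVITY, SPECIFICITY, rng)

true_prev = sum(true_x) / len(true_x)
observed_prev = sum(noisy_x) / len(noisy_x)
adjusted_prev = corrected_prevalence(observed_prev, SENSITIVITY, SPECIFICITY)

print(f"true prevalence:     {true_prev:.3f}")
print(f"observed prevalence: {observed_prev:.3f}")
print(f"bias-adjusted:       {adjusted_prev:.3f}")
```

As in the simulation design described in the abstract, the truth is known here, so the bias-adjusted estimate can be checked against it directly. In real data the sensitivity and specificity would have to come from validation studies or be varied over a plausible range.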
© The Author(s) 2021. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Comment in
- Invited Commentary: Quantitative Bias Analysis Can See the Forest for the Trees. Am J Epidemiol. 2021 Sep 1;190(9):1841-1843. doi: 10.1093/aje/kwab011. PMID: 33517401
- Jiang et al. Respond to "Quantitative Bias Analysis". Am J Epidemiol. 2021 Sep 1;190(9):1844-1845. doi: 10.1093/aje/kwab009. PMID: 34467403
