Comment
Brief Bioinform. 2015 Mar;16(2):338-45.
doi: 10.1093/bib/bbu012. Epub 2014 Apr 9.

Letter to the Editor: On the term 'interaction' and related phrases in the literature on Random Forests

Anne-Laure Boulesteix et al. Brief Bioinform. 2015 Mar.

Abstract

In an interesting and quite exhaustive review of Random Forests (RF) methodology in bioinformatics, Touw et al. address, among other topics, the problem of detecting interactions between variables with RF methodology. We feel that some important statistical concepts, such as 'interaction', 'conditional dependence' or 'correlation', are sometimes employed inconsistently in the bioinformatics literature in general and in the literature on RF in particular. In this letter to the Editor, we aim to clarify some of the central statistical concepts and point out some confusing interpretations concerning RF given by Touw et al. and other authors.

Keywords: conditional inference trees; conditional variable importance; correlation; interaction; random forest; statistics.

Figures

Figure 1: Idealized tree in the presence of two predictor variables, X1 and X2, with main effects only (no interaction). The bars at the bottom of the tree denote the proportion of observations with Y = 0 and Y = 1 in the respective leaves.
Figure 2: Idealized tree in the presence of two predictor variables, X1 and X2, with interaction. (A) Different predictor variables are selected on the left and on the right. (B) Splitting stops after the first split on the right but not on the left. (C) The same predictor variable is selected on the left and on the right, but the effect is different. The bars at the bottom of the tree denote the proportion of observations with Y = 0 and Y = 1 in the respective leaves.
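The contrast between the two figure captions can be made concrete with a small numerical sketch. It is not taken from the letter: it assumes binary X1 and X2, a logistic model for Y, and made-up coefficient values, and all function names are hypothetical. It also measures interaction on the logit scale, which is itself an assumption, since whether an interaction is present depends on the scale chosen. The sketch computes P(Y = 1) in each of the four leaves and checks whether the effect of X2 is the same on both sides of the X1 split.

```python
import math


def leaf_probability(x1, x2, b0, b1, b2, b12=0.0):
    """P(Y = 1) in the leaf X1 = x1, X2 = x2 under an assumed logistic model."""
    eta = b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def leaf_table(b0, b1, b2, b12=0.0):
    """Map each leaf (x1, x2) of the idealized tree to its P(Y = 1)."""
    return {
        (x1, x2): leaf_probability(x1, x2, b0, b1, b2, b12)
        for x1 in (0, 1)
        for x2 in (0, 1)
    }


def interaction_contrast(probs):
    """Effect of X2 when X1 = 1 minus effect of X2 when X1 = 0 (logit scale)."""
    effect_right = logit(probs[(1, 1)]) - logit(probs[(1, 0)])
    effect_left = logit(probs[(0, 1)]) - logit(probs[(0, 0)])
    return effect_right - effect_left


# Illustrative coefficients only (not from the source).
main_only = leaf_table(b0=-1.0, b1=1.0, b2=1.5)                   # Figure 1 situation
with_interaction = leaf_table(b0=-1.0, b1=1.0, b2=1.5, b12=-2.0)  # Figure 2(C) situation

for name, table in (("main effects only", main_only),
                    ("with interaction", with_interaction)):
    print(name)
    for leaf, p in sorted(table.items()):
        print(f"  X1={leaf[0]}, X2={leaf[1]}: P(Y=1) = {p:.3f}")
    print(f"  interaction contrast = {interaction_contrast(table):.3f}")
```

Under these assumptions the contrast is zero (up to rounding) when the model has main effects only, and it equals the interaction coefficient otherwise. This mirrors case (C) of Figure 2, where the same variable is selected on both sides but its effect differs.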

References

    1. Touw WG, Bayjanov JR, Overmars L, et al. Data mining in the life sciences with random forest: a walk in the park or lost in the jungle? Brief Bioinform. 2013;14:315–26.
    2. Kim Y, Wojciechowski R, Sung H, et al. Evaluation of random forests performance for genome-wide association studies in the presence of interaction effects. BMC Proceedings. 2009;3:S64.
    3. Kelly C, Okada K. Variable interaction measures with random forest classifiers. In: 9th IEEE International Symposium on Biomedical Imaging (ISBI); 2012. pp. 154–7.
    4. Miettinen OS. Theoretical Epidemiology: Principles of Occurrence Research in Medicine. New York: Wiley; 1985.
    5. Grobbee DE, Hoes AW. Clinical Epidemiology: Principles, Methods, and Applications for Clinical Research. London: Jones & Bartlett Learning; 2009.