Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning

Tal Yarkoni¹, Jacob Westfall¹

PMID: 28841086
PMCID: PMC6603289
DOI: 10.1177/1745691617693393

Review

Tal Yarkoni et al. Perspect Psychol Sci. 2017 Nov;12(6):1100-1122. doi: 10.1177/1745691617693393. Epub 2017 Aug 25.

Affiliation

¹ University of Texas at Austin.

Abstract

Psychology has historically been concerned, first and foremost, with explaining the causal mechanisms that give rise to behavior. Randomized, tightly controlled experiments are enshrined as the gold standard of psychological research, and there are endless investigations of the various mediating and moderating variables that govern various behaviors. We argue that psychology's near-total focus on explaining the causes of behavior has led much of the field to be populated by research programs that provide intricate theories of psychological mechanism but that have little (or unknown) ability to predict future behaviors with any appreciable accuracy. We propose that principles and techniques from the field of machine learning can help psychology become a more predictive science. We review some of the fundamental concepts and tools of machine learning and point out examples where these concepts have been used to conduct interesting and important psychological research that focuses on predictive research questions. We suggest that an increased focus on prediction, rather than explanation, can ultimately lead us to greater understanding of behavior.

Keywords: explanation; machine learning; prediction.

Figures

**Figure 1.**
Training and test error produced by fitting either a linear regression (left) or a 10th-order polynomial regression (right) when the true relationship in the population (red line) is linear. In both cases, the test data (green) deviate more from the model’s predictions (blue line) than the training data (blue). However, the flexibility of the 10th-order polynomial model facilitates much greater overfitting, resulting in lower training error, but much higher test error, than the linear model. MSE = mean squared error.
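
The contrast described in Figure 1 can be reproduced with a small simulation. The sketch below is illustrative only and is not the authors' code: the true slope (2), the noise level, the sample sizes, and the helper names `draw_sample` and `average_errors` are all assumptions chosen for the example. It fits a linear and a 10th-order polynomial model to the same training data and averages the training and test MSE over many simulated datasets.

```python
import numpy as np
from numpy.polynomial import Polynomial


def draw_sample(rng, n, noise_sd=1.0):
    """Draw n points from a population where y is linear in x plus noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + rng.normal(0.0, noise_sd, size=n)  # assumed true slope of 2
    return x, y


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))


def average_errors(degrees=(1, 10), n_train=20, n_test=1000, n_reps=200, seed=0):
    """Return {degree: (mean training MSE, mean test MSE)} over n_reps datasets."""
    rng = np.random.default_rng(seed)
    totals = {d: np.zeros(2) for d in degrees}
    for _ in range(n_reps):
        x_tr, y_tr = draw_sample(rng, n_train)
        x_te, y_te = draw_sample(rng, n_test)
        for d in degrees:
            # Least-squares polynomial fit of degree d, using training data only.
            model = Polynomial.fit(x_tr, y_tr, deg=d, domain=[-1.0, 1.0])
            totals[d] += (mse(y_tr, model(x_tr)), mse(y_te, model(x_te)))
    return {d: tuple(totals[d] / n_reps) for d in degrees}


if __name__ == "__main__":
    for degree, (train, test) in average_errors().items():
        print(f"degree {degree:2d}: train MSE = {train:.2f}, test MSE = {test:.2f}")
```

Under these assumptions the more flexible model should show the lower average training error and the higher average test error, which is the overfitting pattern the caption describes.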

**Figure 2.**
An estimator’s predictions can deviate from the desired outcome (or true scores) in two ways. First, the predictions may display a systematic tendency (or *bias*) to deviate from the central tendency of the true scores (compare right panels with left panels). Second, the predictions may show a high degree of *variance*, or imprecision (compare bottom panels with top panels).
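
As a rough numerical illustration of the two kinds of deviation in Figure 2, the sketch below compares an unbiased but noisy estimator (a sample mean) with a biased but more stable one (the same mean shrunk toward zero). The true score of 10, the noise level, the shrinkage factor of 0.5, and the function names are hypothetical choices made for this example, not values from the article.

```python
import numpy as np


def bias_and_variance(predictions, true_score):
    """Summarize how a set of predictions deviates from a single true score."""
    predictions = np.asarray(predictions, dtype=float)
    bias = predictions.mean() - true_score          # systematic deviation
    variance = predictions.var()                    # spread around their own mean
    mse = np.mean((predictions - true_score) ** 2)  # total squared error
    return float(bias), float(variance), float(mse)


def simulate_estimators(true_score=10.0, n_obs=5, n_reps=10_000, seed=0):
    """Return (bias, variance, MSE) for two estimators of the same true score."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(true_score, 3.0, size=(n_reps, n_obs))
    sample_mean = samples.mean(axis=1)  # unbiased, but relatively noisy
    shrunken = 0.5 * sample_mean        # biased toward 0, but less variable
    return {
        "sample_mean": bias_and_variance(sample_mean, true_score),
        "shrunken": bias_and_variance(shrunken, true_score),
    }


if __name__ == "__main__":
    for name, (bias, variance, mse) in simulate_estimators().items():
        print(f"{name:12s} bias = {bias:6.2f}  variance = {variance:5.2f}  MSE = {mse:6.2f}")
```

In this setup the shrunken estimator has the smaller variance and the larger bias, and for each estimator the MSE equals the squared bias plus the variance.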

**Figure 3.**
Schematic illustration of the bias-variance decomposition. Left: under the classical error model, prediction error is defined as the sum of squared differences between true scores and observed scores (black lines). Right: the bias-variance decomposition partitions the total sum of squared errors into two separate components: a bias term that captures a model’s systematic tendency to deviate from the true scores in a predictable way (black line), and a variance term that represents the deviations of the individual observations from the model’s expected prediction (gray lines).
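
The decomposition sketched in Figure 3 can be written out as follows. The notation is an assumption introduced here rather than taken from the article: θ denotes a fixed true score, ŷ the model's prediction, and the expectation runs over repeated samples. The cross term vanishes because the expected deviation of ŷ from its own mean is zero.

```latex
\begin{aligned}
\mathbb{E}\big[(\hat{y} - \theta)^2\big]
  &= \mathbb{E}\big[(\hat{y} - \mathbb{E}[\hat{y}] + \mathbb{E}[\hat{y}] - \theta)^2\big] \\
  &= \underbrace{\big(\mathbb{E}[\hat{y}] - \theta\big)^2}_{\text{bias}^2}
   + \underbrace{\mathbb{E}\big[(\hat{y} - \mathbb{E}[\hat{y}])^2\big]}_{\text{variance}}
\end{aligned}
```

When the outcome being predicted is itself a noisy observation rather than a fixed true score, an additional irreducible noise term is usually added to the right-hand side.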

**Figure 4.**
Large samples guard against overfitting. See text for explanation.

**Figure 5.**
Regularization via the lasso. Training/test performance of ordinary least squares (OLS) and lasso regression in two sample datasets that illustrate some of the conditions under which the lasso will tend to outperform OLS. (A) In the “dense” dataset with a low n-to-p ratio, the sample size is small (n = 100), and there are many predictors (p = 50) that each make a small individual contribution to the outcome. (B) In the “sparse” dataset with a high n-to-p ratio, the sample is large (n = 1000), the number of predictors is small (p = 20), and only a few (5) variables make non-zero (and large) contributions. The top panels display the coefficient paths for the lasso as the penalty parameter (x-axis) increases (separately for each simulated dataset). Observe how predictors gradually drop out of the model (i.e., their coefficients are eventually reduced to 0) as the penalty rises and the lasso model increasingly values the sparsity of the solution over the minimization of prediction error. The bottom panels display the total prediction error (measured with mean squared error) in the training (dashed lines) and test (solid lines) samples for both OLS (yellow) and lasso (blue) regression. Observe that, in the small, dense dataset, where the number of predictors is high relative to the sample size, OLS grossly overfits the data (the gap between the solid and dashed yellow lines is very large) and is outperformed by the lasso in the test data for a wide range of penalty settings (the solid blue line is below the solid yellow line for the entire x-axis range). By contrast, when the sample size is large relative to the number of predictors, the performance gap is typically small, and the lasso only outperforms OLS for narrowly tuned ranges of the penalty parameter, if at all.
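
The “dense” scenario in panel A can be approximated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: the lasso is fit with a hand-written cyclic coordinate descent routine (`lasso_cd`, a hypothetical helper) rather than a library implementation, predictors are independent standard normals, no intercept is fit, and every true coefficient is set to an assumed value of 0.05 with unit noise variance.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t, setting values within [-t, t] to exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n))*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # residual y - Xb, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]   # remove predictor j's current contribution
            rho = X[:, j] @ resid / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * b[j]   # add back the updated contribution
    return b


def mse(X, y, b):
    """Mean squared prediction error of coefficients b on data (X, y)."""
    return float(np.mean((y - X @ b) ** 2))


def simulate(n=100, p=50, beta_size=0.05, n_test=2000, seed=0):
    """Compare OLS with the lasso over a grid of penalties on one dense dataset."""
    rng = np.random.default_rng(seed)
    beta = np.full(p, beta_size)  # assumed: many small, equal contributions
    X_tr = rng.normal(size=(n, p))
    y_tr = X_tr @ beta + rng.normal(size=n)
    X_te = rng.normal(size=(n_test, p))
    y_te = X_te @ beta + rng.normal(size=n_test)

    b_ols = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]
    # Penalties at or above lam_max shrink every coefficient to (about) zero.
    lam_max = np.max(np.abs(X_tr.T @ y_tr)) / n
    lams = np.linspace(0.0, lam_max, 11)[1:]
    fits = [lasso_cd(X_tr, y_tr, lam) for lam in lams]
    return {
        "lams": lams,
        "ols_train_mse": mse(X_tr, y_tr, b_ols),
        "ols_test_mse": mse(X_te, y_te, b_ols),
        "lasso_train_mse": [mse(X_tr, y_tr, b) for b in fits],
        "lasso_test_mse": [mse(X_te, y_te, b) for b in fits],
        "n_nonzero": [int(np.count_nonzero(b)) for b in fits],
    }


if __name__ == "__main__":
    out = simulate()
    print(f"OLS: train MSE = {out['ols_train_mse']:.2f}, test MSE = {out['ols_test_mse']:.2f}")
    for lam, k, te in zip(out["lams"], out["n_nonzero"], out["lasso_test_mse"]):
        print(f"lasso lam = {lam:.3f}: {k:2d} nonzero coefficients, test MSE = {te:.2f}")
```

With these settings, OLS should show a large gap between training and test error, while some lasso penalties yield a lower test error. The exact numbers depend on the assumed coefficient size and noise level.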

Grants and funding

R01 MH096906/MH/NIMH NIH HHS/United States
