PLoS One. 2016 Mar 31;11(3):e0152719.

doi: 10.1371/journal.pone.0152719. eCollection 2016.

Statistically Controlling for Confounding Constructs Is Harder than You Think

Jacob Westfall¹, Tal Yarkoni¹

PMID: 27031707
PMCID: PMC4816570
DOI: 10.1371/journal.pone.0152719

Jacob Westfall et al. PLoS One. 2016.

Affiliation

¹ University of Texas at Austin, Austin, TX, United States of America.

Abstract

Social scientists often seek to demonstrate that a construct has incremental validity over and above other related constructs. However, these claims are typically supported by measurement-level models that fail to consider the effects of measurement (un)reliability. We use intuitive examples, Monte Carlo simulations, and a novel analytical framework to demonstrate that common strategies for establishing incremental construct validity using multiple regression analysis exhibit extremely high Type I error rates under parameter regimes common in many psychological domains. Counterintuitively, we find that error rates are highest (in some cases approaching 100%) when sample sizes are large and reliability is moderate. Our findings suggest that a potentially large proportion of incremental validity claims made in the literature are spurious. We present a web application (http://jakewestfall.org/ivy/) that readers can use to explore the statistical properties of these and other incremental validity arguments. We conclude by reviewing SEM-based statistical approaches that appropriately control the Type I error rate when attempting to establish incremental validity.
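The core claim of the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation code; it is a minimal illustration under assumptions chosen here: two standard-normal constructs `t1` and `t2` correlated at `rho`, an outcome that depends only on `t2` (so `t1` has no incremental validity), observed scores equal to construct plus independent noise with a common `reliability`, and a normal-approximation cutoff of 1.96 for the test. All function and parameter names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def partial_t(x1, x2, y):
    """t statistic for the partial correlation of x1 with y given x2.

    This equals the t statistic for the x1 coefficient in an OLS
    regression of y on x1 and x2 (df = n - 3).
    """
    n = len(y)
    r1y, r2y, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    r = (r1y - r12 * r2y) / math.sqrt((1 - r12 ** 2) * (1 - r2y ** 2))
    return r * math.sqrt((n - 3) / (1 - r ** 2))


def type1_rate(n, reliability, rho=0.5, beta=0.5, sims=200, seed=0):
    """Share of simulated studies that wrongly find x1 'significant'."""
    rng = random.Random(seed)
    # reliability = var(true) / (var(true) + var(error)), with var(true) = 1
    err_sd = math.sqrt(1 / reliability - 1)
    rejections = 0
    for _ in range(sims):
        x1, x2, y = [], [], []
        for _ in range(n):
            t2 = rng.gauss(0, 1)
            t1 = rho * t2 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
            x1.append(t1 + err_sd * rng.gauss(0, 1))  # noisy measure of t1
            x2.append(t2 + err_sd * rng.gauss(0, 1))  # noisy measure of t2
            y.append(beta * t2 + rng.gauss(0, 1))     # t1 has no direct effect
        # Assumption: 1.96 approximates the two-sided 5% cutoff for large n.
        if abs(partial_t(x1, x2, y)) > 1.96:
            rejections += 1
    return rejections / sims


if __name__ == "__main__":
    for rel in (1.0, 0.7):
        print(rel, type1_rate(n=500, reliability=rel, sims=100))
```

With perfectly reliable measures the rejection rate should sit near the nominal 5%; with moderate reliability it should rise, and increasing `n` should push it higher still, which is the pattern the abstract describes.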

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Fig 1. Plot of subjective heat ratings on a 7-point Likert scale against the “true” underlying daily temperatures.**

**Fig 2. Illustration of residual confounding.**
(A) Simple relationship between daily swimming pool deaths and number of ice cream cones sold. (B) Relationship between daily swimming pool deaths and number of ice cream cones sold after controlling for subjective heat Likert ratings. (C) Relationship between daily swimming pool deaths and number of ice cream cones sold after controlling for recorded daily temperatures.

**Fig 3. Contour plots of Type 1 error probabilities for the argument for predictive utility.**
The null hypothesis is that T₁ has no partial relationship with Y after controlling for T₂ (i.e., ρ_1.2 = 0). The size of the true indirect effect of T₁ on Y via T₂ varies from small (panel A) to medium (panel B) to large (panel C).
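The null hypothesis in this caption can be unpacked with the standard first-order partial correlation formula. The following is a sketch under assumptions that the caption does not state: classical test theory with uncorrelated measurement errors, a perfectly reliable outcome Y, and a common reliability r for the observed predictors X₁ and X₂ (the symbols r, X₁, and X₂ are introduced here for illustration only).

```latex
% Partial correlation of T_1 with Y, controlling for T_2
\rho_{1\cdot 2}
  = \frac{\rho_{Y1} - \rho_{Y2}\,\rho_{12}}
         {\sqrt{\left(1-\rho_{Y2}^{2}\right)\left(1-\rho_{12}^{2}\right)}}

% Under the null at the construct level:
\rho_{1\cdot 2} = 0 \iff \rho_{Y1} = \rho_{Y2}\,\rho_{12}

% Attenuation of the observed correlations (assumed common reliability r):
\rho_{YX_1} = \sqrt{r}\,\rho_{Y1}, \qquad
\rho_{YX_2} = \sqrt{r}\,\rho_{Y2}, \qquad
\rho_{X_1X_2} = r\,\rho_{12}

% Numerator of the observed partial correlation under the null:
\rho_{YX_1} - \rho_{YX_2}\,\rho_{X_1X_2}
  = \sqrt{r}\,\rho_{Y2}\,\rho_{12}
    - \sqrt{r}\,\rho_{Y2}\cdot r\,\rho_{12}
  = \sqrt{r}\,\rho_{Y2}\,\rho_{12}\,(1-r)
```

Under these assumptions the observed partial correlation is nonzero whenever 0 < r < 1 and both ρ_Y2 and ρ_12 are nonzero, so a sufficiently large sample will reject a null that is true at the construct level.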

**Fig 4. Contour plots of Type 1 error probabilities for the argument for separable constructs.**
The alternative hypothesis is that *both* of the predictors are separately related to the outcome, which implies the null hypothesis that either of the predictors is not related to the outcome. The magnitude of the true correlation between Y and T varies from small (panel A) to medium (panel B) to large (panel C).

**Fig 5. Contour plots of Type 1 error probabilities for the argument for improved measurement.**
The null hypothesis is that the two predictors have the same partial correlation with the outcome. The magnitude of the true partial correlation varies from small (panel A) to medium (panel B) to large (panel C). Varying δ does not have a very big impact on the error rates, so we fix it at δ = .5 in all three panels.

**Fig 6. Test statistics from models regressing BRI outcomes on both the NEO and HEXACO versions of a factor.**
The test statistics are t-statistics for the regression models and z-statistics for the SEM models. BRI = Behavioral Report Inventory. SEM = Structural Equation Model.

**Fig 7. Test statistics from models regressing BRI outcomes on both the NEO and HEXACO versions of a factor.**
The test statistics are t-statistics for the regression models and z-statistics for the SEM models. BRI = Behavioral Report Inventory. SEM = Structural Equation Model.

**Fig 8. Test statistics from models predicting BRI outcomes.**
The test statistics are t-statistics for the regression models and z-statistics for the SEM models. BRI = Behavioral Report Inventory. SEM = Structural Equation Model.

**Fig 9. Path diagram for a SEM predicting drug use, allowing for specified degrees of reliability in the observed NEO and HEXACO scores.**
Circle nodes represent latent variables, square nodes represent observed variables, solid lines represent paths or variances to be estimated from the data, and dashed lines represent paths or variances that are fixed to constant, a priori values. SEM = Structural Equation Model.

**Fig 10. Test statistics as a function of assumed reliability.**
The shaded region gives the range within which the test statistics are nonsignificant. In each model, assuming reliabilities below a certain value invariably caused the model to fail to converge or to yield an inadmissible solution (i.e., impossible correlation matrices for the latent variables); we only plot the results for reliability values that successfully converge on stable estimates.
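The idea of fixing reliability at assumed values can be sketched without a full SEM. The code below is not the authors' model: it swaps in the classical correction for attenuation, applied to a correlation matrix, and then computes standardized regression coefficients from the corrected correlations. It yields point estimates only (no test statistics), and the function name and arguments are hypothetical. It does reproduce one feature the caption mentions: assuming reliabilities that are too low produces an impossible correlation matrix.

```python
import math


def corrected_betas(r1y, r2y, r12, rel1, rel2):
    """Standardized coefficients after disattenuating the predictors.

    r1y, r2y, r12 are observed correlations; rel1 and rel2 are the
    assumed reliabilities of predictors 1 and 2. The outcome is treated
    as perfectly reliable (an assumption of this sketch).
    """
    c1y = r1y / math.sqrt(rel1)
    c2y = r2y / math.sqrt(rel2)
    c12 = r12 / math.sqrt(rel1 * rel2)
    # Admissibility: the corrected 3x3 correlation matrix must be
    # positive definite, otherwise the solution is impossible.
    det3 = 1 + 2 * c1y * c2y * c12 - c1y ** 2 - c2y ** 2 - c12 ** 2
    if abs(c12) >= 1 or det3 <= 0:
        raise ValueError("inadmissible solution for assumed reliabilities")
    denom = 1 - c12 ** 2
    b1 = (c1y - c12 * c2y) / denom
    b2 = (c2y - c12 * c1y) / denom
    return b1, b2


if __name__ == "__main__":
    # Illustrative observed correlations (made-up values, not from the paper).
    r1y, r2y, r12 = 0.19, 0.37, 0.35
    for rel in (1.0, 0.9, 0.8, 0.7, 0.3):
        try:
            print(rel, corrected_betas(r1y, r2y, r12, rel, rel))
        except ValueError as exc:
            print(rel, exc)
```

Sweeping the assumed reliability downward from 1.0 shows how the apparent unique contribution of the first predictor changes as more measurement error is allowed for, until the corrected matrix becomes inadmissible.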

**Fig 11. Incremental validity in multiple regression vs. SEM.**
The SEM results are from a simulation using 300,000 iterations. The multiple regression results are computed analytically. The SEM line in the left panel is a smoothed curve derived from fitting a generalized additive model with a binomial response to the simulation results tracking whether the null hypothesis was rejected. In the right panel, the SEM line and shaded region are based on first applying rolling medians of width 101 to the simulated regression coefficients and standard errors (to reduce the distorting influence of extreme outlying parameter estimates occurring particularly at low reliability values), and then fitting a generalized additive model to these rolling medians. SEM = Structural Equation Model.

**Fig 12. Power to detect incremental validity using SEM.**
The lines in each panel are smoothed curves derived from fitting generalized additive models with a binomial response to the simulation results. SEM = Structural Equation Model.

Cited by

Genetic and Environmental Factors of Non-Ability-Based Confidence.
Vogt RL, Zheng A, Briley DA, Malanchini M, Harden KP, Tucker-Drob EM. Soc Psychol Personal Sci. 2022 Apr;13(3):734-746. doi: 10.1177/19485506211036610. Epub 2021 Sep 6. PMID: 39006758. Free PMC article.
Predictors of well-being and productivity among software professionals during the COVID-19 pandemic - a longitudinal study.
Russo D, Hanel PHP, Altnickel S, van Berkel N. Empir Softw Eng. 2021;26(4):62. doi: 10.1007/s10664-021-09945-9. Epub 2021 Apr 28. PMID: 33942010. Free PMC article.
Valuing time over money predicts happiness after a major life transition: A preregistered longitudinal study of graduating students.
Whillans A, Macchia L, Dunn E. Sci Adv. 2019 Sep 18;5(9):eaax2615. doi: 10.1126/sciadv.aax2615. eCollection 2019 Sep. PMID: 31555738. Free PMC article.
The Social Brain Automatically Predicts Others' Future Mental States.
Thornton MA, Weaverdyck ME, Tamir DI. J Neurosci. 2019 Jan 2;39(1):140-148. doi: 10.1523/JNEUROSCI.1431-18.2018. Epub 2018 Nov 2. PMID: 30389840. Free PMC article.
The Association between Fast Food Outlets and Overweight in Adolescents Is Confounded by Neighbourhood Deprivation: A Longitudinal Analysis of the Millennium Cohort Study.
Green MA, Hobbs M, Ding D, Widener M, Murray J, Reece L, Singleton A. Int J Environ Res Public Health. 2021 Dec 15;18(24):13212. doi: 10.3390/ijerph182413212. PMID: 34948820. Free PMC article.

Grants and funding

R01 MH096906/MH/NIMH NIH HHS/United States
