Testing for differences in polygenic scores in the presence of confounding
- PMID: 40233174
- PMCID: PMC12135188
- DOI: 10.1093/genetics/iyaf071
Abstract
Polygenic scores have become an important tool in human genetics, enabling the prediction of individuals' phenotypes from their genotypes. Understanding how the pattern of differences in polygenic score predictions across individuals intersects with variation in ancestry can provide insights into the evolutionary forces acting on the trait in question and is important for understanding health disparities. However, because most polygenic scores are computed using effect estimates from population samples, they are susceptible to confounding by both genetic and environmental effects that are correlated with ancestry. The extent to which this confounding drives patterns in the distribution of polygenic scores depends on the patterns of population structure in both the original estimation panel and in the prediction/test panel. Here, we use theory from population and statistical genetics, together with simulations, to study the procedure of testing for an association between polygenic scores and axes of ancestry variation in the presence of confounding. We use a general model of genetic relatedness to describe how confounding in the estimation panel biases the distribution of polygenic scores in ways that depend on the degree of overlap in population structure between panels. We then show how this confounding can bias tests for associations between polygenic scores and important axes of ancestry variation in the test panel. Specifically, for any given test, there exists a single axis of population structure in the genome-wide association study (GWAS) panel that needs to be controlled for in order to protect the test. In the context of this result, we study the behavior of multiple approaches to control for stratification along this axis, including standard methods such as using principal components as fixed covariates in the GWAS, linear mixed models, and a novel approach for directly estimating the axis using the test panel genotypes.
Our analyses highlight the role of estimation noise in the models of population structure as a plausible source of residual confounding in polygenic score analyses.
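The abstract's central claim is that, for a given test, confounding enters through a single axis of GWAS-panel structure, and that this axis can be estimated from the test-panel genotypes. The toy Python simulation below is a minimal sketch of that idea under assumptions of our own, not the paper's estimator or code: two Balding-Nichols populations, a purely environmental phenotype shift with no genetic effects, per-SNP marginal regressions, and an estimated axis `r_hat` built by regressing the test vector (here called `T`, our notation) on test-panel genotypes and then scoring GWAS-panel individuals with those weights. The function names, parameter values, and the split of SNPs into an "axis-estimation" half and a "score" half are hypothetical illustrations.

```python
import numpy as np


def standardize(X):
    """Center and scale each SNP column; monomorphic columns become all zeros."""
    sd = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(sd > 0, sd, 1.0)


def marginal_effects(X, y):
    """Per-SNP OLS slopes of y on each column of X (columns assumed centered)."""
    y = y - y.mean()
    ss = (X ** 2).sum(axis=0)
    return (X.T @ y) / np.where(ss > 0, ss, 1.0)


def adjusted_effects(X, y, c):
    """Per-SNP slopes of y on X[:, j] with one covariate c (Frisch-Waugh-Lovell)."""
    c = c - c.mean()
    y = y - y.mean()
    y_res = y - c * (c @ y) / (c @ c)
    X_res = X - np.outer(c, c @ X) / (c @ c)
    return marginal_effects(X_res, y_res)


def simulate(seed=0, n_snps=2000, n_gwas=1000, n_test=400, fst=0.1, env_shift=1.0):
    """Toy null model: no SNP affects the trait, but the environment differs by ancestry."""
    rng = np.random.default_rng(seed)
    # Balding-Nichols allele frequencies for two populations (row 0 = A, row 1 = B).
    p_anc = rng.uniform(0.1, 0.9, n_snps)
    k = (1 - fst) / fst
    freqs = rng.beta(p_anc * k, (1 - p_anc) * k, size=(2, n_snps))

    def sample_panel(n):
        label = np.repeat([0, 1], n // 2)  # population of each individual
        G = rng.binomial(2, freqs[label]).astype(float)
        return standardize(G), label - 0.5  # centered ancestry label

    Xg, lab_g = sample_panel(n_gwas)  # GWAS (estimation) panel
    Xt, T = sample_panel(n_test)      # test panel; T is the test vector
    y = env_shift * lab_g + rng.normal(size=n_gwas)

    # Hypothetical SNP split: one half estimates the axis, the other builds the score.
    est, pgs = slice(0, n_snps // 2), slice(n_snps // 2, None)

    def test_slope(beta):
        """OLS slope of test-panel polygenic scores on T."""
        Z = Xt[:, pgs] @ beta
        Tc = T - T.mean()
        return Tc @ (Z - Z.mean()) / (Tc @ Tc)

    # 1) Uncorrected GWAS: effect estimates absorb the environmental shift.
    slope_raw = test_slope(marginal_effects(Xg[:, pgs], y))

    # 2) Assumed estimate of the relevant GWAS-panel axis: regress T on each
    #    test-panel SNP, then score GWAS-panel individuals with those weights.
    w = marginal_effects(Xt[:, est], T)
    r_hat = Xg[:, est] @ w
    slope_rhat = test_slope(adjusted_effects(Xg[:, pgs], y, r_hat))

    # 3) Oracle: adjust for the true GWAS-panel ancestry label instead.
    slope_oracle = test_slope(adjusted_effects(Xg[:, pgs], y, lab_g))
    return slope_raw, slope_rhat, slope_oracle


if __name__ == "__main__":
    raw, rhat, oracle = simulate(seed=1)
    print(f"uncorrected slope:      {raw:.2f}")
    print(f"adjusted for r_hat:     {rhat:.2f}")
    print(f"adjusted for true axis: {oracle:.2f}")
```

Under these assumptions the uncorrected slope of scores on `T` should be large even though no SNP affects the trait, while adjusting the GWAS for `r_hat` or for the true ancestry label should shrink it toward the range expected from GWAS sampling noise alone. Because `r_hat` is itself estimated from finite samples, it tracks the true axis only imperfectly; this is one simple way to see how residual confounding can survive such a correction.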
Keywords: confounding; polygenic scores; population structure.
© The Author(s) 2025. Published by Oxford University Press on behalf of The Genetics Society of America. All rights reserved.
Conflict of interest statement
Conflicts of interest: The author(s) declare no conflicts of interest.
Update of
Testing for differences in polygenic scores in the presence of confounding. bioRxiv [Preprint]. 2024 Jun 26:2023.03.12.532301. doi: 10.1101/2023.03.12.532301. PMID: 36993707. Update in: Genetics. 2025 Jun 4;230(2):iyaf071. doi: 10.1093/genetics/iyaf071.
