Principal component regression and linear mixed model in association analysis of structured samples: competitors or complements?
- PMID: 25536929
- PMCID: PMC4366301
- DOI: 10.1002/gepi.21879
Abstract
Genome-wide association studies (GWAS) have been established as a major tool to identify genetic variants associated with complex traits, such as common diseases. However, GWAS may suffer from false positives and false negatives due to confounding population structures, including known or unknown relatedness. Another important issue is unmeasured environmental risk factors. Among many methods for adjusting for population structures, two approaches stand out: one is principal component regression (PCR) based on principal component analysis, which is perhaps the most popular due to its early appearance, simplicity, and general effectiveness; the other is based on a linear mixed model (LMM) that has emerged recently as perhaps the most flexible and effective, especially for samples with complex structures as in model organisms. As shown previously, the PCR approach can be regarded as an approximation to an LMM; such an approximation depends on the number of the top principal components (PCs) used, the choice of which is often difficult in practice. Hence, in the presence of population structure, the LMM appears to outperform the PCR method. However, due to the different treatments of fixed vs. random effects in the two approaches, we show an advantage of PCR over LMM: in the presence of an unknown but spatially confined environmental confounder (e.g., environmental pollution or lifestyle), the PCs may be able to implicitly and effectively adjust for the confounder whereas the LMM cannot. Accordingly, to adjust for both population structures and nongenetic confounders, we propose a hybrid method combining the use and, thus, strengths of PCR and LMM. We use real genotype data and simulated phenotypes to confirm the above points, and establish the superior performance of the hybrid method across all scenarios.
Keywords: association testing; confounding; environmental risk; population stratification; probabilistic principal component analysis.
© 2015 Wiley Periodicals, Inc.
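The three approaches contrasted in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`top_pcs`, `pcr_test`, `lmm_test`), the grid search over the variance ratio, the Wald statistic, and the toy two-subpopulation data are all assumptions made for illustration. PCR is shown as ordinary least squares with the top PCs as fixed-effect covariates, the LMM as a single random effect with a genotype-derived kinship matrix, and the hybrid as the same LMM with the PCs added as fixed effects.

```python
import numpy as np

def top_pcs(G, k):
    """Return the top-k PCs of the standardized genotype matrix, plus that matrix."""
    Z = (G - G.mean(axis=0)) / (G.std(axis=0) + 1e-12)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k], Z

def pcr_test(y, snp, pcs):
    """PCR: OLS of y on intercept, SNP, and PCs; returns the SNP t-statistic."""
    X = np.column_stack([np.ones(len(y)), snp, pcs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

def lmm_test(y, snp, K, covars=None, deltas=np.logspace(-3, 3, 61)):
    """LMM: y = X b + u + e with u ~ N(0, sg2*K), e ~ N(0, se2*I).

    delta = se2/sg2 is chosen by maximum likelihood over a grid (an
    assumption of this sketch). Returns the Wald statistic for the SNP.
    """
    n = len(y)
    cols = [np.ones(n), snp] + ([] if covars is None else [covars])
    X = np.column_stack(cols)
    w, V = np.linalg.eigh(K)            # rotate so K + delta*I is diagonal
    w = np.clip(w, 0.0, None)
    yr, Xr = V.T @ y, V.T @ X
    best = None
    for d in deltas:
        dinv = 1.0 / (w + d)
        XtDX = Xr.T @ (Xr * dinv[:, None])
        beta = np.linalg.solve(XtDX, Xr.T @ (yr * dinv))
        r = yr - Xr @ beta
        sg2 = (r * r * dinv).sum() / n  # ML estimate of sg2 given delta
        ll = -0.5 * (n * np.log(sg2) + np.log(w + d).sum())  # up to a constant
        if best is None or ll > best[0]:
            se = np.sqrt(sg2 * np.linalg.inv(XtDX)[1, 1])
            best = (ll, beta[1] / se)
    return best[1]

# Toy data (hypothetical, not the paper's data): two subpopulations with
# slightly different allele frequencies and an environmental exposure
# confined to the second subpopulation. No SNP has a true effect.
rng = np.random.default_rng(0)
n, m, k = 200, 500, 2
pop = np.repeat([0, 1], n // 2)
p0 = rng.uniform(0.1, 0.9, m)
p1 = np.clip(p0 + rng.normal(0, 0.1, m), 0.05, 0.95)
freqs = np.where(pop[:, None] == 0, p0, p1)
G = rng.binomial(2, freqs).astype(float)
y = 1.0 * pop + rng.normal(size=n)      # phenotype driven by the confounder only

pcs, Z = top_pcs(G, k)
K = Z @ Z.T / m                         # genotype-based kinship matrix
snp = G[:, 0]

t_pcr = pcr_test(y, snp, pcs)                    # PCs as fixed effects
z_lmm = lmm_test(y, snp, K)                      # random effect only
z_hybrid = lmm_test(y, snp, K, covars=pcs)       # hybrid: PCs + random effect
print(f"PCR t = {t_pcr:.3f}, LMM z = {z_lmm:.3f}, hybrid z = {z_hybrid:.3f}")
```

The only difference between the LMM and hybrid calls is the `covars=pcs` argument, which mirrors the abstract's point that the two methods differ in treating structure as fixed versus random effects. A single test SNP on one simulated data set says nothing about type I error or power; comparing the methods would require repeating this over many SNPs and replicates.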
Grants and funding
- U01 DK085545/DK/NIDDK NIH HHS/United States
- R01 GM081535/GM/NIGMS NIH HHS/United States
- U01 DK085501/DK/NIDDK NIH HHS/United States
- R01 DK053889/DK/NIDDK NIH HHS/United States
- R01 GM113250/GM/NIGMS NIH HHS/United States
- R01 HL105397/HL/NHLBI NIH HHS/United States
- R01 HL116720/HL/NHLBI NIH HHS/United States
- R01 GM031575/GM/NIGMS NIH HHS/United States
- U01 DK085584/DK/NIDDK NIH HHS/United States
- U01 DK085524/DK/NIDDK NIH HHS/United States
- R01 DK047482/DK/NIDDK NIH HHS/United States
- U01 DK085526/DK/NIDDK NIH HHS/United States
- P01 HL045222/HL/NHLBI NIH HHS/United States