Robust methods for population stratification in genome wide association studies
- PMID: 23601181
- PMCID: PMC3637636
- DOI: 10.1186/1471-2105-14-132
Abstract
Background: Genome-wide association studies can provide novel insights into diseases of interest, as well as into the responsiveness of an individual to specific treatments. In such studies, it is very important to correct for population stratification, which refers to allele frequency differences between cases and controls due to systematic ancestry differences. Population stratification can cause spurious associations if it is not properly adjusted for. Principal component analysis (PCA) has been widely relied upon to adjust for population stratification in these large-scale studies. Recently, the linear mixed model (LMM) has also been proposed to account for family structure or cryptic relatedness. However, neither approach may be optimal for correcting sample structure in the presence of subject outliers.
Results: We propose to use robust PCA combined with k-medoids clustering to deal with population stratification. This approach can adjust for population stratification in both continuous and discrete populations with subject outliers, and it can be considered an extension of the PCA method and the multidimensional scaling (MDS) method. Through simulation studies, we compare the performance of our proposed methods with several widely used stratification methods, including PCA and MDS. We show that subject outliers can greatly influence the analysis results from several existing methods, while our proposed robust population stratification methods perform very well for both discrete and admixed populations with subject outliers. We illustrate the new method using data from a rheumatoid arthritis study.
Conclusions: We demonstrate that subject outliers can greatly influence the analysis results in GWA studies, and we propose robust methods for dealing with population stratification that outperform existing population stratification methods in the presence of subject outliers.
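The k-medoids clustering step named in the Results can be illustrated with a minimal sketch. This is not the authors' implementation: the function `k_medoids`, the greedy initialisation, and the toy coordinates (standing in for the top two principal component scores of nine subjects) are all assumptions made for illustration. The sketch only shows why medoids, which must be actual subjects, are less easily dragged by a single outlying subject than cluster means would be.

```python
import math


def k_medoids(points, k, max_iter=50):
    """Alternating k-medoids on a list of coordinate tuples (illustrative only)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Greedy initialisation (an assumption of this sketch): the first medoid
    # minimises total distance; each further medoid is the point that most
    # reduces the total distance of all points to their nearest medoid.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        nearest = [min(dist[j][m] for m in medoids) for j in range(n)]
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(max(
            candidates,
            key=lambda i: sum(max(nearest[j] - dist[j][i], 0.0) for j in range(n)),
        ))

    def assign(current):
        # Label each point with the index of its nearest medoid.
        return [min(range(k), key=lambda c: dist[j][current[c]]) for j in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [j for j in range(n) if labels[j] == c]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Hypothetical PC scores: two ancestry groups plus one subject outlier (index 8).
points = [
    (0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2),   # group A
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.1), (5.1, 5.2),     # group B
    (9.0, 9.0),                                         # outlier
]
medoids, labels = k_medoids(points, k=2)
print("medoid subjects:", [points[m] for m in medoids])
print("cluster labels:", labels)
```

In this toy example the outlier is assigned to the nearer group but is not itself selected as a medoid, so the cluster representatives remain typical subjects. A robust PCA step, as proposed in the abstract, would precede this clustering and is not shown here.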