A genealogical interpretation of principal components analysis
- PMID: 19834557
- PMCID: PMC2757795
- DOI: 10.1371/journal.pgen.1000686
Abstract
Principal components analysis (PCA) is a statistical method commonly used in population genetics to identify structure in the distribution of genetic variation across geographical location and ethnic background. However, while the method is often used to inform about historical demographic processes, little is known about the relationship between fundamental demographic parameters and the projection of samples onto the primary axes. Here I show that for SNP data the projection of samples onto the principal components can be obtained directly from considering the average coalescent times between pairs of haploid genomes. The result provides a framework for interpreting PCA projections in terms of underlying processes, including migration, geographical isolation, and admixture. I also demonstrate a link between PCA and Wright's F_ST and show that SNP ascertainment has a largely simple and predictable effect on the projection of samples. Using examples from human genetics, I discuss the application of these results to empirical data and the implications for inference.
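As a rough illustration of how sample projections might be derived from pairwise coalescent times, the sketch below double-centers a matrix of mean coalescence times and takes its leading eigenvectors. The double-centering form of the expected covariance, the function name `pca_from_coalescence_times`, and the toy matrix `T` are assumptions made for this example; the abstract does not spell out the formula.

```python
import numpy as np

def pca_from_coalescence_times(T, n_components=2):
    """Project samples onto principal axes given pairwise mean coalescence times.

    Assumption (not stated in the abstract): the expected sample covariance
    is proportional to  tbar_i + tbar_j - tbar - t_ij,
    i.e. the double-centered negative of the coalescence-time matrix.
    """
    T = np.asarray(T, dtype=float)
    row_mean = T.mean(axis=1, keepdims=True)   # tbar_i
    col_mean = T.mean(axis=0, keepdims=True)   # tbar_j
    grand_mean = T.mean()                      # tbar
    M = row_mean + col_mean - grand_mean - T   # expected covariance, up to scale

    # M is symmetric when T is, so eigh applies; sort eigenvalues descending.
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy example (hypothetical values): two populations of three haploid genomes,
# mean coalescence time 1 within a population and 2 between populations.
within, between = 1.0, 2.0
T = np.full((6, 6), between)
T[:3, :3] = within
T[3:, 3:] = within
np.fill_diagonal(T, 0.0)

coords = pca_from_coalescence_times(T, n_components=1)
print(coords[:, 0])  # first axis separates the two populations
```

In this toy case the first axis places the two populations at equal and opposite positions, which matches the intuition that deeper between-population coalescence times drive the leading component.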
Conflict of interest statement
The author has declared that no competing interests exist.