Genotypes of informative loci from 1000 Genomes data allude evolution and mixing of human populations
- PMID: 34493766
- PMCID: PMC8423758
- DOI: 10.1038/s41598-021-97129-2
Abstract
Principal Component Analysis (PCA) projects high-dimensional genotype data onto a few components that discern populations. Ancestry Informative Markers (AIMs) are a small subset of SNPs capable of distinguishing populations. We integrate these two approaches by proposing an algorithm to identify necessary informative loci, whose removal from the data degrades the PCA structure. Unlike classical AIMs, necessary informative loci densely cover the genome and can therefore illuminate the evolution and mixing history of populations. We conduct a comprehensive analysis of the genotype data of the 1000 Genomes Project using necessary informative loci. Projections along the top seven principal components (PCs) demarcate populations at distinct geographic levels. Millions of necessary informative loci are identified along each PC. Population identities along each PC are approximately determined by weighted sums of minor (or major) alleles over the informative loci. Variations in allele frequencies are aligned with the history and direction of population evolution. The population distribution of projections along the top three PCs is recapitulated by a simple demographic model based on several waves of founder population separation and mixing. Informative loci are concentrated at particular locations in the genome and are functionally enriched. Genes at two hot spots dense in PC 7 informative loci are differentially expressed among European populations. The mosaic of local ancestry in the genome of a mixed descendant of multiple populations can be inferred from partial PCA projections of informative loci. Finally, informative loci derived from the 1000 Genomes data accurately predict the projections of an independent genotype dataset of South Asians. These results demonstrate the utility and relevance of informative loci for investigating human evolution.
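The idea that a sample's position along a PC is roughly a weighted sum of allele counts over a subset of loci can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not specify how necessary informative loci are selected, so the sketch below simply ranks loci by the magnitude of their PC 1 loading as a stand-in. The data are simulated, and every name (`center_columns`, `top_pc_loadings`, `project`, the allele frequencies, the population sizes) is a hypothetical choice made for illustration.

```python
import random

def center_columns(G):
    """Subtract each locus's mean genotype from a samples x loci matrix."""
    n, m = len(G), len(G[0])
    means = [sum(row[j] for row in G) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in G]

def top_pc_loadings(X, iters=100):
    """Power iteration on X^T X; returns the unit-length PC 1 loading vector."""
    m = len(X[0])
    v = [1.0 / (j + 1) for j in range(m)]  # deterministic starting vector
    for _ in range(iters):
        scores = [sum(x * w for x, w in zip(row, v)) for row in X]            # X v
        v = [sum(row[j] * s for row, s in zip(X, scores)) for j in range(m)]  # X^T (X v)
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]
    return v

def project(X, v, loci):
    """Projection of each sample as a weighted sum over the chosen loci only."""
    return [sum(row[j] * v[j] for j in loci) for row in X]

# Simulated toy data (assumption): two populations that differ in allele
# frequency at the first N_SIGNAL loci and are identical at the rest.
rng = random.Random(0)
N_PER_POP, N_SIGNAL, N_NOISE = 20, 10, 20

def genotype(p):
    """Allele count (0, 1 or 2) for a diploid sample at allele frequency p."""
    return sum(rng.random() < p for _ in range(2))

freq_a = [0.9] * N_SIGNAL + [0.5] * N_NOISE
freq_b = [0.1] * N_SIGNAL + [0.5] * N_NOISE
G = ([[genotype(p) for p in freq_a] for _ in range(N_PER_POP)]
     + [[genotype(p) for p in freq_b] for _ in range(N_PER_POP)])

X = center_columns(G)
v = top_pc_loadings(X)
m = len(v)

# Stand-in selection rule: keep the loci with the largest absolute loading.
informative = sorted(range(m), key=lambda j: abs(v[j]), reverse=True)[:N_SIGNAL]

full = project(X, v, range(m))         # ordinary PC 1 projection
partial = project(X, v, informative)   # weighted sum over selected loci only
```

In this toy setting, comparing `partial[:N_PER_POP]` with `partial[N_PER_POP:]` shows whether the selected loci alone still separate the two simulated populations along PC 1. The sign of the projection is arbitrary, as with any principal component.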
© 2021. The Author(s).
Conflict of interest statement
The authors declare no competing interests.