Genome Biol Evol. 2018 Sep 1;10(9):2408-2416.
doi: 10.1093/gbe/evy182.

An Ancestry Informative Marker Set Which Recapitulates the Known Fine Structure of Populations in South Asia

Ranajit Das et al. Genome Biol Evol. 2018.

Abstract

The inference of genomic ancestry using ancestry informative markers (AIMs) can be useful for a range of studies in evolutionary genetics, biomedical research, and forensic analyses. However, determining AIMs for highly admixed populations with complex ancestries has remained a formidable challenge. Given the immense genetic heterogeneity and unique population structure of the Indian subcontinent, here we sought to derive AIMs that would yield a cohesive and faithful understanding of South Asian genetic origins. To identify the optimal strategy for extracting AIMs for South Asians, we compared three commonly used AIMs-determining methods, namely Infocalc, FST, and Smart Principal Component Analysis, with ADMIXTURE, using previously published whole-genome data from the Indian subcontinent. Our findings suggest that the Infocalc approach is likely the most suitable for delineating South Asian AIMs. In particular, Infocalc-2,000 (N = 2,000) emerged as the most informative South Asian AIMs panel, recapitulating the finer structure within South Asian genomes with a high degree of sensitivity and precision, whereas a negative control with an equivalent number of randomly selected markers failed to do so when used to interrogate the South Asian populations. We discuss the utility of all approaches under evaluation for deriving AIMs and interpreting South Asian genomic ancestries. Notably, this is the first report of an AIMs panel for South Asian ancestry inference. Overall, these findings may aid in developing cost-effective resources for large-scale demographic analyses and foster expansion of our knowledge of human origins and disease in the South Asian context.
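The abstract does not spell out how Infocalc ranks markers. As a rough illustration only, and not the authors' pipeline, the sketch below computes the informativeness-for-assignment statistic (I_n) that Infocalc is built around, from per-population allele frequencies, and keeps the top-scoring markers. The function names, marker names, and frequencies are hypothetical.

```python
import math

def informativeness(freqs_by_pop):
    """Informativeness for assignment (I_n) of one marker.

    freqs_by_pop: one allele-frequency list per population, each summing to 1.
    I_n = sum_j ( -p_j * log(p_j) + sum_i (p_ij / K) * log(p_ij) ),
    where p_j is the mean frequency of allele j over the K populations
    and 0 * log(0) is treated as 0.
    """
    k = len(freqs_by_pop)
    n_alleles = len(freqs_by_pop[0])
    total = 0.0
    for j in range(n_alleles):
        p_mean = sum(pop[j] for pop in freqs_by_pop) / k
        if p_mean > 0:
            total -= p_mean * math.log(p_mean)
        for pop in freqs_by_pop:
            if pop[j] > 0:
                total += (pop[j] / k) * math.log(pop[j])
    return total

def top_markers(marker_freqs, n):
    """Return the names of the n markers with the highest I_n."""
    scores = {name: informativeness(f) for name, f in marker_freqs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical toy data: three biallelic markers, two populations.
toy = {
    "snp_a": [[0.9, 0.1], [0.1, 0.9]],  # strongly differentiated
    "snp_b": [[0.5, 0.5], [0.5, 0.5]],  # identical frequencies, uninformative
    "snp_c": [[1.0, 0.0], [0.0, 1.0]],  # fixed difference, maximally informative
}
panel = top_markers(toy, 2)
```

A panel such as Infocalc-2,000 would correspond to keeping the 2,000 highest-ranked markers, assuming the ranking is by this statistic.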

Figures

Fig. 1. Admixture analyses of data sets generated using the most informative SNPs detected by the Infocalc algorithm. Admixture plots depict the ancestry components of South Asian genomes. (A) Admixture analysis of the CSS (N = 499,158); (B) Admixture analysis of Infocalc-10,000; (C) Admixture analysis of Infocalc-2,534; (D) Admixture analysis of Infocalc-2,000; (E) Admixture analysis of Infocalc-1,500; (F) Admixture analysis of Infocalc-1,000; and (G) Admixture analysis of Infocalc-500. Admixture proportions were generated through unsupervised admixture analyses at K = 10 using ADMIXTURE v1.3 and plotted in R v3.2.3. Each individual is represented by a vertical line partitioned into colored segments whose lengths are proportional to the contributions of the ancestral components to the genome of the individual. Note that Nyshas are included among the ATB group.
Fig. 2. Box and whisker plots comparing the Euclidean distances between the admixture proportions of the South Asian genomes obtained using the CSS and those obtained using candidate panels derived with alternative AIMs-determining approaches. The number of SNPs in each candidate panel is indicated in the text. Note: Random-2,534 comprised 2,534 SNPs randomly selected from the CSS, and Consensus-2,534 comprised 2,534 SNPs that were detected by at least two of the four AIMs-determining approaches under evaluation.
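The distance comparison described in the Fig. 2 caption can be sketched as follows: for each individual, take the vector of admixture proportions estimated from the CSS and the vector estimated from a candidate panel, and compute the Euclidean distance between them. This is a minimal sketch with hypothetical names and toy values. It assumes the ancestry components (columns) have already been matched between the two runs, since ADMIXTURE assigns component labels arbitrarily.

```python
import math
import statistics

def euclidean_distances(css_q, panel_q):
    """Per-individual Euclidean distance between two sets of admixture proportions.

    css_q, panel_q: lists of equal length, one K-vector of proportions per
    individual, with individuals and components in the same order in both.
    """
    if len(css_q) != len(panel_q):
        raise ValueError("both inputs must cover the same individuals")
    return [math.dist(a, b) for a, b in zip(css_q, panel_q)]

# Hypothetical toy proportions for two individuals at K = 3.
css = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
panel = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]

distances = euclidean_distances(css, panel)
median_distance = statistics.median(distances)  # one summary a box plot would show
```

Smaller distances mean the panel reproduces the CSS-based proportions more closely, which is what the box plots compare across panels.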
Fig. 3. PCA of South Asian genomes. PCA plots showing genetic differentiation among South Asian genomes. The candidate panels were generated using highly informative SNPs detected through the Infocalc algorithm. (A) PCA of the CSS (N = 499,158), where the X-axis (PC1) explained 39.7% variance, whereas the Y-axis (PC2) explained 24.2% variance of the data. (B) PCA of Infocalc-10,000, where the X-axis (PC1) explained 39.8% variance, whereas the Y-axis (PC2) explained 23.9% variance of the data. (C) PCA of Infocalc-2,534, where the X-axis (PC1) explained 39.8% variance, whereas the Y-axis (PC2) explained 23.8% variance of the data. (D) PCA of Infocalc-2,000, where the X-axis (PC1) explained 39.3% variance, whereas the Y-axis (PC2) explained 24.2% variance of the data. (E) PCA of Infocalc-1,500, where the X-axis (PC1) explained 39.6% variance, whereas the Y-axis (PC2) explained 24.3% variance of the data. (F) PCA of Infocalc-1,000, where the X-axis (PC1) explained 38.3% variance, whereas the Y-axis (PC2) explained 23.2% variance of the data. (G) PCA of Infocalc-500, where the X-axis (PC1) explained 36.7% variance, whereas the Y-axis (PC2) explained 23.1% variance of the data. Notable populations are marked with circles. In all cases illustrated here, PCA was performed in PLINK v1.9 and the top four principal components (PCs) were extracted. The top two PCs (PC1 and PC2), which explain the highest variance of the data, were plotted in R v3.2.3, with PC1 on the X-axis and PC2 on the Y-axis.
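The per-axis percentages in the Fig. 3 caption are shares of variance per principal component. One common way to obtain such shares, shown here as a sketch and not as the authors' stated procedure, is to divide each PCA eigenvalue (for example, the values PLINK writes to its `.eigenval` file) by the sum of the eigenvalues. The caption does not say which denominator was used. If only the extracted PCs are summed, the shares are relative to those PCs and not to the total variance. The function name and eigenvalues below are hypothetical.

```python
def percent_variance(eigenvalues):
    """Share of variance per PC, as a percentage of the summed eigenvalues.

    Note: the result is relative to whichever eigenvalues are passed in.
    Passing only the top few PCs gives shares of that subset, not of the
    total variance in the data.
    """
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [100.0 * ev / total for ev in eigenvalues]

# Hypothetical eigenvalues for four extracted PCs.
shares = percent_variance([4.0, 2.0, 1.0, 1.0])
```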
