Comparative Study
Hum Biol. 2012 Aug;84(4):343-64.
doi: 10.3378/027.084.0401.

PCAdmix: principal components-based assignment of ancestry along each chromosome in individuals with admixed ancestry from two or more populations

Abra Brisbin et al. Hum Biol. 2012 Aug.

Abstract

Identifying ancestry along each chromosome in admixed individuals provides a wealth of information for understanding the population genetic history of admixture events and is valuable for admixture mapping and identifying recent targets of selection. We present PCAdmix (available at https://sites.google.com/site/pcadmix/home), a Principal Components-based algorithm for determining ancestry along each chromosome from a high-density, genome-wide set of phased single-nucleotide polymorphism (SNP) genotypes of admixed individuals. We compare our method to HAPMIX on simulated data from two ancestral populations, and we find high concordance between the methods. Our method also has better accuracy than LAMP when applied to three-population admixture, a situation as yet unaddressed by HAPMIX. Finally, we apply our method to a data set of four Latino populations with European, African, and Native American ancestry. We find evidence of assortative mating in each of the four populations, and we identify regions of shared ancestry that may be recent targets of selection and could serve as candidate regions for admixture-based association mapping.
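The abstract describes assigning ancestry window by window from principal components of phased haplotypes. The sketch below illustrates only that core idea under simplifying assumptions: in each window it computes the first principal component of the pooled reference haplotypes (by power iteration), projects the admixed haplotype onto it, and picks the reference population whose mean score is nearest. It uses one PC, 0/1 allele coding, and no smoothing between windows; the function names and the 20-SNP default window (borrowed from the Figure 4 caption) are hypothetical choices, not PCAdmix's interface or its published algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple


def top_pc(haps: List[List[float]], iters: int = 200) -> Tuple[List[float], List[float]]:
    """Return (per-SNP means, unit first principal axis) of the pooled haplotypes,
    found by power iteration on the sample covariance matrix."""
    n, m = len(haps), len(haps[0])
    means = [sum(h[j] for h in haps) / n for j in range(m)]
    centered = [[h[j] - means[j] for j in range(m)] for h in haps]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    axis = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        w = [sum(cov[i][j] * axis[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # monomorphic window: no variation to follow
            break
        axis = [x / norm for x in w]
    return means, axis


def project(hap: Sequence[float], means: Sequence[float], axis: Sequence[float]) -> float:
    """PC1 score of one haplotype."""
    return sum((h - mu) * a for h, mu, a in zip(hap, means, axis))


def assign_windows(ref: Dict[str, List[List[int]]], query: List[int],
                   window_size: int = 20) -> List[str]:
    """Label each window of `query` with the reference population whose mean
    PC1 score is closest (hypothetical nearest-centroid rule, no smoothing)."""
    labels = []
    for start in range(0, len(query), window_size):
        cols = range(start, min(start + window_size, len(query)))
        win = {pop: [[h[j] for j in cols] for h in haps] for pop, haps in ref.items()}
        pooled = [h for haps in win.values() for h in haps]
        means, axis = top_pc(pooled)
        centroids = {pop: sum(project(h, means, axis) for h in haps) / len(haps)
                     for pop, haps in win.items()}
        score = project([query[j] for j in cols], means, axis)
        labels.append(min(centroids, key=lambda pop: abs(centroids[pop] - score)))
    return labels
```

Because the sign of a principal axis is arbitrary, the rule compares scores only within a window, so a sign flip between windows does not change the labels.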

Figures

Figure 1
Outline of the PCAdmix algorithm.
Figure 2
Estimation of average ancestry proportion for a haplotype. For k = 3 ancestral populations, the population A average ancestry proportion of each haplotype (black square) is estimated from that haplotype's distance to the line connecting the means of the other two populations on the first and second principal components, expressed as a proportion of the haplotype's total distance from all three edges: $q_{i,A} = a/(a + b + c)$.
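The Figure 2 rule can be written out directly. A minimal sketch, assuming the three population means and the haplotype are given as (PC1, PC2) coordinates; the helper names are hypothetical. Each population's proportion is the perpendicular distance to the edge opposite that population's mean, divided by the sum of all three distances. The distances are unsigned, so for a haplotype lying outside the triangle the outputs still sum to 1 but should be read only as a heuristic.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def dist_to_line(p: Point, u: Point, v: Point) -> float:
    """Perpendicular distance from p to the infinite line through u and v."""
    (px, py), (ux, uy), (vx, vy) = p, u, v
    cross = (vx - ux) * (py - uy) - (vy - uy) * (px - ux)
    return abs(cross) / math.hypot(vx - ux, vy - uy)


def ancestry_proportions(h: Point, means: Dict[str, Point]) -> Dict[str, float]:
    """q_{i,X} = distance to the edge opposite X / sum of distances to all three edges."""
    if len(means) != 3:
        raise ValueError("expects exactly three ancestral populations")
    pops = list(means)
    dists = {}
    for x in pops:
        u, v = (means[y] for y in pops if y != x)  # the edge joining the other two means
        dists[x] = dist_to_line(h, u, v)
    total = sum(dists.values())
    return {x: d / total for x, d in dists.items()}
```

A haplotype sitting exactly on population A's mean gets $q_{i,A} = 1$, since its distance to the other two edges is zero.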
Figure 3
Comparison of PCAdmix and HAPMIX on simulated chromosomes. (A) and (B) are two examples of simulated chromosomes. Top bar indicates the simulated ancestry of each chromosome (black = YRI, gray = CEU). Solid and dashed lines indicate the posterior probability of YRI ancestry in each window, using our method (solid) and HAPMIX (dashed). The black oval indicates a short region of European ancestry. The black arrows indicate regions where both methods inferred European ancestry, although the segment was simulated from a YRI haplotype.
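The curves in Figure 3 are per-window posterior probabilities. One standard way to turn per-window evidence into posteriors that reflect how rarely ancestry switches along a chromosome is forward-backward on a two-state hidden Markov model. The sketch below assumes that model, with hypothetical per-window emission likelihoods and a constant switch probability between adjacent windows; it illustrates the general technique and is not a description of how PCAdmix or HAPMIX compute their posteriors.

```python
from typing import List, Sequence, Tuple


def posterior_state0(emis: Sequence[Tuple[float, float]], switch: float,
                     prior0: float = 0.5) -> List[float]:
    """Posterior P(state 0 | all windows) for a two-state HMM via scaled forward-backward.

    emis[t] = (likelihood of window t under state 0, under state 1); hypothetical inputs.
    switch  = probability of changing ancestry between adjacent windows (assumed constant).
    """
    n = len(emis)
    trans = ((1.0 - switch, switch), (switch, 1.0 - switch))
    fwd: List[List[float]] = []
    for t in range(n):
        if t == 0:
            pred = [prior0, 1.0 - prior0]
        else:
            pred = [sum(fwd[t - 1][j] * trans[j][k] for j in range(2)) for k in range(2)]
        a = [pred[k] * emis[t][k] for k in range(2)]
        s = sum(a)
        fwd.append([x / s for x in a])  # rescale each step to avoid underflow
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[k][j] * emis[t + 1][j] * bwd[t + 1][j] for j in range(2))
             for k in range(2)]
        s = sum(b)
        bwd[t] = [x / s for x in b]
    post = []
    for f, b in zip(fwd, bwd):
        g0, g1 = f[0] * b[0], f[1] * b[1]  # per-step scale factors cancel here
        post.append(g0 / (g0 + g1))
    return post
```

With a small switch probability, one window of contrary evidence inside a long run is smoothed toward its neighbours; with `switch=0.5` the model has no memory and each posterior reduces to that window's normalized likelihood.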
Figure 4
Effects of LD filtering on a simulated chromosome. (A) 20 SNPs per window. Solid line = data filtered to $r^2 \le 0.80$; dashed line = data without filtering. (B) Solid line = 20 SNPs per window, with LD filtering; dashed line = 40 SNPs per window without filtering. Black arrows indicate a region of European ancestry which is correctly assigned when LD filtering is used or when the window size is 40 SNPs.
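Figure 4 filters SNPs to an $r^2$ threshold of 0.80. The exact pruning procedure is not given in the caption; the sketch below assumes a simple greedy pass that keeps a SNP only if its squared correlation with every previously kept SNP is at most the threshold, with $r^2$ computed from 0/1 alleles across the same set of haplotypes. Each SNP is passed as a column of alleles; the function names are hypothetical.

```python
from typing import List, Sequence


def r_squared(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation between two 0/1 allele columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as unlinked (assumption)
    return sxy * sxy / (sxx * syy)


def greedy_ld_filter(snps: List[Sequence[int]], max_r2: float = 0.80) -> List[int]:
    """Indices of SNPs kept, scanning in order and comparing with every SNP kept so far."""
    kept: List[int] = []
    for i, s in enumerate(snps):
        if all(r_squared(s, snps[j]) <= max_r2 for j in kept):
            kept.append(i)
    return kept
```

Comparing against every kept SNP is quadratic, which is fine for a short region; a chromosome-wide filter would normally restrict comparisons to nearby SNPs.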
Figure 5
Diploid accuracy and call rate of PCAdmix and LAMP. Assigned % is out of 43,518 SNPs. Chromosomes were simulated with ancestry from three populations, including Yoruba and French; each label gives the third population included in that simulation. Biaka = Biaka Pygmies; Mbuti = Mbuti Pygmies; Italian = North Italian.
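Figure 5 reports accuracy together with the fraction of SNPs assigned. The caption does not spell out the definitions, so the sketch below assumes: a haplotype is called at a SNP only when its largest posterior reaches the calling threshold (0.9, the value used in Figure 7); a diploid SNP counts as called only when both haplotypes are called; accuracy is the fraction of called SNPs whose unordered pair of ancestries matches the truth; and call rate is called SNPs over all SNPs. All names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Posterior = Dict[str, float]


def call_haplotype(post: Posterior, threshold: float) -> Optional[str]:
    """Most probable ancestry if its posterior reaches the threshold, else None."""
    pop, p = max(post.items(), key=lambda kv: kv[1])
    return pop if p >= threshold else None


def diploid_accuracy_and_call_rate(hap1: Sequence[Posterior], hap2: Sequence[Posterior],
                                   truth: Sequence[Tuple[str, str]],
                                   threshold: float = 0.9) -> Tuple[float, float]:
    called = correct = 0
    for p1, p2, (t1, t2) in zip(hap1, hap2, truth):
        c1, c2 = call_haplotype(p1, threshold), call_haplotype(p2, threshold)
        if c1 is None or c2 is None:
            continue  # SNP left unassigned
        called += 1
        if sorted((c1, c2)) == sorted((t1, t2)):  # haplotype order is ignored
            correct += 1
    n = len(truth)
    call_rate = called / n if n else 0.0
    accuracy = correct / called if called else float("nan")
    return accuracy, call_rate
```

Raising the threshold trades call rate for accuracy, which is why the two quantities are reported side by side.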
Figure 6
Accuracy vs. minimum $F_{ST}$. Shown is the overall accuracy on three-population simulations vs. the minimum pairwise $F_{ST}$ among the three populations.
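Figure 6 plots accuracy against the minimum pairwise $F_{ST}$ of the three source populations. The estimator is not named in the caption; the sketch assumes Hudson's $F_{ST}$ in its allele-frequency form, combined across SNPs as a ratio of sums, $\sum_l (p_l - q_l)^2 / \sum_l [p_l(1-q_l) + q_l(1-p_l)]$, with no sample-size correction, and then takes the minimum over the three population pairs. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def hudson_fst(p: Sequence[float], q: Sequence[float]) -> float:
    """Ratio-of-sums Hudson F_ST from per-SNP allele frequencies of two populations."""
    num = sum((a - b) ** 2 for a, b in zip(p, q))
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p, q))
    return num / den if den > 0 else 0.0


def min_pairwise_fst(freqs: Dict[str, Sequence[float]]) -> Tuple[float, Tuple[str, str]]:
    """Smallest F_ST over all population pairs, with the pair that attains it."""
    return min((hudson_fst(freqs[x], freqs[y]), (x, y))
               for x, y in combinations(freqs, 2))
```

With three populations, the least-differentiated pair determines the minimum, so a simulation that includes two closely related sources (for example two European groups) sits at the low end of the x-axis.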
Figure 7
Analysis of Latino individuals using PCAdmix. Chromosome 22 is shown. We used a calling threshold of 0.9. DOM = Dominican; COL = Colombian; PRI = Puerto Rican; ECU = Ecuadorian.
Figure 8
Normalized ancestry proportions in Latino populations. Dashed lines indicate values that are three SDs from the mean. Black arrows indicate regions where three Latino populations share high proportions of African or Native American ancestry. (A) African ancestry proportion on chromosome 6. (B) Native American ancestry on chromosome 2. (C) Native American ancestry on chromosome 8. GW = Genome-wide; SD = standard deviation.
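Figure 8 marks values three standard deviations from the mean and highlights regions where three Latino populations share high ancestry proportions. A minimal sketch, assuming that "normalized" means a z-score against the mean and sample SD of the same population's windows (for example, all windows genome-wide), and that only high outliers matter, as with the arrows in the caption. The function names and the `min_pops` parameter are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def z_scores(props: Sequence[float]) -> List[float]:
    """Standardize per-window ancestry proportions against their own mean and sample SD."""
    mu = statistics.mean(props)
    sd = statistics.stdev(props)
    if sd == 0.0:
        return [0.0] * len(props)  # no variation: nothing can be an outlier
    return [(x - mu) / sd for x in props]


def high_windows(props: Sequence[float], n_sd: float = 3.0) -> List[int]:
    """Indices of windows at least n_sd SDs above the mean."""
    return [i for i, z in enumerate(z_scores(props)) if z >= n_sd]


def shared_high_windows(by_pop: Dict[str, Sequence[float]], min_pops: int = 3,
                        n_sd: float = 3.0) -> List[int]:
    """Windows flagged as high in at least min_pops populations."""
    counts: Dict[int, int] = {}
    for props in by_pop.values():
        for i in high_windows(props, n_sd):
            counts[i] = counts.get(i, 0) + 1
    return sorted(i for i, c in counts.items() if c >= min_pops)
```

Requiring agreement across several populations that share the same window indexing is what separates a candidate shared region from a single-population fluctuation.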