PLoS Genet. 2013;9(12):e1004023.
doi: 10.1371/journal.pgen.1004023. Epub 2013 Dec 26.

Reconstructing Native American migrations from whole-genome and whole-exome data

Simon Gravel et al. PLoS Genet. 2013.

Abstract

There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies have unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low-coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR is most closely related to populations surrounding the Orinoco River basin, confirming the Southern American ancestry of the Taíno people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a demographic model for three ancestral Native American populations. These ancestral populations likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), has the ancestors of MXL splitting 12.2 kya, followed by a split between the ancestors of CLM and PUR 11.7 kya. The model also features effective population sizes of 62,000 in Mexico, 8,700 in Colombia, and 1,900 in Puerto Rico. Modeling identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest effective size and the earliest migration from Europe. Finally, we compare IBD and ancestry assignments to find evidence for relatedness among European founders of the three populations.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Schematic of the data and analysis pipelines used in this article.
The three types of 1000 Genomes data are shown in orange: whole-genome, low-coverage data; exome capture; and genotyping chip. Only genotyping chip data was available in trio-phased form; for the other two datasets we used unphased genotypes. Among the analysis approaches (black arrows), the EM and the negative ascertainment analysis are novel: they are presented in the Methods section.
Figure 2. Genome-wide ancestry patterns.
(a) Individual ancestry proportions in the 1000 Genomes CLM, MXL, and PUR populations according to admixture. (b) Map showing the sampling locations for the populations most closely related to the Native components of the 1000 Genomes populations. (c) Principal component analysis restricted to genomic segments inferred to be of Native ancestry in these populations, compared to a reference panel of Native American groups, pooled according to country of origin as a proxy for geography. Populations sampled across many locations are labeled according to the country of the centroid of locations. (d) Zoomed version of the PCA plot, showing specific Native American population labels, colored according to country of origin.
Figure 3. Ancestry tract length distribution in PUR (a) and CLM (b) compared to the predictions of the best-fitting migration model.
Solid lines represent model predictions and shaded areas are one-standard-deviation confidence regions surrounding the predictions, assuming a Poisson distribution of counts per bin. The best-fitting models are displayed under each graph. Pie chart sizes indicate the proportion of migrants at each generation, and the pie parts represent the fraction of migrants of each origin at a given generation. Migrants are taken to have uniform continental ancestry. ‘Single-pulse’ admixture events occurring at a non-integer time in generations are distributed among neighboring generations: in the CLM, the inferred onset was 13.02 generations ago (ga). The model involves founding 14 ga, but almost complete replacement 13 ga. At 30 years per generation, 14.9 ga corresponds to about 447 years ago, and 13 ga to about 390 years ago. Model parameters and confidence intervals are displayed in Table S1 in the Text S1 file.
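The shaded regions and the generation-to-year conversion described in this caption can be illustrated with a minimal sketch. It assumes the band is simply the predicted count plus or minus its Poisson standard deviation (the square root of the mean); the function names and the example counts are hypothetical and are not taken from the article.

```python
import math

YEARS_PER_GENERATION = 30  # generation time quoted in the caption


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a time in generations ago (ga) to years ago."""
    return generations * years_per_generation


def poisson_band(expected_counts):
    """Return (lower, upper) one-standard-deviation bounds for each bin.

    A Poisson count with mean m has standard deviation sqrt(m), so the
    band is m - sqrt(m) to m + sqrt(m), with the lower bound floored at 0.
    """
    band = []
    for m in expected_counts:
        sd = math.sqrt(m)
        band.append((max(0.0, m - sd), m + sd))
    return band


# Hypothetical predicted tract counts per length bin (illustration only).
predicted = [400.0, 100.0, 25.0, 4.0]
band = poisson_band(predicted)
onset_years = generations_to_years(13)
```

Bins with few expected tracts get proportionally wider bands, which is why the shaded areas in such plots typically widen toward the long-tract end of the distribution.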
Figure 4. Number of IBD tracts by length bin in the three panel populations (independent of ancestry estimations), normalized by the number of individual pairs.
The lower level of IBD in the MXL population indicates a much larger effective population size.
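The normalization by number of individual pairs can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes within-population pairs of distinct individuals, and the bin labels, tract counts, and sample size are hypothetical.

```python
def n_pairs(n_individuals):
    """Number of unordered pairs of distinct individuals: n * (n - 1) / 2."""
    return n_individuals * (n_individuals - 1) // 2


def normalize_ibd_counts(counts_by_bin, n_individuals):
    """Divide the IBD tract count in each length bin by the number of pairs."""
    pairs = n_pairs(n_individuals)
    if pairs == 0:
        raise ValueError("need at least two individuals to form a pair")
    return {length_bin: count / pairs for length_bin, count in counts_by_bin.items()}


# Hypothetical tract counts per length bin (in cM) for a panel of 60 people.
example_counts = {"5-10": 3540, "10-20": 885, "20+": 177}
per_pair = normalize_ibd_counts(example_counts, n_individuals=60)
```

Dividing by the number of pairs makes panels of different sizes comparable, since the raw number of detected tracts grows roughly quadratically with sample size.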
Figure 5. Continental origin of IBD segments.
(a) Local ancestry assignments in the neighborhood of the 120 longest inferred IBD segments within a population, (b) Local ancestry assignments in the neighborhood of the 120 longest inferred IBD segments across populations. Within inferred IBD segments, ancestry mismatches correspond formula image error rate within population, and formula image error rate across population.
Figure 6. An illustration of the maximum likelihood demographic model for the Native American ancestors to the CLM, MXL, and PUR panels.
Parameter values are provided in Table 1. The ordering of the split shown (i.e., MXL splitting first) maximized the likelihood, but among the bootstrap replicates all three orders were observed.
Figure 7. Plausible parameter range for the human mutation rate and the founding time of the Native American populations.
The shaded blue area is the formula image confidence interval from the current analysis. The horizontal line shows the lowest mutation rate estimate from , and the vertical line shows the lowest plausible date for the founding of the ancestral Native American populations according to . The plausible region, given by the overlap of the three areas, would correspond to a mutation rate of formula image and a Native American founding time formula image.
Figure 8. Estimating Native American allele frequencies.
(a) Number of inferred Native American haplotypes per site, out of 120 CLM, 132 MXL, and 110 PUR haplotypes. (b) Distribution of confidence interval widths for allele frequency estimates among the exomic Native American segments of the three panels.
Figure 9. Illustration of the negative ascertainment scheme, with simulation.
(a) A basic three population model, showing the joint site-frequency spectrum for populations 1 and 2 as a heat map. (b) Conditioning on variants not being observed in the out-population results in a SFS skewed towards rare variants. (c) A quantitatively similar effect can be obtained by introducing a drastic bottleneck at the root of the tree and considering only two populations.
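The conditioning step in panel (b) can be sketched on a toy joint site-frequency spectrum. The layout assumed here, a nested list where `joint_sfs[i][j][k]` is the number of sites with derived-allele counts `i`, `j`, and `k` in population 1, population 2, and the out-population, is a hypothetical convention chosen for illustration, as are the function name and the toy values.

```python
def condition_on_absent_in_outgroup(joint_sfs):
    """Keep only sites unobserved in the out-population and renormalize.

    joint_sfs[i][j][k] is the number of sites with derived-allele counts
    i (population 1), j (population 2), and k (out-population).
    Returns the two-population spectrum given k == 0, as proportions.
    """
    # Take the k == 0 slice of the three-dimensional spectrum.
    sliced = [[counts_over_k[0] for counts_over_k in plane] for plane in joint_sfs]
    total = sum(sum(row) for row in sliced)
    if total == 0:
        raise ValueError("no sites remain after conditioning")
    return [[value / total for value in row] for row in sliced]


# Toy 2 x 2 x 2 spectrum (illustration only, not real data).
toy_sfs = [
    [[0, 5], [3, 1]],
    [[2, 4], [1, 6]],
]
conditioned = condition_on_absent_in_outgroup(toy_sfs)
```

Variants that are common in populations 1 and 2 tend to be old and therefore also present in the out-population, so discarding every site with `k > 0` preferentially removes them and leaves a spectrum weighted toward rare variants.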
