Nature. 2019 Jun;570(7760):236-240.
doi: 10.1038/s41586-019-1251-y. Epub 2019 Jun 5.

Palaeo-Eskimo genetic ancestry and the peopling of Chukotka and North America


Pavel Flegontov et al. Nature. 2019 Jun.

Abstract

Much of the American Arctic was first settled 5,000 years ago, by groups of people known as Palaeo-Eskimos. They were subsequently joined and largely displaced around 1,000 years ago by ancestors of the present-day Inuit and Yup'ik [1-3]. The genetic relationship between Palaeo-Eskimos and Native American, Inuit, Yup'ik and Aleut populations remains uncertain [4-6]. Here we present genomic data for 48 ancient individuals from Chukotka, East Siberia, the Aleutian Islands, Alaska, and the Canadian Arctic. We co-analyse these data with data from present-day Alaskan Iñupiat and West Siberian populations and published genomes. Using methods based on rare-allele and haplotype sharing, as well as established techniques [4,7-9], we show that Palaeo-Eskimo-related ancestry is ubiquitous among people who speak Na-Dene and Eskimo-Aleut languages. We develop a comprehensive model for the Holocene peopling events of Chukotka and North America, and show that Na-Dene-speaking peoples, people of the Aleutian Islands, and Yup'ik and Inuit across the Arctic region all share ancestry from a single Palaeo-Eskimo-related Siberian source.

Conflict of interest statement

The authors declare no conflicting financial interests.

Figures

Extended Data Figure 1. Geographic locations of Siberian and North American populations used in this study.
Three main datasets are as follows (Supplementary Tables 4, 5): 1) a set based on the Affymetrix Human Origins genotyping array, including alternatively pseudo-haploid or diploid genotypes for the ancient Saqqaq individual, diploid genotypes for the ancient Clovis individual, together with 1240K SNP capture pseudo-haploid data from six ancient Aleuts who had the highest coverage, two unrelated ancient Athabaskans, 19 ancient Chukotkan Old Bering Sea individuals from the Ekven and Uelen sites, the Middle Dorset and Late Dorset Paleo-Eskimo individuals, and the ancient Ust’-Belaya Angara population of 9 individuals (Supplementary Table 1); 2) a set based on various Illumina arrays, including Saqqaq and the other ancient samples, and 3) a whole genome data set of 190 individuals from 87 populations, including the Saqqaq individual, one ancient Athabaskan individual (I5319), and one ancient Aleut individual (I0719), for which we generated complete genomes with 6.1x and 2.3x coverage, respectively (Supplementary Table 1). The dataset composition, i.e. number of individuals in each meta-population, is shown in the table on the right. Locations of samples with whole genome sequencing data (SEQ) are shown with circles, and those of Illumina (ILL) and HumanOrigins (HO) SNP array samples with triangles and diamonds, respectively. Meta-populations are color-coded in a similar way throughout all figures and designated as follows: Na-Dene speakers (abbreviated as ATH), other northern Native Americans, alternatively named First Peoples (NAM), Southern First Peoples (SAM), Basal First Peoples (BAM), Eskimo-Aleut speakers (E-A), Chukotko-Kamchatkan speakers (C-K), Paleo-Eskimos (P-E), West and East Siberians (WSIB and ESIB), Southeast Asians (SEA), Europeans (EUR), and Africans (AFR). Locations of the Saqqaq, Dorset and other ancient samples are shown as stars colored to reflect their meta-population affiliation.
Extended Data Figure 2. Principal component analysis (PCA) based on the Illumina dataset.
A plot of two principal components (PC1 vs. PC2) calculated by PLINK2 is shown (linkage disequilibrium pruning was not applied). No outliers were excluded for this analysis based on 642 individuals and 524,830 loci. The following meta-populations most relevant for our study are plotted: present-day Eskimo-Aleut and Chukotko-Kamchatkan speakers, ancient Chukotkan Neo-Eskimos (Ekven and Uelen sites), ancient Aleuts, Paleo-Eskimos (the Saqqaq, Middle Dorset and Late Dorset individuals), ancient Northern Athabaskans, present-day Na-Dene speakers, northern and Southern First Peoples, West and East Siberians, the Ust’-Belaya Angara ancient Siberian population, Southeast Asians, and Europeans. Calibrated radiocarbon dates in YBP are shown for ancient samples. For individuals, 95% confidence intervals are shown, and for populations, minimal and maximal median dates among individuals are shown.
Extended Data Figure 3. Ancestry proportions in American, Chukotkan and Kamchatkan populations.
Shown are the HumanOrigins (a-e) and Illumina (f-j) datasets without transition polymorphisms. Five alternative outgroup sets are indicated below the plots and described in detail in Methods and in Supplementary Information section 5. Target populations in bold denote ancient populations. Saqqaq (pseudo-haploid genotype calls) was considered as a Paleo-Eskimo source for all populations apart from Saqqaq itself, for which Late Dorset was used as a source, and alternative First American sources were as follows: Mixe, Guarani, or Karitiana for the HumanOrigins dataset; Nisga’a, Mixtec, Pima, or Karitiana for the Illumina dataset. To visualize both systematic and statistical errors, ancestry proportions inferred by qpAdm and their standard errors are shown for all triplets including these different First Peoples sources, or for many alternative target populations in the case of Southern First Peoples (single standard error intervals are plotted here). Asterisks stand for ancestry proportions >150% (inappropriate models). Meta-populations are color-coded and abbreviated as follows: C-K, Chukotko-Kamchatkan speakers; E-A, Eskimo-Aleut speakers and ancient Neo-Eskimos and ancient Aleuts; N-D, Na-Dene speakers; NAM, Northern First Peoples; SAM, Southern First Peoples. Target population sizes in the HumanOrigins dataset ranged from 1 to 23 individuals, 5.6 on average, and in the Illumina dataset they ranged from 1 to 16 individuals, 5.1 on average.
Extended Data Figure 4. Ancestry proportions in American, Chukotkan and Kamchatkan populations.
Similar analysis as in Extended Data Fig. 3, but including transition polymorphisms. Target population sizes in the HumanOrigins dataset (a-e) ranged from 1 to 23 individuals, 5.6 on average, and in the Illumina dataset (f-j) they ranged from 1 to 16 individuals, 5.1 on average.
Extended Data Figure 5. Relative Saqqaq, Arctic, and European haplotype sharing statistics (HSS) for American individuals.
Results are shown for the HumanOrigins (a) and Illumina (b) datasets, normalized using the African meta-population. Both Eskimo-Aleut- and Chukotko-Kamchatkan-speaking groups contributed to the Arctic HSS. The same statistics and statistics with other normalizers are shown in the form of two-dimensional plots in Supplementary Information section 6. Two Dakelh (Northern Athabaskan) individuals with whole-genome sequencing data were included in both datasets and marked by asterisks. The plots based on both datasets demonstrate that Na-Dene speakers have the highest relative Saqqaq HSS. One Haida and three Splatsin individuals also demonstrate outlying Saqqaq HSSs (b); however, these individuals stand in contrast to the majority of non-Na-Dene Northern First Peoples, and Paleo-Eskimo ancestry in these individuals may be explained by recent interaction with Na-Dene speakers living in close proximity. The Haida outlier demonstrates a maximal Arctic HSS among all First Peoples, and its Arctic ancestry has contributed to its elevated Saqqaq HSS. Saqqaq, Arctic and European statistics are largely uncorrelated in First Peoples: Pearson’s correlation coefficients for Saqqaq vs. Arctic relative HSSs are 0.56 among all First Peoples and 0.64 among Northern First Peoples in the case of the Illumina dataset, and 0.66 and 0.72, respectively, in the case of the HumanOrigins dataset.
Extended Data Figure 6. Rare allele sharing analysis.
A two-dimensional plot of Chukotko-Kamchatkan (C-K) and Siberian (SIB) rare allele sharing statistics for First Peoples, Na-Dene-speaking, Eskimo-Aleut-speaking, and Paleo-Eskimo individuals. Rare alleles occurring from 2 to 5 times in the reference set of 238 haploid genomes (0.8–2.1% frequency) contributed to the statistics; the Chukchi individual was dropped from the C-K reference group, and the transversion-only dataset was used. Thus, this analysis was based on 918,474 loci. The sample size for this analysis equals 238 + 2 haploid genomes in a target individual since individuals were analyzed separately. Standard deviations were calculated using a jackknife approach with chromosomes used as resampling blocks. Single standard error intervals and means are plotted. Populations and meta-populations are color-coded according to the legend. Rare allele sharing statistics for simulated mixtures of any present-day southern Native American individual and the Saqqaq individual (from 5% to 75% Saqqaq ancestry, with 5% increments) are plotted as semi-transparent pink circles. Plots for the 2 to 10 allele frequency range and other versions are shown in Supplementary Information section 8.
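The frequency range and the chromosome-block jackknife mentioned in this caption can be illustrated with a minimal sketch. It assumes an unweighted delete-one jackknife over per-chromosome counts and a simple ratio statistic; the caption does not specify either detail, so the function names `count_to_freq_percent` and `jackknife_ratio` and the ratio form are hypothetical, not the authors' implementation.

```python
import math

def count_to_freq_percent(count, n_haploid=238):
    """Convert an allele count in the reference set to a percent frequency."""
    return 100.0 * count / n_haploid

def jackknife_ratio(nums, dens):
    """Unweighted delete-one block jackknife for a ratio statistic.

    nums, dens: per-block (e.g. per-chromosome) numerators and denominators.
    Assumption: the statistic is sum(nums) / sum(dens); this form is only
    illustrative and is not taken from the paper.
    Returns (estimate, standard_error).
    """
    n = len(nums)
    total_num, total_den = sum(nums), sum(dens)
    estimate = total_num / total_den
    # Recompute the statistic with each block left out in turn.
    leave_one_out = [(total_num - a) / (total_den - b) for a, b in zip(nums, dens)]
    loo_mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((x - loo_mean) ** 2 for x in leave_one_out)
    return estimate, math.sqrt(variance)

# Allele counts of 2 and 5 among 238 haploid genomes bound the stated range.
low, high = count_to_freq_percent(2), count_to_freq_percent(5)
print(f"{low:.1f}% to {high:.1f}%")
```

With 238 reference haploid genomes, counts of 2 and 5 correspond to 2/238 and 5/238, which round to the 0.8% and 2.1% bounds quoted in the caption.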
Extended Data Figure 7. An admixture graph connecting various modern meta-populations and ancient populations or individuals.
As derived in Supplementary Information section 10, the graph features a simplified three-component model for Europeans as previously suggested and two gene flows from a European lineage related to the ancient Siberian genome MA-1 into Native Americans and Siberians. The topology within the proto-Paleo-Eskimo clade was obtained by cycling through dozens of trees with all possible topologies of branches and admixture edges and selecting the one with the highest support and no 0-length edges within the proto-Paleo-Eskimo clade.
Extended Data Figure 8. ADMIXTURE analysis.
Shown are results for the HumanOrigins (a) and Illumina (b) SNP array datasets. The number of source populations in ADMIXTURE is 14 and 11, respectively. One hundred iterations were calculated for each value of K from 5 to 20 (where K is the number of ancestral populations), and the optimal K values were selected based on ten-fold cross-validation. Contributions from hypothetical ancestral populations are color-coded, and meta-populations used in this study are indicated above the plot: AFR, Africans; EUR, Europeans; SEA, Southeast Asians; ESIB, East Siberians; WSIB, West Siberians; C-K, Chukotko-Kamchatkan speakers; E-A, Eskimo-Aleut speakers; NAM, northern First Peoples; SAM, Southern First Peoples; ATH, Northern Athabaskan speakers; N-D, Na-Dene speakers. Chipewyan or Northern Athabaskan and Tlingit individuals with European admixture are plotted in separate bars, as well as ancient individuals: Clovis, Northern Athabaskans, Aleuts, Chukotkan Neo-Eskimos (Ekven and Uelen sites), Saqqaq and Late Dorset Paleo-Eskimos, and a genetically heterogeneous Ust’-Belaya Angara Siberian population (Ust’-Belaya WSIB, an undated individual I7760 having a West Siberian genetic profile according to PCA and this ADMIXTURE analysis; Ust’-Belaya, the remaining 8 individuals from the Ust’-Belaya Angara site having a distinct genetic profile according to our PCA analysis). Outliers, including individuals admixed with Europeans and East Asians, were not removed from Na-Dene-speaking populations in the Illumina dataset (b) to preserve their maximal diversity. Outliers were removed for the purpose of other analyses (qpAdm, f4-statistics, etc.) that rely on pre-defined populations.
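Selecting K by cross-validation, as described in this caption, can be sketched as follows. The sketch assumes that each run reports a line of the form `CV error (K=5): 0.52305` and that runs for the same K are summarized by their mean error; the caption does not say how the one hundred iterations were aggregated, so the function `best_k` and the aggregation rule are hypothetical, and the error values in the example are made up for illustration.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed log line format; adjust the pattern if your logs differ.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")

def best_k(log_lines):
    """Return (K with the lowest mean CV error, {K: mean CV error})."""
    errors = defaultdict(list)
    for line in log_lines:
        match = CV_LINE.search(line)
        if match:
            errors[int(match.group(1))].append(float(match.group(2)))
    if not errors:
        raise ValueError("no CV error lines found")
    mean_error = {k: mean(values) for k, values in errors.items()}
    return min(mean_error, key=mean_error.get), mean_error

# Illustrative (invented) log lines from repeated runs at K = 5, 6, 7.
example = [
    "CV error (K=5): 0.60",
    "CV error (K=5): 0.62",
    "CV error (K=6): 0.55",
    "CV error (K=6): 0.57",
    "CV error (K=7): 0.58",
]
k, table = best_k(example)
print(k, table)
```

Taking the minimum rather than the mean across runs is an equally plausible convention; only the aggregation line would change.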
Extended Data Figure 9. Clustering trees of individuals computed by fineSTRUCTURE.
The trees are based on coancestry matrices of counts of shared haplotypes. Reduced versions of the HumanOrigins (a) and Illumina (b) SNP array datasets were used (Supplementary Table 5), including only the following meta-populations most relevant for our study: Eskimo-Aleut speakers (E-A), Chukotko-Kamchatkan speakers (C-K), Na-Dene speakers (ATH), northern First Americans or First Peoples (NAM), Southern First Peoples (SAM), West Siberians (WSIB), East Siberians (ESIB), Southeast Asians (SEA), Europeans (EUR). Meta-population affiliation is color-coded for individuals. Iñupiat individuals genotyped in this study are marked with a blue line. The two Dakelh (Northern Athabaskan) individuals with sequenced genomes and the ancient individuals, Clovis within the Southern First Peoples clade and Saqqaq within the Chukotko-Kamchatkan clade, are also indicated. Most members of each clade belong to the meta-populations indicated, with a few exceptions. First (see panel a), Altaians fall into the ESIB clade, some Chilote fall into the NAM clade, and Aleuts fall into the WSIB clade (the latter two cases might be explained by extensive European ancestry in Chilote and Aleuts, which drives this clustering; Extended Data Fig. 8a). Second (see panel b), some Selkups fall into the ESIB clade, all four Southern Athabaskan speakers cluster with South Americans, reflecting their substantial South American ancestry (Extended Data Fig. 8b), one Haida individual clusters with Na-Dene speakers, and five Northern Athabaskan speakers cluster with other Northern First Peoples.
Figure 1. Principal component analysis (PCA) and qpAdm modelling.
a) The first two PCs for 940 individuals from the HumanOrigins dataset are plotted. No outliers were excluded for this analysis based on 586,487 loci. Calibrated radiocarbon dates (calBP) are shown for ancient samples (95% confidence intervals for individuals, minimal and maximal average dates for groups). See Extended Data Fig. 2 and Supplementary Information section 4 for PCA plots of additional datasets. b) Proportions of Paleo-Eskimo ancestry inferred by qpAdm, using the same dataset as in a) but without transition polymorphisms. To visualize both systematic and statistical errors, for each target group ancestry proportions and their single standard error intervals are shown for population triplets including different First Peoples ancestry sources, or for many alternative target populations in the case of Southern First Peoples. Target population sizes ranged from 1 to 23 individuals, with 5.6 on average.
Figure 2. A demographic model based on 114 individuals from 9 meta-populations.
a) We used Rarecoal and qpGraph to test topologies and estimate split times and admixture edges (dashed). For a complete list of parameter estimates, including confidence intervals, see Supplementary Information section 9. b) A zoomed-in model for the last 6,000 years and 5 populations, highlighting the Holocene migrations and gene flow events between Asia and America. Maximum likelihood branching points of ancient genomes are indicated as solid dots. Times are scaled using a per-generation mutation rate of 1.25 × 10^-8 and a generation time of 29 years (see Supplementary Information section 9).
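The time scaling stated in this caption amounts to simple arithmetic, sketched below. The sketch assumes the unscaled time is expressed as expected mutations per site along a lineage, which is a simplification and not necessarily the unit used by the authors' software; the function names and the example input value are hypothetical.

```python
MUTATION_RATE = 1.25e-8   # per site per generation (value from the caption)
GENERATION_YEARS = 29     # years per generation (value from the caption)

def scaled_time_to_years(t_mutations_per_site,
                         mu=MUTATION_RATE, gen=GENERATION_YEARS):
    """Convert a time in mutations per site (assumed unit) to years."""
    generations = t_mutations_per_site / mu
    return generations * gen

def years_to_generations(years, gen=GENERATION_YEARS):
    """Convert calendar years to generations."""
    return years / gen

# Hypothetical input: 2.5e-6 mutations per site along one lineage.
print(round(scaled_time_to_years(2.5e-6)))    # about 5,800 years
print(round(years_to_generations(6000), 1))   # about 206.9 generations
```

Under these two constants, the 6,000-year window of panel b corresponds to roughly 207 generations.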
Figure 3. Archaeological and geographical interpretation of our model.
a) The topology drawn here reflects our best-fitting model of the proto-Paleo-Eskimo clade. We provisionally mapped the Paleo-Eskimo/Na-Dene gene flow across the boundary separating the ASTt and Northern Archaic cultures in Alaska, where the highest diversity of Na-Dene languages is found (for that reason Alaska was proposed as a Na-Dene homeland). b) A model of population history for Eskimo-Aleut (E-A) speakers combining genetic and archaeological evidence. Their back-and-forth movement across the Bering Strait is illustrated, as well as the bidirectional gene flow between Yup’ik and Inuit ancestors (the Old Bering Sea culture, OBS) and Chukotko-Kamchatkan (C-K) speakers in Chukotka. In both panels, earliest dates in calBP are indicated for archaeological areas and migrations. Some migration paths are drawn to indicate general directions, but not actual routes of population spread.

References

    1. Rasmussen M et al. Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature 463, 757–762 (2010).
    2. Raghavan M et al. The genetic prehistory of the New World Arctic. Science 345, 1255832 (2014).
    3. Friesen TM. Pan-Arctic population movements: the early Paleo-Inuit and Thule Inuit migrations. In The Oxford Handbook of the Prehistoric Arctic, ed. Friesen TM, Mason OK. New York: Oxford University Press; 673–692 (2016).
    4. Reich D et al. Reconstructing Native American population history. Nature 488, 370–374 (2012).
    5. Raghavan M et al. Genomic evidence for the Pleistocene and recent population history of Native Americans. Science 349, 1–20 (2015).

Additional references for Methods:

    1. Flegontov P et al. Genomic study of the Ket: A Paleo-Eskimo-related ethnic group with significant ancient North Eurasian ancestry. Sci. Rep 6, 20768 (2016).
    1. Mallick S et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016).
    1. Li H & Durbin R Inference of human population history from individual whole-genome sequences. Nature 475, 493–496 (2011).
    1. Rasmussen M et al. The genome of a Late Pleistocene human from a Clovis burial site in western Montana. Nature 506, 225–229 (2014).
    1. Raghavan M et al. Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature 505, 87–91 (2014).

Additional references for Extended Data Figures

    1. Verdu P et al. Patterns of admixture and population structure in native populations of northwest North America. PLoS Genet. 10, e1004530 (2014).
