Nature. 2021 Sep;597(7877):522-526.
doi: 10.1038/s41586-021-03902-8. Epub 2021 Sep 22.

Paths and timings of the peopling of Polynesia inferred from genomic networks


Alexander G Ioannidis et al. Nature. 2021 Sep.

Abstract

Polynesia was settled in a series of extraordinary voyages across an ocean spanning one third of the Earth [1], but the sequences of islands settled remain unknown and their timings disputed. Currently, several centuries separate the dates suggested by different archaeological surveys [2-4]. Here, using genome-wide data from merely 430 modern individuals from 21 key Pacific island populations and novel ancestry-specific computational analyses, we unravel the detailed genetic history of this vast, dispersed island network. Our reconstruction of the branching Polynesian migration sequence reveals a serial founder expansion, characterized by directional loss of variants, that originated in Samoa and spread first through the Cook Islands (Rarotonga), then to the Society (Tōtaiete mā) Islands (11th century), the western Austral (Tuha'a Pae) Islands and Tuāmotu Archipelago (12th century), and finally to the widely separated, but genetically connected, megalithic statue-building cultures of the Marquesas (Te Henua 'Enana) Islands in the north, Raivavae in the south, and Easter Island (Rapa Nui), the easternmost of the Polynesian islands, settled in approximately AD 1200 via Mangareva.


Conflict of interest statement

Competing interests: C.D.B. is a member of the scientific advisory boards for Liberty Biosecurity, Personalis, 23andMe Roots into the Future, Ancestry.com, IdentifyGenomics, Genomelink, and Etalon and is a founder of CDB Consulting. C.R.G. owns stock in 23andMe and is a member of the scientific advisory board for Encompass Bioscience.

Figures

Extended Data Figure 1. Comparison of genetic and geographic coordinates for European vs. Polynesian samples
(a) A principal component analysis of samples from Europe (15 from each nation) is shown to closely fit the geography of Europe. (See Extended Data Figure 2 for a quantitative comparison.) (b) A principal component analysis of samples from Polynesia (with non-Polynesian ancestry masked) is shown not to match the vast geography of the Pacific (c), and instead splits out island groups one at a time, reflecting the founder effects that dominate the variance of these populations.
Extended Data Figure 2. Permutation test for fit between genetic and geographic coordinates
100,000 random permutations of the population labels were created for the European populations’ genetic data (blue, left) vs. the Polynesian populations (orange, right). For the European populations, none of the 100,000 random permutations of the population labels on the genetic PCA fits the geography of Europe better than the correct labels (after fitting using a Procrustes analysis), showing that the genetic coordinates of Europeans fit the geographic coordinates of Europe better than chance every time. However, for the Polynesian data, 5% of the random permutations of the labels on the genetic PCA fit the geographic coordinates of the Pacific islands better than the correct labels (after fitting using Procrustes), showing that the genetic data in Polynesia do not fit Polynesia’s geography better than random chance. In the box-and-whisker plots, the mean and the upper and lower quartiles of the RMS error of the fits of the random permutations of population labels are indicated by horizontal lines. The fits of the actual population labels are indicated by asterisks.
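The permutation test described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes one two-dimensional genetic coordinate and one two-dimensional geographic coordinate per population, and the function names `procrustes_rms` and `permutation_test` are hypothetical. The Procrustes fit (translation, rotation, uniform scaling, optional reflection) is solved in closed form by treating 2-D points as complex numbers.

```python
import random
from math import sqrt


def procrustes_rms(genetic, geographic):
    """RMS error after the best translation, rotation, uniform scaling and
    optional reflection of 2-D `genetic` points onto `geographic` points."""
    n = len(genetic)
    z = [complex(x, y) for x, y in genetic]
    w = [complex(x, y) for x, y in geographic]
    z_mean, w_mean = sum(z) / n, sum(w) / n
    z = [p - z_mean for p in z]          # remove translation
    w = [p - w_mean for p in w]
    best = float("inf")
    # Try the points as given and their mirror image (complex conjugate).
    for cand in (z, [p.conjugate() for p in z]):
        denom = sum(abs(p) ** 2 for p in cand)
        # Least-squares complex factor c encodes rotation and scale.
        c = sum(p.conjugate() * q for p, q in zip(cand, w)) / denom
        sse = sum(abs(q - c * p) ** 2 for p, q in zip(cand, w))
        best = min(best, sqrt(sse / n))
    return best


def permutation_test(genetic, geographic, n_perm=1000, seed=0):
    """Return the observed RMS error and the fraction of label
    permutations that fit the geography strictly better."""
    rng = random.Random(seed)
    observed = procrustes_rms(genetic, geographic)
    shuffled = list(genetic)
    better = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)            # reassign population labels
        if procrustes_rms(shuffled, geographic) < observed:
            better += 1
    return observed, better / n_perm
```

A returned fraction near zero corresponds to the European case in the caption, and a fraction around 0.05 or more to the Polynesian case.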
Extended Data Figure 3. Continuity between ancient and modern Polynesian island populations
F3 statistics were computed between ancient Rapanui samples and the Polynesian component from modern samples from each island in our dataset (top). Indigenous Austronesian language speakers from Taiwan (the Atayal) were used as an outgroup. The ancient Rapanui were found to be the most similar genetically to the modern Rapanui, indicating genetic continuity. A similar comparison was performed for the only other ancient samples from an island in our study, those from Tonga (bottom). Again, the modern Tongans appear most similar genetically; however, all islands downstream from Tonga in our inferred settlement path also share the same amount of genetic drift with the ancient Tongan samples (to within one standard error), as they should, since they are all descendants of these ancient Tongan samples according to our settlement reconstruction.
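As an illustration of the outgroup F3 statistic used in this comparison, the sketch below computes the textbook form, the mean over SNPs of (o - a)(o - b), from allele-frequency vectors. It is a simplified assumption-laden example: it omits the finite-sample bias correction and the ancestry masking that a real analysis would need, and the function name `outgroup_f3` is hypothetical.

```python
def outgroup_f3(outgroup, pop_a, pop_b):
    """Uncorrected outgroup F3: mean of (o - a) * (o - b) over SNPs.

    Each argument is a list of allele frequencies, one per SNP, in the
    same SNP order. Larger values mean more drift shared by pop_a and
    pop_b after their split from the outgroup.
    """
    if not (len(outgroup) == len(pop_a) == len(pop_b)):
        raise ValueError("frequency vectors must have the same length")
    if not outgroup:
        raise ValueError("at least one SNP is required")
    products = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(products) / len(products)
```

Computing this value between an ancient sample and each modern island, with the same outgroup throughout, and then ranking the islands gives the kind of comparison shown in the figure.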
Extended Data Figure 4. Statistics used for settlement path inference
All statistics are based on the Polynesian-specific aggregate SNP frequency vectors computed for each island from all sampled individuals. The number (n) of individuals used is given for each island in Supplementary Table 1. (A) Directionality index (ψ), used to define sets of potential parent islands, plotted for each island relative to Samoa (equivalent to the top row of the matrix in Fig. 2B). (B) Average number of pairwise differences (π), measuring genetic distance and used to select the closest of potential parents, plotted for each island relative to Rapa Nui. (C) F3 statistic, used to find additional shared genetic drift, plotted for each island relative to Rapa Nui, with Taiwan as an outgroup. Standard errors in A-C were determined by a block bootstrap analysis. (D) Exponential decay constant (λ) for the Polynesian-specific IBD fragment length distributions between all pairs of individuals from Rapa Nui and each plotted island. The λ values can be used to calculate the number of generations elapsed since each pair of island populations were joined. Error bars show 95% confidence intervals of the maximum likelihood estimates determined analytically from the Fisher Information.
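Panel (D) can be illustrated with a maximum likelihood fit of an exponential decay constant and its Fisher-information confidence interval. The sketch assumes segment lengths are exponentially distributed above a detection threshold `min_length`; that threshold, the units, and the function name `fit_decay_constant` are assumptions for illustration, not details taken from the paper. For n exponential observations the Fisher information is n / λ², so the standard error of the estimate is λ / √n.

```python
from math import sqrt


def fit_decay_constant(lengths, min_length=0.0):
    """MLE of the exponential decay constant with a 95% confidence interval.

    Segments shorter than `min_length` are discarded; by memorylessness,
    the excess length above the threshold is again exponential.
    """
    excess = [x - min_length for x in lengths if x >= min_length]
    n = len(excess)
    total = sum(excess)
    if n == 0 or total <= 0:
        raise ValueError("need segments longer than min_length")
    lam = n / total                      # maximum likelihood estimate
    se = lam / sqrt(n)                   # from Fisher information n / lam**2
    return lam, (lam - 1.96 * se, lam + 1.96 * se)
```

With only a handful of segments the interval is very wide, which is why pooling all pairs of individuals between two islands, as the caption describes, helps.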
Extended Data Figure 5. Settlement map with candidate intermediate islands added
A reproduction of the map of Fig. 2a showing intermediate islands that are in the settlement path but not in our dataset and that are possible candidates for explaining the additional shared drift observed in the corresponding colored settlement branches, that is, genetic drift shared between the child islands but not shared with the parent island. The additional shared drift of the Austral Islands (Rimatara and Tubuai) with the Society Islands (Tahiti) and Tuamotus (Palliser) beyond what they each share with their parental island (Rarotonga in the Cooks) could indicate that there exists a shared intermediate island in their settlement path that we do not have in our dataset, for instance Mangaia. Geological analyses of ancient tools found on Mangaia (green) have shown that it served as a connection between the Cook Islands and remote eastern Polynesia; now-uninhabited Nororotu (Maria Atoll) is also believed to have played a role as an intermediary island. Traditional histories give Raiatea (pink) and its surrounding islands a role in the settling of remote eastern Polynesia. Finally, linguistic studies have found connections between Marquesic languages (Marquesas and Mangareva) and the central Tuamotus (orange). North Marquesas, South Marquesas, and Mangareva share drift with one another beyond what they share with Palliser, the westernmost island group in the Tuamotus, which could indicate that these three populations shared a common settlement path eastward through some of the Tuamotu Archipelago before diverging. Another possible explanation for additional shared drift is the settlement of each child island from a common subpopulation within the parental island, such as from the same clan or village.
Extended Data Figure 6. Effect of Phasing Errors on IBD Dates
IBD segments on the island of Rapa Nui were identified between all male X chromosomes. The log of the number of IBD segments (y axis) of a given genetic length (x axis) is plotted (orange; bottom left). The expected exponential decay of IBD segment lengths (linear semilog plot) is seen. The slope of this line (−0.161) is the exponential (decay) constant λ. Since the X chromosome is perfectly phased in men, because it is haploid, the identification of these IBD segments is unaffected by errors introduced through phasing algorithms. To quantify the effect of such errors, synthetic-female individuals were constructed by combining two male X chromosomes to make a diploid pair and to erase the phase information by recording only the genotype. The unphased diploid genotypes so constructed were phased and IBD segments were again identified and plotted (green; bottom right). The difference between the exponential decay constant (−0.166) of these statistically phased genotypes and the previous one is seen to be minor (top panel), amounting to three percent (3.01%), which corresponds to a difference of around 25 years for dates approximately eight hundred years ago (as in Polynesia). Uncertainty in the slope of the lines (equivalent to the uncertainty in the estimated exponential decay constant) is shaded.
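The arithmetic behind the quoted 3.01% and roughly 25 years can be checked with a few lines. Two assumptions are made here and are not stated in the caption: the relative difference is taken with the statistically phased constant (−0.166) as the denominator, which is the choice that reproduces 3.01%, and the inferred date is taken to scale linearly with the decay constant. The function name `phasing_shift` is hypothetical.

```python
def phasing_shift(lam_haploid, lam_phased, age_years):
    """Relative difference between two decay constants and the date
    shift it implies, assuming dates scale linearly with the constant."""
    rel = abs(lam_phased - lam_haploid) / abs(lam_phased)
    return rel, rel * age_years


# Slopes from the caption; 800 years is the approximate age it mentions.
rel, years = phasing_shift(-0.161, -0.166, 800)
print(f"relative difference: {rel:.2%}, date shift: {years:.0f} years")
```

Under these assumptions the shift comes out at about 24 years, consistent with the caption's "around 25 years".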
Extended Data Figure 7. Polynesian ancestry-specific shared drift ordination plot with principal curve
A principal coordinate analysis (PCoA) projection of the pairwise shared drift distances (the Polynesian ancestry-specific outgroup-F3) between each Pacific island population using Taiwan as an outgroup (Supplementary Fig. 12). This PCoA projection uses only the pairwise distance matrix and is fully unsupervised; that is, it does not presuppose that Rapa Nui is a terminal island along some settlement path. Nevertheless, it shows the same ordering as in Supplementary Fig. 9, confirming that Rapa Nui is indeed the terminal island in our dataset along the longest drift path, and confirming the drift ordering along that path. For further confirmation, a principal curve was also fit to the full dimensional space (Supplementary Fig. 12) and then projected into the two-dimensional PCoA space for visualization. The orthogonal projections of each island onto the principal curve are shown as thinner grey lines. This fully unsupervised principal curve confirms the visually apparent path from Island Southeast Asia (Sumatra, far right) through Samoa, Fiji, Tonga and ending in Raivavae, Mangareva, and Rapa Nui (far left) in that order (cf. migration map in Fig. 2a). This projection of the high dimensional principal curve does not double back on itself, showing that the apparent ordering in this projection is consistent with the original high dimensional ordering. Note that this principal curve is able to fit only one settlement path (the principal one, that is, the longest drift path), which ends in Rapa Nui. Other settlement paths that branch away from this principal (longest) path appear simply as clusters projected onto the principal curve, since islands on those paths share no further drift with the principal path. That is, islands settled along secondary branching paths appear as clusters lying very close to one another along the principal curve. For example, Rapa Iti, which branches off from Rarotonga separately from the main settlement path (Fig. 2a), appears here as coincident with Rarotonga along the principal curve. The eigenvalue for PC1 over the sum of eigenvalues is 0.997 and for PC2 is 0.002 (all eigenvalues are non-negative).
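To make the PCoA step and the eigenvalue ratio concrete, here is a minimal pure-Python sketch of classical PCoA restricted to the first axis. It assumes a symmetric distance matrix with non-negative eigenvalues after centring (as the caption reports), uses power iteration instead of a full eigendecomposition, and does not reproduce how the paper converts outgroup-F3 values into distances. The function name `pcoa_first_axis` is hypothetical.

```python
from math import sqrt


def pcoa_first_axis(dist, iterations=500):
    """First principal coordinate of a symmetric distance matrix and the
    share of the total eigenvalue sum that it explains."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Gower double-centring: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]     # arbitrary starting vector
    for _ in range(iterations):              # power iteration on B
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
    trace = sum(b[i][i] for i in range(n))   # equals the sum of eigenvalues
    coords = [sqrt(eigenvalue) * x for x in v]
    return coords, eigenvalue / trace
```

A ratio close to 1, like the 0.997 reported for PC1, means the populations lie almost along a single line in the distance space, which is what a single dominant drift path would produce.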
Fig. 1. Dimensionality reduction of genetic variation in Pacific islanders.
(a-c) Ancestry-specific PCA of islanders (with non-Asian derived ancestries, such as post-colonial European ancestry, masked) shows islands (a) diverging separately along each component (b, c), due to the independence of genetic drift from each island’s founder effect. Neither geography nor settlement sequence can be discerned. The westernmost islands are omitted, as their greater diversity would otherwise dominate the first principal component (see Supplementary Fig. 2). The percent variance explained by each of the first four principal component dimensions is listed along each axis. Dots represent individuals, and colors represent islands. (d) Ancestry-specific tSNE plot of all sampled islanders, providing superior separation of each island group. The ancestral western Pacific islands are on the left and the easternmost Polynesian island (Rapa Nui) on the right. Important patterns are now evident; for instance, Rarotonga and the Palliser group appear at the center of the eastern Polynesian islands while the other eastern islands radiate out from them, consistent with the settlement patterns we infer below. tSNE preserves local relationships, but not global relationships (between widely separated clusters).
Fig. 2. Serial bottlenecks and relatedness define the settlement sequence and timings for the Polynesian islands.
(a) Inferred genetic-based map of Polynesian origins for the islands sampled in our study (not to scale). The direction, line width, and date for each arrow are based on inter-island statistics as described in the key and the text. For example, the widths of the arrows are inversely proportional to the value of the range expansion statistic (ψ) relative to Samoa. The order of arrow divergences indicates the order of shared drift amongst the child populations. Where they occur, these shared paths may indicate that one or more intermediate islands in the settlement sequence are missing from our dataset (Extended Data Fig. 5). This settlement sequence is consistent with a principal curve analysis (Extended Data Fig. 7). A sex-averaged generation time of 30 years was used, as found in several studies of pre-industrial populations (see Supplementary Discussion ‘On generation times and meiosis events’). Locations with prehistoric remains of megalithic statue building are also indicated (red asterisk). (b) The range expansion statistic (ψ) shows a steady increase in retained rare variant frequencies (genetic surfing) along paths of settlement as a result of each successive founder effect. Note that each matrix element is computed on a different SNP set (rare variants found in some samples from both islands), so the matrix need not have a similar ordering across all rows or all columns; that it does is a confirmation of the range expansion process. Rapa Nui (Easter Island) is the easternmost island in our dataset with the most compounded series of founder effects. (c) Example IBD segment length distributions for all pairs of individuals, one on Rapa Nui and the other on Mangareva (green), Palliser (blue), Rarotonga (red), and Samoa (black), used to fit the respective exponential decay constants (λ).
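The pairwise nature of ψ in panel (b), where each matrix element uses only the variants present on both islands, can be sketched as below. This is a simplified stand-in for the directionality index, not the paper's implementation: it takes per-SNP variant frequencies for two islands, keeps SNPs where the variant is observed on both, and averages the frequency difference. The sign convention, the absence of a rarity filter, and the function name `directionality_index` are assumptions.

```python
def directionality_index(freq_a, freq_b):
    """Mean of (freq_b - freq_a) over SNPs whose variant is present in
    both populations. Under the convention assumed here, a positive value
    suggests population b lies further along the expansion than a."""
    if len(freq_a) != len(freq_b):
        raise ValueError("frequency vectors must have the same length")
    shared = [(a, b) for a, b in zip(freq_a, freq_b) if a > 0 and b > 0]
    if not shared:
        raise ValueError("no variants are shared by both populations")
    return sum(b - a for a, b in shared) / len(shared)
```

Because the shared SNP set changes with every pair of islands, nothing in this calculation forces the resulting matrix to be consistently ordered, which is the point the caption makes about the observed ordering.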


References

    1. Low S. Hawaiki Rising: Hōkūle’a, Nainoa Thompson, and the Hawaiian Renaissance. (University of Hawaii Press, 2019).
    2. Kirch PV. On the Road of the Winds. (University of California Press, 2017).
    3. Mulrooney MA, Bickler SH, Allen MS & Ladefoged TN. High-precision dating of colonization and settlement in East Polynesia. Proc. Natl. Acad. Sci. U.S.A. 108, E192–E194 (2011).
    4. Schmid MME et al. How 14C dates on wood charcoal increase precision when dating colonization: The examples of Iceland and Polynesia. Quaternary Geochronology 48, 64–71 (2018).
    5. Kahōʻāliʻi Keauokalani K. Kepelino’s Traditions of Hawaii. Bernice P. Bishop Museum Bulletin; 206 (1932).
