Science. 2022 Feb 25;375(6583):eabi8264.
doi: 10.1126/science.abi8264. Epub 2022 Feb 25.

A unified genealogy of modern and ancient genomes

Anthony Wilder Wohns et al. Science.

Abstract

The sequencing of modern and ancient genomes from around the world has revolutionized our understanding of human history and evolution. However, the problem of how best to characterize ancestral relationships from the totality of human genomic variation remains unsolved. Here, we address this challenge with nonparametric methods that enable us to infer a unified genealogy of modern and ancient humans. This compact representation of multiple datasets explores the challenges of missing and erroneous data and uses ancient samples to constrain and date relationships. We demonstrate the power of the method to recover relationships between individuals and populations as well as to identify descendants of ancient samples. Finally, we introduce a simple nonparametric estimator of the geographical location of ancestors that recapitulates key events in human history.

Conflict of interest statement

Competing interests: G.M. is a director of and shareholder in Genomics plc and a partner in Peptide Groove LLP.

Figures

Fig. 1. Schematic overview and validation of the inference methodology.
(A) An example tree sequence topology with four samples (nodes 0-3), two marginal trees, four ancestral haplotypes (nodes 4-7), and two mutations. Tspan measures the genomic span of each marginal tree topology, with the dotted line indicating the location of a recombination event. The graph representation is equivalent to the tree representation. (B) Schematic representation of the inference methodology. Step 0: alleles are ordered by frequency; the mutation represented by the four-pointed star is considered to be older. Step 1: the tree sequence topology is inferred with tsinfer using modern samples. Step 2: the tree sequence is dated with tsdate. Step 3: node date estimates are constrained with the known age of ancient samples. Step 4: ancestral haplotypes are reordered by the estimated age of their focal mutation; the five-pointed star mutation is now inferred to be older. The algorithm returns to Step 1 to re-infer the tree sequence topology with ancient samples. Arrows refer to modes of operation: Steps 0, 1, and 2 only (red); Steps 0, 1, 2, 4, 1, and 2 (green); and Steps 0, 1, 2, 3, 4, 1, and 2 (blue) (24). (C) Scatter plots and accuracy metrics comparing simulated (x-axis) and inferred (y-axis) mutation ages from msprime neutral coalescent simulations, using tsdate with the simulated topology (left) and the inferred topology from tsinfer (right). (D) Accuracy metrics, root-mean-squared log error (top) and Spearman rank correlation coefficient (bottom), with modern samples only (first panel), after one round of iteration (second panel), and with increasing numbers of ancient samples (colored arrows as in panel B). Ancient samples from three eras of human history are considered as in the schematic (24).
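The two accuracy metrics named in panels C and D can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names and example ages are hypothetical, and the log(1 + age) convention for the log error is an assumption, since the caption does not state which variant was used.

```python
import math


def rmsle(true_ages, estimated_ages):
    """Root-mean-squared log error (assumed log(1 + age) convention)."""
    squared = [
        (math.log1p(est) - math.log1p(true)) ** 2
        for true, est in zip(true_ages, estimated_ages)
    ]
    return math.sqrt(sum(squared) / len(squared))


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; inputs must not be constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical mutation ages (in generations), for illustration only.
simulated = [10.0, 100.0, 1000.0, 10000.0]
inferred = [20.0, 90.0, 1500.0, 8000.0]
print(rmsle(simulated, inferred), spearman(simulated, inferred))
```

In this toy input the inferred ages keep the simulated rank order, so the rank correlation is perfect even though the log error is nonzero; the two metrics capture different aspects of accuracy.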
Fig. 2. Clustered heatmap showing the average time to the most recent common ancestor (TMRCA) on chromosome 20 for haplotypes within pairs of the 215 populations in the HGDP, TGP, SGDP, and ancient samples.
Each cell in the heatmap is colored by the logarithmic mean TMRCA of samples from the two populations. Hierarchical clustering of rows and columns has been performed using the UPGMA algorithm on the value of the pairwise average TMRCAs. Row colors are given by the region of origin for each population, as shown in the legend. The source of genomic samples for each population is indicated in the shaded boxes above the column labels. Three population relationships are highlighted using span-weighted histograms of the TMRCA distributions: (A) average distribution of TMRCAs between all non-African populations (black line) compared to African/African TMRCAs (solid yellow). (B) Denisovan and Papuan/Australian TMRCAs (solid line), compared to the Denisovan against all non-archaic populations (solid white). This subtle but unique signal of elevated recent ancestry between the Denisovan and Papuans/Australians is particularly evident in Interactive fig. S1 at https://awohns.github.io/unified_genealogy/interactive_figure.html. (C) TMRCAs between the two Samaritan chromosomes (solid line), compared to the Samaritans/all other modern humans (solid white). Selected populations with particularly recent within-group TMRCAs are indicated. Duplicate samples appearing in more than one modern dataset are included in this analysis. Interactive Figure S1 is an interactive version of this figure and is available at: https://awohns.github.io/unified_genealogy/interactive_figure.html.
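The UPGMA clustering mentioned in this caption can be sketched in a few lines of Python. This is a generic textbook version under stated assumptions, not the code used for the figure: the `upgma` function, the population labels, and the matrix values are all hypothetical, and the matrix stands in for pairwise average TMRCAs.

```python
def upgma(labels, matrix):
    """Agglomerative clustering with size-weighted average linkage (UPGMA).

    labels: list of names; matrix: symmetric list of lists of pairwise values.
    Returns the merges in order as (left_tree, right_tree, merge_value).
    """
    n = len(labels)
    # Active clusters: id -> (nested-tuple tree, number of leaves).
    clusters = {i: (labels[i], 1) for i in range(n)}
    # Pairwise values keyed by (smaller_id, larger_id).
    dist = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    merges = []
    while len(clusters) > 1:
        a, b = min(dist, key=dist.get)  # closest pair of active clusters
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        merges.append((tree_a, tree_b, dist[(a, b)]))
        # Value from the merged cluster to each remaining cluster is the
        # leaf-count-weighted average of the two old values.
        updated = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            updated[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop every entry that involves the two merged clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(updated)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return merges


# Hypothetical pairwise average TMRCAs for three populations.
labels = ["PopA", "PopB", "PopC"]
matrix = [
    [0.0, 2.0, 8.0],
    [2.0, 0.0, 6.0],
    [8.0, 6.0, 0.0],
]
merges = upgma(labels, matrix)
for left, right, value in merges:
    print(left, right, value)
```

The merge order defines the row and column ordering of a clustered heatmap: populations that merge early sit next to each other.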
Fig. 3. Validation of inference methods using ancient samples.
(A) Comparison of mutation age estimates from tsdate, Relate, and GEVA with 3,734 ancient samples at 76,889 variants on chromosome 20 (note that Relate estimates ages separately for each population in which a variant is found). The radiocarbon-dated age of the oldest ancient sample carrying a derived allele at each variant site in the 1000 Genomes Project is used as the lower bound on the age of the mutation (diagonal lines). Mutations below this line have an estimated age that is inconsistent with the age of the ancient sample. Black lines on each plot show the moving average of allele age estimates from each method as a function of oldest ancient sample age. Plots to the left show the distribution of allele age estimates for modern-only variants from each respective method. Additional metrics are reported in each plot. (B) Percentage of chromosome 20 for modern samples in each region that is inferred to descend from Denisovan haplotypes, calculated with the genomic descent statistic (57). (C) Tracts of descent along chromosome 20 descending from Denisovan haplotypes in modern samples with at least 100 kilobases (kb) of total descent (colors as in Fig. 2).
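The lower-bound check described in panel A can be expressed as a short Python sketch. The function name and the example ages are hypothetical, and the sketch assumes that both inputs are in the same time unit; it illustrates the logic of the comparison rather than the authors' pipeline.

```python
def fraction_consistent(estimated_ages, oldest_carrier_ages):
    """Fraction of variants whose estimated mutation age is at least as old
    as the oldest ancient sample carrying the derived allele.

    Both sequences must use the same time unit (an assumption of this sketch).
    """
    pairs = list(zip(estimated_ages, oldest_carrier_ages))
    consistent = sum(1 for estimate, bound in pairs if estimate >= bound)
    return consistent / len(pairs)


# Hypothetical ages for four variants, for illustration only.
estimated = [500.0, 1200.0, 300.0, 4000.0]
lower_bounds = [400.0, 1500.0, 300.0, 1000.0]
print(fraction_consistent(estimated, lower_bounds))
```

A mutation must have arisen before any individual that carries it lived, so an estimate younger than the dated carrier (the second variant here) falls below the diagonal line in the plot.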
Fig. 4. Visualization of the non-parametric estimator of ancestor geographic location for HGDP, SGDP, Neanderthal, Denisovan, and Afanasievo samples on chromosome 20.
(A) Geographic location of samples used to infer ancestral geography. The size of each symbol is proportional to the number of samples in that population. (B) The average location of the ancestors of each HGDP population from time t=0 to ~2 million years ago. The width of lines is proportional to the number of ancestors of each population over time. The ancestor of a population is defined as an inferred ancestral haplotype with at least one descendant in that population. (C) 2D histograms showing the inferred geographical location of HGDP ancestral lineages at six time points. Histogram bins with fewer than 10 ancestors are not shown. Note that the geographic concentration of ancestors at more recent times is an artifact of uneven sampling and our geographic inference method.
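Two operations in this caption, averaging ancestor locations (panel B) and hiding sparse histogram bins (panel C), can be sketched in Python. The caption does not say how the average is computed, so averaging 3D unit vectors on a sphere is an assumption of this sketch; the function names, the 10-degree bin size, and the coordinates are likewise hypothetical. Only the threshold of 10 ancestors comes from the caption.

```python
import math
from collections import Counter


def mean_location(points):
    """Average of (latitude, longitude) points in degrees.

    Assumed approach: convert each point to a 3D unit vector, average the
    vectors, and project the result back to latitude and longitude.
    """
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon


def binned_counts(points, bin_deg=10.0, min_count=10):
    """Count points per latitude/longitude grid cell, keeping only cells
    with at least min_count points (sparser cells are not shown)."""
    counts = Counter(
        (math.floor(lat / bin_deg), math.floor(lon / bin_deg))
        for lat, lon in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Hypothetical ancestor locations, for illustration only.
lat, lon = mean_location([(0.0, 0.0), (0.0, 90.0)])
ancestors = [(5.0, 5.0)] * 10 + [(15.0, 15.0)] * 9
visible = binned_counts(ancestors)
print(lat, lon, visible)
```

Averaging unit vectors avoids the wrap-around problem of averaging raw longitudes near the 180-degree meridian, which is why it is used here in place of a plain arithmetic mean.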

Comment in

  • Inferring human evolutionary history.
    Rees J, Andrés A. Science. 2022 Feb 25;375(6583):817-818. doi: 10.1126/science.abo0498. Epub 2022 Feb 24. PMID: 35201893
  • Human genealogical histories.
    Tang L. Nat Methods. 2022 Apr;19(4):400. doi: 10.1038/s41592-022-01471-w. PMID: 35396479

References

    1. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, O’Connell J, Cortes A, Welsh S, Young A, Effingham M, McVean G, Leslie S, Allen N, Donnelly P, Marchini J, The UK Biobank resource with deep phenotyping and genomic data. Nature. 562, 203–209 (2018).
    2. Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, Taliun SAG, Corvelo A, Gogarten SM, Kang HM, Pitsillides AN, LeFaive J, Lee S, Tian X, Browning BL, Das S, Emde A-K, Clarke WE, Loesch DP, Shetty AC, Blackwell TW, Smith AV, Wong Q, Liu X, Conomos MP, Bobo DM, Aguet F, Albert C, Alonso A, Ardlie KG, Arking DE, Aslibekyan S, Auer PL, Barnard J, Barr RG, Barwick L, Becker LC, Beer RL, Benjamin EJ, Bielak LF, Blangero J, Boehnke M, Bowden DW, Brody JA, Burchard EG, Cade BE, Casella JF, Chalazan B, Chasman DI, Chen Y-DI, Cho MH, Choi SH, Chung MK, Clish CB, Correa A, Curran JE, Custer B, Darbar D, Daya M, de Andrade M, DeMeo DL, Dutcher SK, Ellinor PT, Emery LS, Eng C, Fatkin D, Fingerlin T, Forer L, Fornage M, Franceschini N, Fuchsberger C, Fullerton SM, Germer S, Gladwin MT, Gottlieb DJ, Guo X, Hall ME, He J, Heard-Costa NL, Heckbert SR, Irvin MR, Johnsen JM, Johnson AD, Kaplan R, Kardia SLR, Kelly T, Kelly S, Kenny EE, Kiel DP, Klemmer R, Konkle BA, Kooperberg C, Köttgen A, Lange LA, Lasky-Su J, Levy D, Lin X, Lin K-H, Liu C, Loos RJF, Garman L, Gerszten R, Lubitz SA, Lunetta KL, Mak ACY, Manichaikul A, Manning AK, Mathias RA, McManus DD, McGarvey ST, Meigs JB, Meyers DA, Mikulla JL, Minear MA, Mitchell BD, Mohanty S, Montasser ME, Montgomery C, Morrison AC, Murabito JM, Natale A, Natarajan P, Nelson SC, North KE, O’Connell JR, Palmer ND, Pankratz N, Peloso GM, Peyser PA, Pleiness J, Post WS, Psaty BM, Rao DC, Redline S, Reiner AP, Roden D, Rotter JI, Ruczinski I, Sarnowski C, Schoenherr S, Schwartz DA, Seo J-S, Seshadri S, Sheehan VA, Sheu WH, Shoemaker MB, Smith NL, Smith JA, Sotoodehnia N, Stilp AM, Tang W, Taylor KD, Telen M, Thornton TA, Tracy RP, Van Den Berg DJ, Vasan RS, Viaud-Martinez KA, Vrieze S, Weeks DE, Weir BS, Weiss ST, Weng L-C, Willer CJ, Zhang Y, Zhao X, Arnett DK, Ashley-Koch AE, Barnes KC, Boerwinkle E, Gabriel S, Gibbs R, Rice KM, Rich SS, Silverman EK, Qasba P, Gan W, Abe N, Almasy L, Ament S, Anderson P, Anugu P, Applebaum-Bowden D, Assimes T, Avramopoulos D, Barron-Casella E, Beaty T, Beck G, Becker D, Beitelshees A, Benos T, Bezerra M, Bis J, Bowler R, Broeckel U, Broome J, Bunting K, Bustamante C, Buth E, Cardwell J, Carey V, Carty C, Casaburi R, Castaldi P, Chaffin M, Chang C, Chang Y-C, Chavan S, Chen B-J, Chen W-M, Chuang L-M, Chung R-H, Comhair S, Cornell E, Crandall C, Crapo J, Curtis J, Damcott C, David S, Davis C, de las Fuentes L, DeBaun M, Deka R, Devine S, Duan Q, Duggirala R, Durda JP, Eaton C, Ekunwe L, El Boueiz A, Erzurum S, Farber C, Flickinger M, Frazar C, Fu M, Fulton L, Gao S, Gao Y, Gass M, Gelb B, Geng XP, Geraci M, Ghosh A, Gignoux C, Glahn D, Gong D-W, Goring H, Graw S, Grine D, Gu CC, Guan Y, Gupta N, Haessler J, Hawley NL, Heavner B, Herrington D, Hersh C, Hidalgo B, Hixson J, Hobbs B, Hokanson J, Hong E, Hoth K, Hsiung CA, Hung Y-J, Huston H, Hwu CM, Jackson R, Jain D, Jhun MA, Johnson C, Johnston R, Jones K, Kathiresan S, Khan A, Kim W, Kinney G, Kramer H, Lange C, Lange E, Lange L, Laurie C, LeBoff M, Lee J, Lee SS, Lee W-J, Levine D, Lewis J, Li X, Li Y, Lin H, Lin H, Lin KH, Liu S, Liu Y, Liu Y, Luo J, Mahaney M, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature. 590, 290–299 (2021).
    3. Reich D, Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past (Oxford University Press, Oxford, UK, 2018).
    4. Lewin HA, Robinson GE, Kress WJ, Baker WJ, Coddington J, Crandall KA, Durbin R, Edwards SV, Forest F, Gilbert MTP, Goldstein MM, Grigoriev IV, Hackett KJ, Haussler D, Jarvis ED, Johnson WE, Patrinos A, Richards S, Castilla-Rubio JC, van Sluys M-A, Soltis PS, Xu X, Yang H, Zhang G, Earth BioGenome Project: Sequencing life for the future of life. Proc. Natl. Acad. Sci. U.S.A. 115, 4325–4333 (2018).
    5. Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, Kirsanow K, Sudmant PH, Schraiber JG, Castellano S, Lipson M, Berger B, Economou C, Bollongino R, Fu Q, Bos KI, Nordenfelt S, Li H, de Filippo C, Prüfer K, Sawyer S, Posth C, Haak W, Hallgren F, Fornander E, Rohland N, Delsate D, Francken M, Guinet J-M, Wahl J, Ayodo G, Babiker HA, Bailliet G, Balanovska E, Balanovsky O, Barrantes R, Bedoya G, Ben-Ami H, Bene J, Berrada F, Bravi CM, Brisighelli F, Busby GBJ, Cali F, Churnosov M, Cole DEC, Corach D, Damba L, van Driem G, Dryomov S, Dugoujon J-M, Fedorova SA, Gallego Romero I, Gubina M, Hammer M, Henn BM, Hervig T, Hodoglugil U, Jha AR, Karachanak-Yankova S, Khusainova R, Khusnutdinova E, Kittles R, Kivisild T, Klitz W, Kučinskas V, Kushniarevich A, Laredj L, Litvinov S, Loukidis T, Mahley RW, Melegh B, Metspalu E, Molina J, Mountain J, Näkkäläjärvi K, Nesheva D, Nyambo T, Osipova L, Parik J, Platonov F, Posukh O, Romano V, Rothhammer F, Rudan I, Ruizbakiev R, Sahakyan H, Sajantila A, Salas A, Starikovskaya EB, Tarekegn A, Toncheva D, Turdikulova S, Uktveryte I, Utevska O, Vasquez R, Villena M, Voevoda M, Winkler CA, Yepiskoposyan L, Zalloua P, Zemunik T, Cooper A, Capelli C, Thomas MG, Ruiz-Linares A, Tishkoff SA, Singh L, Thangaraj K, Villems R, Comas D, Sukernik R, Metspalu M, Meyer M, Eichler EE, Burger J, Slatkin M, Pääbo S, Kelso J, Reich D, Krause J, Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature. 513, 409–413 (2014).

Publication types