PLoS Genet. 2023 Feb 13;19(2):e1010410.
doi: 10.1371/journal.pgen.1010410. eCollection 2023 Feb.

Bayesian inference of admixture graphs on Native American and Arctic populations


Svend V Nielsen et al. PLoS Genet.

Abstract

Admixture graphs are mathematical structures that describe the ancestry of populations as a graph, in terms of the divergence and merging (admixing) of ancestral populations. An admixture graph consists of a graph topology, branch lengths, and admixture proportions. The branch lengths and admixture proportions can be estimated with many numerical optimization methods. Inferring the topology, however, requires a combinatorial search for which no polynomial-time algorithm is known. In this paper, we present a reversible jump MCMC algorithm for sampling high-probability admixture graphs. We show that this approach works well both as a heuristic search for a single best-fitting graph and for summarizing shared features extracted from posterior samples of graphs. We apply the method to 11 Native American and Siberian populations and exploit the shared structure of high-probability graphs to characterize the relationship between Saqqaq, Inuit, Koryaks, and Athabascans. Our analyses show that the Saqqaq are not a good proxy for the source of the previously identified gene flow from Arctic people into the Na-Dene speaking Athabascans.
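A reversible jump MCMC sampler accepts or rejects each proposed graph using Green's generalization of the Metropolis-Hastings ratio. The sketch below illustrates that generic acceptance step in log space. The function `rjmcmc_accept` and its arguments are hypothetical and not taken from AdmixtureBayes. For a concrete move, such as adding or removing an admixture branch, the forward and reverse proposal densities and the Jacobian would have to be derived from that move's definition.

```python
import math
import random


def rjmcmc_accept(log_post_new, log_post_old, log_q_reverse, log_q_forward,
                  log_abs_jacobian=0.0, rng=random):
    """Generic reversible jump acceptance test (hypothetical helper).

    log_post_new / log_post_old: unnormalized log posterior (log prior plus
        log likelihood) of the proposed and current graphs.
    log_q_forward: log density of proposing the new graph from the current one.
    log_q_reverse: log density of proposing the current graph back from the new one.
    log_abs_jacobian: log |det J| of the dimension-matching transformation
        (0.0 for moves that do not change the number of parameters).
    """
    log_alpha = min(0.0, (log_post_new - log_post_old)
                    + (log_q_reverse - log_q_forward)
                    + log_abs_jacobian)
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
    return math.log(u) <= log_alpha
```

When the combined log ratio is 0 or larger, the proposal is always accepted. Otherwise it is accepted with probability exp(log_alpha). Working in log space avoids underflow when posterior densities are very small.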


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The graphs G1, G2, G3, and G4 used for the comparisons between methods.
G1 and G2 are not based on any real dataset, but their branch lengths are chosen to have human-like values. The population Out was used as the outgroup for both graphs. G3 is based on M1 from Molloy et al. (2021), the graph that motivated the development of the MLNO approach of OrientAGraph; we have changed some of its branch lengths. For G3, popE was used as the outgroup. G4 is based on Model M7 from Fig 3 of Molloy et al. (2021), which is in turn based on Fig 7a from Wu (2020) [19]. The populations ITU, JPT, and ASW have been removed, and the YRI population was used as the outgroup. For all graphs, as in Molloy et al. (2021), branch lengths are not drawn to scale and are shown multiplied by 1000. Divergence nodes are shown as circles and admixture nodes as rectangles. The fractions inside the admixture nodes denote the contribution from the population represented by the dashed line.
Fig 2. Results of the method comparison with TreeMix and OrientAGraph.
For each of the graphs in Fig 1, we simulated 20 datasets and ran each method on each dataset. We compared the accuracy of the methods using the 3 statistics discussed in the section Comparisons with TreeMix and OrientAGraph. For AdmixtureBayes, we examined two summaries. The first is the Mode graph, the sampled graph with the highest posterior probability. The second is the mean value of each statistic over 100 graphs sampled from the posterior, which we refer to as the AdmixtureBayes Mean. TreeMix and OrientAGraph allow admixture involving the outgroup, an error that AdmixtureBayes cannot make by construction. For fairness, we only plot the results for graphs that do not involve admixture with the outgroup. The number of datasets that resulted in such graphs is listed in parentheses next to each method name on the x-axes. The Topology Equality statistic for TreeMix, OrientAGraph, and the AdmixtureBayes Mode can only take the values 0 or 1. For these, we therefore plot a horizontal line at the mean value over the datasets rather than a true boxplot.
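The two AdmixtureBayes summaries in Fig 2 can be computed from MCMC output roughly as sketched below. Several assumptions here are ours, not the paper's:

- the samples are post-burn-in (log posterior, topology) pairs;
- the 100 posterior graphs are drawn uniformly with replacement;
- a topology is represented, hypothetically, as a frozenset of descendant sets. This is a simplification, because descendant sets alone do not always pin down an admixture graph.

The names `mode_and_mean` and `topology_equality` are illustrative.

```python
import random


def topology_equality(graph, true_graph):
    """0/1 statistic: 1.0 if the two (hypothetically encoded) topologies are identical."""
    return 1.0 if graph == true_graph else 0.0


def mode_and_mean(samples, true_graph, statistic=topology_equality,
                  n_draws=100, seed=0):
    """Return (statistic of the Mode graph, mean statistic over n_draws posterior draws).

    samples: list of (log_posterior, topology) pairs from an MCMC run.
    """
    rng = random.Random(seed)
    # Mode graph: the sampled graph with the highest posterior.
    mode_graph = max(samples, key=lambda s: s[0])[1]
    # "Mean" summary: average the statistic over graphs drawn from the sample.
    draws = [rng.choice(samples)[1] for _ in range(n_draws)]
    mean_value = sum(statistic(g, true_graph) for g in draws) / n_draws
    return statistic(mode_graph, true_graph), mean_value
```

For a 0/1 statistic such as Topology Equality, the Mode summary is itself 0 or 1. The Mean summary can take intermediate values, which is why only the latter is drawn as a true boxplot.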
Fig 3. The two minimal topologies with the highest posterior probabilities in our real dataset.
Leaf nodes that are the product of an admixture event are shown in purple, and leaf nodes that are not are shown in light blue. The root is shown in black. Each inner node is colored according to the posterior probability that the true graph has a node with the same descendants, with higher probabilities shown in darker shades of green. This posterior probability is written as a percentage in parentheses inside each node, next to the node name (node names are arbitrary). The left graph has a posterior probability of 32%, and the right graph has a posterior probability of 19%.
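The node probabilities in Fig 3 are posterior frequencies: the fraction of sampled graphs that contain at least one node with a given set of descendant leaves. A minimal sketch of that counting is below. It assumes each sampled graph has already been reduced to the descendant sets of its inner nodes; that encoding and the function name `node_posteriors` are hypothetical.

```python
from collections import Counter


def node_posteriors(sampled_graphs):
    """Fraction of sampled graphs containing a node with each descendant set.

    sampled_graphs: list of graphs, each given as an iterable of descendant
    sets (one set of leaf names per inner node).
    """
    counts = Counter()
    for graph in sampled_graphs:
        # Deduplicate within a graph so two nodes sharing a descendant set
        # (possible in admixture graphs) count only once for that graph.
        counts.update({frozenset(desc) for desc in graph})
    n = len(sampled_graphs)
    return {desc: c / n for desc, c in counts.items()}
```

Deduplicating within each graph is what makes the result a probability that "the true graph has a node with the same descendants". Without it, the result would be an expected node count.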
Fig 4. An admixture graph for three populations and one outgroup.
For a single SNP, the quantities x1, …, x7 are changes in allele frequency, w is the admixture proportion, and P0, P1, P2, and P3 are the allele frequencies in the sampled populations. The edge to the outgroup (labeled x0) is not given a direction because the Gaussian drift model is reversible. The population split between the outgroup and the other populations could have happened at any point along this branch, and the same allelic covariance matrix would be produced. For simplicity, we model the outgroup as the parent of the root node, as described in the section Method Overview.
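Under a Gaussian drift model of this kind, each leaf frequency can be written as the outgroup frequency plus a weighted sum of independent drift terms, one per edge. An edge's weight for a given leaf is the sum, over all paths from that edge down to the leaf, of the products of admixture proportions along the path. The leaf covariance matrix is then a sum over edges of branch length times the product of the two leaves' weights.

The sketch below is hedged on several points:

- It uses a hypothetical topology with edges x0 to x7 and one admixture node. This topology is our assumption and need not match the graph drawn in Fig 4.
- It treats branch lengths directly as drift variances.
- It omits any frequency-dependent scaling or sampling noise a full model may include.
- The functions `path_weights` and `leaf_covariance` are illustrative names.

```python
import numpy as np


def path_weights(node, parents):
    """Weight of each edge's drift term in the allele frequency at `node`.

    parents[node] is a list of (parent, edge_name, proportion) tuples; the
    proportion is 1.0 for an ordinary edge and w or 1 - w for the two edges
    entering an admixture node. The top node (the outgroup) has no parents.
    """
    weights = {}
    for parent, edge, prop in parents.get(node, []):
        weights[edge] = weights.get(edge, 0.0) + prop
        for e, wt in path_weights(parent, parents).items():
            weights[e] = weights.get(e, 0.0) + prop * wt
    return weights


def leaf_covariance(leaves, parents, lengths):
    """Covariance of leaf frequency changes relative to the outgroup."""
    W = [path_weights(leaf, parents) for leaf in leaves]
    n = len(leaves)
    cov = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            cov[i, j] = sum(lengths[e] * W[i][e] * W[j].get(e, 0.0) for e in W[i])
    return cov


# Hypothetical topology: the outgroup is the parent of the root (edge x0),
# and P2 descends from an admixture node with proportion w from n1's side.
w = 0.3
parents = {
    "root": [("Out", "x0", 1.0)],
    "n1": [("root", "x1", 1.0)],
    "n2": [("root", "x2", 1.0)],
    "P1": [("n1", "x3", 1.0)],
    "adm": [("n1", "x4", w), ("n2", "x5", 1.0 - w)],
    "P2": [("adm", "x6", 1.0)],
    "P3": [("n2", "x7", 1.0)],
}
lengths = {"x0": 0.05, "x1": 0.02, "x2": 0.03, "x3": 0.04,
           "x4": 0.01, "x5": 0.02, "x6": 0.05, "x7": 0.03}
cov = leaf_covariance(["P1", "P2", "P3"], parents, lengths)
```

In this example, the shared edge x0 contributes to every entry. Meanwhile, Cov(P1, P2) equals x0 + w·x1, because P2 inherits only a fraction w of the drift on x1.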
Fig 5. When adding an admixture branch (green), we randomly draw the branch it originates from, the source branch (red).
The admixture branch ends in the sink branch (blue).
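One possible way to implement the draw in Fig 5 is sketched below. It rests on assumptions that are ours, not the paper's:

- The sink branch is drawn uniformly from all branches.
- The source branch is then drawn uniformly from the branches that would not create a directed cycle, that is, branches whose upper endpoint is neither the sink branch's lower endpoint nor one of its descendants.

The returned probability of the draw is the kind of forward proposal term a reversible jump acceptance ratio needs. The actual AdmixtureBayes proposal may weight branches differently. `propose_admixture_branch` is a hypothetical name.

```python
import random


def descendants_or_self(node, children):
    """All nodes reachable from `node` by following edges downward, including itself."""
    stack, seen = [node], set()
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(children.get(v, []))
    return seen


def propose_admixture_branch(edges, rng):
    """Draw (source, sink) branches for a new admixture edge (hypothetical scheme).

    edges: list of (parent, child) branches of the current graph.
    Returns (source, sink, q), where q is the probability of this particular
    draw, or None if the drawn sink has no valid source.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    sink = rng.choice(edges)
    # The new edge runs from a new node on the source branch to a new node on
    # the sink branch. It closes a cycle exactly when the source branch's upper
    # endpoint is at or below the sink branch's lower endpoint.
    below_sink = descendants_or_self(sink[1], children)
    sources = [e for e in edges if e != sink and e[0] not in below_sink]
    if not sources:
        return None
    source = rng.choice(sources)
    q = 1.0 / (len(edges) * len(sources))
    return source, sink, q
```

Drawing the sink first and restricting the sources keeps every proposed graph acyclic. The trade-off is that the number of valid sources, and hence q, depends on the drawn sink, so q must be recomputed for each proposal.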

References

1. Patterson NJ, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, et al. Ancient Admixture in Human History. Genetics. 2012;192(3):1065–1093. doi: 10.1534/genetics.112.145037
2. Pickrell JK, Pritchard JK. Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data. PLOS Genetics. 2012;8(11):1–17. doi: 10.1371/journal.pgen.1002967
3. Molloy EK, Durvasula A, Sankararaman S. Advancing admixture graph estimation via maximum likelihood network orientation. Bioinformatics. 2021;37(Supplement_1):i142–i150. doi: 10.1093/bioinformatics/btab267
4. Lipson M, Loh PR, Levin A, Reich D, Patterson N, Berger B. Efficient moment-based inference of admixture parameters and sources of gene flow. Molecular Biology and Evolution. 2013;30(8):1788–1802. doi: 10.1093/molbev/mst099
5. Yan J, Patterson N, Narasimhan VM. miqoGraph: fitting admixture graphs using mixed-integer quadratic optimization. Bioinformatics. 2020;37(16):2488–2490. doi: 10.1093/bioinformatics/btaa988
