Review
Nat Rev Genet. 2025 Jan;26(1):47-58.
doi: 10.1038/s41576-024-00772-4. Epub 2024 Sep 30.

Inference and applications of ancestral recombination graphs


Rasmus Nielsen et al. Nat Rev Genet. 2025 Jan.

Abstract

Ancestral recombination graphs (ARGs) summarize the complex genealogical relationships among individuals represented in a sample of DNA sequences. Their use is currently revolutionizing the field of population genetics and is driving the development of powerful new methods for elucidating individual and population genetic processes, including population size history, migration, admixture, recombination, mutation and selection. In this Review, we introduce readers to the structure of ARGs and discuss how they relate to processes such as recombination and genetic drift. We explore differences and similarities between methods for estimating ARGs and provide concrete illustrative examples of how ARGs can be used to elucidate population-level processes.


Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
(A-C) Examples of coalescence trees. Each tree consists of a set of nodes connected by edges; in biology, edges are also called lineages or branches. The nodes representing the observed DNA sequences are the leaves (leaf nodes) of the tree, and the nodes representing coalescence events are internal nodes. The node at the top of the tree represents the most recent common ancestor (MRCA) of the three sequences. In this depiction, only the internal nodes, not the leaf nodes, are numbered and drawn as circles. Note also that the trees are drawn with the root at the top and the leaves at the bottom, as is the tradition in computer science (computer scientists do not spend much time in natural forests). All three coalescence trees (A, B and C) are embedded in the ARG shown in (D). Although their topologies are identical, the trees differ in their branch lengths. (D) An example ARG. There are three sequences (bottom of figure). White circles represent coalescence events and black circles represent recombination events. The two recombination events split the sequences into three segments, labeled green, blue and red.
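To make the structure in Fig. 1 concrete, the sketch below stores an ARG as an ordered list of local (marginal) trees, each valid on a half-open genomic segment, with a child-to-parent map and node times. This encoding is an illustrative assumption, similar in spirit to tree-sequence formats but not the API of any particular tool; the `LocalTree` class, the segment coordinates and all node times are hypothetical. It shows how trees with the same topology can still differ in branch lengths, and how one coalescence node can be shared by adjacent segments.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LocalTree:
    """One marginal (local) tree of an ARG, valid on the half-open segment [left, right)."""
    left: float
    right: float
    parent: Dict[int, int]   # child node -> parent node (the root has no entry)
    time: Dict[int, float]   # node -> time before present (leaves at 0)

    def path_to_root(self, u: int) -> List[int]:
        path = [u]
        while path[-1] in self.parent:
            path.append(self.parent[path[-1]])
        return path

    def mrca(self, u: int, v: int) -> int:
        ancestors_of_u = set(self.path_to_root(u))
        # Walking up from v, the first node that is also an ancestor of u is the MRCA.
        for w in self.path_to_root(v):
            if w in ancestors_of_u:
                return w
        raise ValueError("u and v are not in the same tree")

    def tmrca(self, u: int, v: int) -> float:
        return self.time[self.mrca(u, v)]

    def total_branch_length(self) -> float:
        return sum(self.time[p] - self.time[c] for c, p in self.parent.items())


def tree_at(arg: List[LocalTree], position: float) -> LocalTree:
    """Return the local tree covering a genomic position."""
    for tree in arg:
        if tree.left <= position < tree.right:
            return tree
    raise ValueError("position lies outside the sequence")


# Hypothetical ARG for three sequences (0, 1, 2) cut into three segments by two
# recombination events. All trees share the topology ((0,1),2) but differ in
# branch lengths; node 3 spans the first two segments and node 5 the last two.
leaves = {0: 0.0, 1: 0.0, 2: 0.0}
arg = [
    LocalTree(0, 40, {0: 3, 1: 3, 3: 4, 2: 4}, {**leaves, 3: 1.0, 4: 2.5}),
    LocalTree(40, 70, {0: 3, 1: 3, 3: 5, 2: 5}, {**leaves, 3: 1.0, 5: 3.2}),
    LocalTree(70, 100, {0: 6, 1: 6, 6: 5, 2: 5}, {**leaves, 6: 0.4, 5: 3.2}),
]
for tree in arg:
    print(tree.left, tree.right, tree.tmrca(0, 1), tree.tmrca(0, 2), tree.total_branch_length())
```

Storing each local tree separately keeps the example short; real ARG formats avoid this redundancy by recording each edge once together with the genomic interval over which it exists.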
Fig. 2
Examples of coalescence events permitted by different models of the coalescent process with recombination. The left panel illustrates a coalescence event between two lineages carrying non-overlapping and non-adjacent segments of DNA. Such coalescence events are allowed only under the full coalescent with recombination (CwR), not under SMC′ or SMC. The middle panel illustrates a coalescence event between two lineages carrying adjacent but non-overlapping segments of DNA. Such coalescence events are allowed under both the CwR and SMC′ processes but not under SMC. The right panel illustrates a coalescence event between overlapping segments of DNA. Such coalescence events are allowed under all three models. The restrictions that SMC′ and SMC place on the types of allowed coalescence events limit the state space of ARGs that can be simulated under these models, with SMC imposing a more severe restriction than SMC′.
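The three cases in Fig. 2 can be written as a small rule table. This is a minimal sketch that assumes each lineage carries a single contiguous ancestral segment given as a half-open interval `[left, right)`; under the full CwR a lineage may carry several disjoint segments, which this simplification ignores. The names `relation`, `ALLOWED` and `can_coalesce` are hypothetical.

```python
from typing import Tuple

Segment = Tuple[float, float]  # half-open ancestral interval [left, right)


def relation(a: Segment, b: Segment) -> str:
    """Classify two ancestral segments as 'overlapping', 'adjacent' or 'disjoint'."""
    if a[0] < b[1] and b[0] < a[1]:
        return "overlapping"
    if a[1] == b[0] or b[1] == a[0]:
        return "adjacent"  # touching but sharing no position
    return "disjoint"      # non-overlapping and non-adjacent


# Which kinds of coalescence each model permits (left, middle, right panels of Fig. 2).
ALLOWED = {
    "CwR": {"overlapping", "adjacent", "disjoint"},
    "SMC'": {"overlapping", "adjacent"},
    "SMC": {"overlapping"},
}


def can_coalesce(model: str, a: Segment, b: Segment) -> bool:
    return relation(a, b) in ALLOWED[model]
```

For example, `can_coalesce("SMC'", (0, 40), (40, 100))` is `True` while `can_coalesce("SMC", (0, 40), (40, 100))` is `False`, mirroring the middle panel.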
Fig. 3
Performance of SINGER, ARGweaver, ARG-Needle, Relate and tsinfer+tsdate, evaluated with different metrics: pairwise TMRCA (A), allele ages (B), and triplet and quartet topology error rates (C). Methods were run on 50 and 300 simulated sequences. For allele ages and pairwise TMRCA, the mean squared error (MSE) and Spearman’s correlation between the true values from the simulations and the inferred values were calculated. For topology benchmarking, performance was measured as the proportion of triplet or quartet topologies inferred incorrectly (topology error rate).
Fig. 4
Examples of how information about different population genetic parameters is reflected in the ARG. (A) A smaller effective population size during a given time period leads to a higher density of coalescences across the whole ARG during that period. Note that here, as in Figure 5, squares represent recombination events and circles represent coalescence events. (B) Natural selection at a SNP causes the frequency of the selected allele to increase more rapidly, leading to a higher density of coalescences among lineages carrying the selected allele (shown in blue) in the tree spanning the SNP. We highlight the portion of the genome this tree spans to emphasize that we are not showing the whole ARG.
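Panel (A) can be quantified for the simplest case of two lineages. This sketch assumes a haploid coalescent in which a pair of lineages coalesces at rate 1/N(t) per generation (use 2N for diploids) and a piecewise-constant population size; the epoch values are hypothetical. The probability of coalescing in a window [a, b) is S(a) − S(b), where S(t) = exp(−∫₀ᵗ 1/N(s) ds). With k lineages the rate becomes k(k−1)/2 · 1/N(t), so the same effect applies across the whole ARG.

```python
import math
from typing import List, Tuple

Epoch = Tuple[float, float]  # (start time in generations, haploid population size N); sorted, first starts at 0


def cumulative_hazard(epochs: List[Epoch], t: float) -> float:
    """Integral of 1/N(s) ds from 0 to t for a piecewise-constant N(s)."""
    total = 0.0
    for i, (start, size) in enumerate(epochs):
        end = epochs[i + 1][0] if i + 1 < len(epochs) else math.inf
        if t <= start:
            break
        total += (min(t, end) - start) / size
    return total


def prob_coalesce_in(epochs: List[Epoch], a: float, b: float) -> float:
    """P(a <= T < b) for the coalescence time T of two lineages sampled at time 0."""
    return math.exp(-cumulative_hazard(epochs, a)) - math.exp(-cumulative_hazard(epochs, b))


constant = [(0, 10_000)]
bottleneck = [(0, 10_000), (1_000, 1_000), (2_000, 10_000)]  # 10x smaller N for 1,000 generations
p_const = prob_coalesce_in(constant, 1_000, 2_000)    # ≈ 0.086
p_bott = prob_coalesce_in(bottleneck, 1_000, 2_000)   # ≈ 0.572
```

Reversing this logic (reading the rate of coalescences per unit time off an inferred ARG) is the intuition behind ARG-based estimates of population size history.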
Fig. 5
A schematic of how importance sampling can be combined with ARG inference methods to estimate the likelihood function of population genetic parameters. In Step 1, graphs are sampled from a posterior distribution. As in Figure 4, mutations are shown as colored diamonds on the ARG, squares represent recombination events, and circles represent coalescence events. In Step 2, the importance sampling weight of each graph is computed. In Step 3, the likelihood of a parameter is computed as the weighted average of the likelihoods of the sampled graphs. Here θ represents any population genetic parameter of interest, such as population size or selection coefficient.
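The three steps can be illustrated on a toy model whose exact likelihood is known, so the estimate can be checked. We assume one common formulation: with L(θ) = Σ_G P(D | G) P(G | θ) and graphs G_i drawn from the posterior P(G | D, θ₀) at a reference value θ₀, L(θ)/L(θ₀) ≈ (1/M) Σ_i P(G_i | θ)/P(G_i | θ₀). To keep the sketch self-contained, the "graph" is replaced by a single pairwise coalescence time T with T ~ Exponential(mean θ), and the data are k mutations, k ~ Poisson(2μT). The model, parameter values and function names are hypothetical stand-ins, not an ARG inference method.

```python
import math
import random


def log_prior(T: float, theta: float) -> float:
    """log P(T | theta) for the toy genealogy model T ~ Exponential(mean=theta)."""
    return -math.log(theta) - T / theta


def exact_likelihood(theta: float, k: int, mu: float) -> float:
    """Closed-form P(k | theta): integral of Poisson(k; 2*mu*T) * P(T | theta) dT."""
    rate = 2 * mu
    return rate ** k / (theta * (rate + 1 / theta) ** (k + 1))


def relative_likelihood(thetas, k, mu, theta0, n_samples=100_000, seed=1):
    """Importance-sampling estimate of L(theta) / L(theta0) for each theta in `thetas`."""
    rng = random.Random(seed)
    # Step 1: sample genealogies from the posterior P(T | k, theta0), which for this
    # toy model is Gamma(shape=k+1, rate=2*mu + 1/theta0); gammavariate takes a scale.
    scale = 1.0 / (2 * mu + 1.0 / theta0)
    draws = [rng.gammavariate(k + 1, scale) for _ in range(n_samples)]
    estimates = {}
    for theta in thetas:
        # Step 2: importance weight 1 / P(T_i | theta0) for each draw (the P(k | T_i)
        # and L(theta0) factors cancel), multiplied by the graph likelihood P(T_i | theta).
        terms = [math.exp(log_prior(T, theta) - log_prior(T, theta0)) for T in draws]
        # Step 3: averaging these weighted likelihoods estimates L(theta) / L(theta0).
        estimates[theta] = sum(terms) / n_samples
    return estimates


est = relative_likelihood([0.7, 1.5], k=3, mu=1.0, theta0=1.0)
for theta, value in est.items():
    exact = exact_likelihood(theta, 3, 1.0) / exact_likelihood(1.0, 3, 1.0)
    print(theta, round(value, 3), round(exact, 3))
```

A single set of posterior samples is reused for every θ, which is the main appeal of the approach; the estimate degrades when θ is far from θ₀ because the weights then become highly variable.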
Fig. 6
Using inferred ARGs to learn about balancing selection at ABO and the selective sweep at MCM6. (A) The purple lines show the inferred TMRCA for 50 sequences from the CEU population of the 1000 Genomes Project, and the human-chimpanzee speciation time (around 6 Mya) is shown as a dashed line. The ABO gene (shaded area) exhibits coalescence times older than the speciation time. (B) The ARG-inferred branch-length-based diversity in 1 kb windows among all samples (red) versus among carriers of the derived allele of rs4988235 (blue) in the GBR population of the 1000 Genomes Project.
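Panel (B) compares windowed branch-length diversity between a subset of samples and all samples. The sketch below computes it from local trees, assuming ultrametric trees with all samples at time 0 (so the branch-length distance between two samples is 2 × TMRCA), integer coordinates, and local trees that tile the sequence without gaps. Tree-sequence libraries such as tskit provide this statistic directly (for instance `TreeSequence.diversity` with `mode="branch"`); the code here is an independent illustration, and the four-sample example, node times and carrier set are hypothetical.

```python
from itertools import combinations


def tmrca(parent, time, u, v):
    """TMRCA of nodes u and v in one local tree given as child->parent and node->time maps."""
    ancestors_of_u = {u}
    while u in parent:
        u = parent[u]
        ancestors_of_u.add(u)
    while v not in ancestors_of_u:
        v = parent[v]  # climb from v until we reach an ancestor of u
    return time[v]


def windowed_branch_diversity(trees, samples, window=1_000):
    """Span-weighted mean of 2 * TMRCA over all sample pairs, per genomic window.

    `trees` is a list of (left, right, parent, time) tuples tiling [0, L) without gaps.
    """
    seq_length = max(right for _, right, _, _ in trees)
    pairs = list(combinations(samples, 2))
    out = []
    for start in range(0, int(seq_length), window):
        end = min(start + window, seq_length)
        weighted = 0.0
        for left, right, parent, time in trees:
            overlap = min(right, end) - max(left, start)
            if overlap > 0:
                mean_dist = sum(2 * tmrca(parent, time, a, b) for a, b in pairs) / len(pairs)
                weighted += overlap * mean_dist
        out.append((start, end, weighted / (end - start)))
    return out


# Hypothetical example: samples 2 and 3 carry the derived allele and coalesce recently.
parent = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6}
base = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 10.0, 6: 20.0}
trees = [
    (0, 1500, parent, {**base, 5: 1.0}),
    (1500, 2000, parent, {**base, 5: 4.0}),
]
carriers = windowed_branch_diversity(trees, [2, 3])
everyone = windowed_branch_diversity(trees, [0, 1, 2, 3])
for (s, e, d_car), (_, _, d_all) in zip(carriers, everyone):
    print(f"{s}-{e}: carriers {d_car:.2f}, all samples {d_all:.2f}")
```

Markedly lower diversity among carriers than among all samples around a variant is the signature of a recent sweep on the carried allele; conversely, local TMRCAs older than a reference time, as at ABO in (A), point towards balancing selection.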

