J Mol Evol. 2014 May;78(5):279-92.
doi: 10.1007/s00239-014-9620-5. Epub 2014 May 11.

Bayesian inference of local trees along chromosomes by the sequential Markov coalescent

Chaozhi Zheng et al. J Mol Evol. 2014 May.

Abstract

We propose a genealogy-sampling algorithm, Sequential Markov Ancestral Recombination Tree (SMARTree), that estimates, from SNP haplotype data, the patterns of coancestry across a genome segment among a set of homologous chromosomes. To enable analysis of longer genome segments, the sequence of coalescent trees is modeled via the modified sequential Markov coalescent (Marjoram and Wall, BMC Genet 7:16, 2006). To assess how well SMARTree estimates these local trees, we test our implementation on simulated data. Our base data set consists of the SNPs in 10 DNA sequences spanning 50 kb. We examine the effects of longer sequences, of more sequences, and of a recombination and/or mutational hotspot. The model underlying SMARTree is an approximation to the full recombinant-coalescent distribution. However, in a small trial on simulated data, its recovery of local trees was similar to that of LAMARC (Kuhner et al., Genetics 156:1393-1401, 2000a), a sampler that uses the full model.
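The central object here, a sequence of local trees along a genome segment, can be pictured as a piecewise-constant map from site to tree. The sketch below is a hypothetical data structure, not SMARTree's: it assumes 1-based sites and follows the Fig. 1 convention that a recombination (R, x) falls between sites x − 1 and x, so a new local tree starts at site x. Trees are written as nested tuples of leaf labels; the class and method names are illustrative only.

```python
import bisect
from dataclasses import dataclass
from typing import Any, Iterator, List, Tuple


@dataclass
class LocalTrees:
    """Hypothetical container for local trees T(x) on sites 1..length."""
    length: int        # number of sites in the segment
    starts: List[int]  # first site of each interval; starts[0] == 1
    trees: List[Any]   # trees[i] is the local tree on interval i

    def __post_init__(self):
        if not self.starts or self.starts[0] != 1 or len(self.starts) != len(self.trees):
            raise ValueError("need starts[0] == 1 and one tree per interval")
        increasing = all(a < b for a, b in zip(self.starts, self.starts[1:]))
        if not increasing or self.starts[-1] > self.length:
            raise ValueError("starts must be increasing and lie within 1..length")

    def tree_at(self, x: int) -> Any:
        """Return T(x): the tree of the last interval starting at or before x."""
        if not 1 <= x <= self.length:
            raise IndexError(f"site {x} outside 1..{self.length}")
        return self.trees[bisect.bisect_right(self.starts, x) - 1]

    def intervals(self) -> Iterator[Tuple[int, int, Any]]:
        """Yield (first_site, last_site, tree) for each interval."""
        ends = [s - 1 for s in self.starts[1:]] + [self.length]
        return zip(self.starts, ends, self.trees)
```

For a hypothetical 8-site segment with a single breakpoint at x = 3, `LocalTrees(8, [1, 3], [((1, 2), 3), (1, (2, 3))])` returns `((1, 2), 3)` for sites 1 and 2 and `(1, (2, 3))` for sites 3 to 8. Binary search keeps lookups logarithmic in the number of transitions, which matters when a long segment carries many of them.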

Figures

Fig. 1
The genealogy of three chromosomes of length 8 bp. The upper panel shows the ARG representation: black boxes denote sampled chromosomes or material ancestral to them, gray boxes denote non-ancestral material, and the pair (R, x) indicates a recombination event of type R occurring between sites x − 1 and x. The lower panel shows the derived local trees T(x), with a schematic of the transition from local tree T3 to T4. The black dot marks the point on the lineages of T3 where recombination event (R1, 3) occurs; this point is chosen uniformly on tree T3. The emerging lineage (dotted line) coalesces at a rate determined by the number of lineages present, yielding a new coalescence time tc at node D in tree T4 and changing the tree topology from {{1, 2}, 3} to {1, {2, 3}}.
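The transition described for the lower panel can be made concrete with a short simulation. This is a minimal sketch, not SMARTree's code. It assumes the SMC' rule of Marjoram and Wall, in which the floating lineage may also coalesce back onto the branch it left (leaving the tree unchanged); time units in which each pair of lineages coalesces at rate 1; and a dict-based tree representation. The names `kingman_tree`, `lineages_at` and `smc_prime_step` are hypothetical.

```python
import random


def kingman_tree(n, rng):
    """Simulate a rooted Kingman coalescent tree on leaves 0..n-1.
    Assumed time scale: each pair of lineages coalesces at rate 1.
    Returns (parent, time); parent[root] is None."""
    parent = {i: None for i in range(n)}
    time = {i: 0.0 for i in range(n)}
    active, t, new = list(range(n)), 0.0, n
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2)  # waiting time with k lineages
        a, b = rng.sample(active, 2)
        parent[new], time[new] = None, t
        parent[a] = parent[b] = new
        active.remove(a)
        active.remove(b)
        active.append(new)
        new += 1
    return parent, time


def lineages_at(parent, time, t):
    """Nodes whose branch (node -> parent) spans time t; the root branch never ends."""
    return [v for v, p in parent.items()
            if time[v] <= t and (p is None or t < time[p])]


def smc_prime_step(parent, time, rng):
    """One SMC'-style transition T -> T'. Returns (parent, time, changed)."""
    parent, time = dict(parent), dict(time)  # leave the caller's tree intact
    # 1. Recombination point uniform over total branch length (root branch excluded).
    edges = [v for v, p in parent.items() if p is not None]
    u = rng.choices(edges, weights=[time[parent[v]] - time[v] for v in edges])[0]
    t_r = rng.uniform(time[u], time[parent[u]])
    # 2. The detached lineage coalesces at rate k(t), where k(t) counts the old
    #    tree's lineages at t, including u's own branch (the SMC' modification).
    t = t_r
    events = sorted(x for x in time.values() if x > t_r)
    while True:
        k = len(lineages_at(parent, time, t))  # constant until the next node time
        nxt = events.pop(0) if events else float("inf")
        t_c = t + rng.expovariate(k)
        if t_c < nxt:
            break
        t = nxt
    target = rng.choice(lineages_at(parent, time, t_c))
    if target == u:
        return parent, time, False  # coalesced back onto its own branch
    # 3. Prune: remove u's parent p and attach p's other child s to p's parent.
    p = parent[u]
    s = next(v for v, q in parent.items() if q == p and v != u)
    parent[s] = parent[p]
    if target == p:
        target = s  # after pruning, p's old branch is part of s's branch
    # 4. Regraft: reuse node p as the new coalescence at time t_c on the target branch.
    time[p] = t_c
    parent[p] = parent[target]
    parent[target] = p
    parent[u] = p
    return parent, time, True
```

Calling `smc_prime_step` repeatedly, starting from `kingman_tree(10, random.Random(1))`, yields a Markov chain of local trees along the sequence; the third return value is False when the step leaves the tree unchanged. Reusing the pruned node as the new coalescence node keeps node identifiers stable from one local tree to the next.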
Fig. 2
The propagation move. Letters A, B, C and D indicate local trees in the ARG, and superscript * indicates a new (changed) local tree. The identities of Tp, T*, Tx and Ty are indicated for each type of propagation move.
Fig. 3
Local tree recovery in SMARTree (black) and LAMARC (gray). (a) Inferred numbers of total transitions (heavy lines) and invisible transitions (fine lines). The true ARG had 64 total transitions, of which 45 were invisible. (b) Inferred visible transition locations. The dashed line indicates the uniform prior, and the vertical bars at the top show the true visible transition locations. (c, d) Total branch lengths of the inferred local trees for LAMARC (c) and SMARTree (d). Black dots show posterior medians, the stepped black line indicates the true local tree length, vertical lines show 95% support intervals, and gray dashed lines show the 0.025, 0.5, and 0.975 quantiles of the distribution of total branch length under the Kingman coalescent. (e, f) Topological similarity, measured as RF distance, between inferred and true local trees for LAMARC (e) and SMARTree (f). Symbols are as in parts (c, d), with the quantiles representing the distribution of RF distances between the true local tree and trees drawn at random from the Kingman coalescent. (This distribution varies along the sequence because some true local trees are more typical of the coalescent than others. The 0.5 and 0.975 quantiles coincide at 16 because random genealogies are generally at the maximum RF distance.)
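The RF distances in panels (e, f) compare tree topologies. A minimal sketch, assuming the distance is computed on rooted clusters (the leaf sets below each non-root internal node): this assumption is consistent with the maximum of 16 quoted above, since a rooted binary tree on n = 10 leaves has n − 2 = 8 such clusters, so two trees sharing none differ by 2 × 8 = 16. The function names are hypothetical.

```python
def clusters(tree):
    """Leaf sets below each non-root internal node of a rooted tree
    written as nested tuples, e.g. ((1, 2), 3)."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])  # a leaf
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root = walk(tree)
    found.discard(root)  # the full leaf set is shared by every tree
    return found


def rf_distance(t1, t2):
    """Rooted Robinson-Foulds distance: size of the symmetric difference of clusters."""
    return len(clusters(t1) ^ clusters(t2))


def caterpillar(order):
    """Build the ladder-shaped tree (((o0, o1), o2), ...) for a list of leaves."""
    tree = order[0]
    for leaf in order[1:]:
        tree = (tree, leaf)
    return tree
```

The Fig. 1 transition from {{1, 2}, 3} to {1, {2, 3}} gives `rf_distance(((1, 2), 3), (1, (2, 3)))` equal to 2, and two caterpillars on 10 leaves built in opposite orders share no clusters, reaching the maximum of 16.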
Fig. 4
Comparison of models M1 (both SNPs and non-SNPs used) and M2 (only SNPs used). In panels a–c, black lines indicate model M1 and gray lines indicate model M2; the black dot shows the true simulation value. Panel a shows inference of θ, panel b shows inference of ε, and panel c shows inference of ρ. Panels d (model M1) and e (model M2) show RF distances between inferred and true local trees, with black dots indicating the posterior median, vertical bars indicating 95% support intervals, and dashed lines showing quantiles as in Fig. 3.
Fig. 5
Inference from data containing a hotspot. Data set Rec/Mut-Hotspot is shown in thick gray, data set Rec-Hotspot in thin gray, and the non-hotspot data set Standard in black. Panels a and b show inference of θ and ρ, respectively; the horizontal dashed line indicates the uniform prior and black dots indicate the simulation values. Panel c shows the inferred total length of the local tree; the thin black line indicates the true local tree length, and the dashed line indicates the expectation under the Kingman prior. Panel d shows inferred transition locations; black vertical bars above the graph indicate the true transition locations, and the dashed line indicates the Kingman prior. Panels e (Rec-Hotspot) and f (Rec/Mut-Hotspot) show RF distances between inferred and true local trees; black dots indicate posterior medians, vertical lines show 95% support intervals, and gray dashed lines indicate quantiles of the distribution of RF distances between the true local tree and trees drawn from the Kingman prior (the 0.5 and 0.975 quantiles coincide at the maximum).
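Several panels (Fig. 3c, d; Fig. 5c; Fig. 6d) compare inferred total branch length with its Kingman-coalescent expectation or quantiles. A minimal sketch, assuming a time scale in which each pair of lineages coalesces at rate 1 (the paper's own scaling of θ and ρ may differ): while k lineages remain, the waiting time is T_k ~ Exp(k(k − 1)/2) and all k lineages accumulate length, so the total length is L = Σ_{k=2}^{n} k T_k with E[L] = 2 Σ_{i=1}^{n−1} 1/i. Quantiles are estimated here by plain Monte Carlo; the function names are hypothetical.

```python
import random


def expected_total_length(n):
    """E[L] = sum_{k=2}^n k * 2/(k(k-1)) = 2 * sum_{i=1}^{n-1} 1/i."""
    return 2.0 * sum(1.0 / i for i in range(1, n))


def sample_total_length(n, rng):
    """One draw of total branch length for n sampled lineages."""
    return sum(k * rng.expovariate(k * (k - 1) / 2) for k in range(2, n + 1))


def total_length_quantiles(n, draws=20000, probs=(0.025, 0.5, 0.975), seed=0):
    """Empirical quantiles of total branch length from `draws` simulated trees."""
    rng = random.Random(seed)
    xs = sorted(sample_total_length(n, rng) for _ in range(draws))
    return [xs[min(int(p * draws), draws - 1)] for p in probs]
```

For the 10 sequences of the base data set, `expected_total_length(10)` is 2 × (1 + 1/2 + ... + 1/9) ≈ 5.66 in these units. Sampling only the waiting times, rather than whole topologies, suffices because total length does not depend on which lineages coalesce.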
Fig. 6
Effect of amount of data on SMARTree inference. Data set Standard is shown as thick black lines, Extra-Seqs as thin gray lines, and Extra-BP as thick gray lines. Panel a shows the median RF distance between inferred and true trees for data sets Standard and Extra-Seqs, and panel b shows the same comparison for Standard and the first 50 kb of Extra-BP. Panel c shows median total branch length for the first 50 kb, with the stepped black line showing the true value and the dashed line indicating the uniform prior. Panel d compares median total branch length across the entire 500 kb between Extra-BP (thick gray line) and the truth (thin black line); the horizontal dashed line indicates the expectation under the Kingman prior.
