PLoS Genet. 2012;8(4):e1002660.
doi: 10.1371/journal.pgen.1002660. Epub 2012 Apr 19.

The probability of a gene tree topology within a phylogenetic network with applications to hybridization detection

Yun Yu et al. PLoS Genet. 2012.

Abstract

Gene tree topologies have proven a powerful data source for various tasks, including species tree inference and species delimitation. Consequently, methods for computing probabilities of gene trees within species trees have been developed and widely used in probabilistic inference frameworks. All these methods assume an underlying multispecies coalescent model. However, when reticulate evolutionary events such as hybridization occur, these methods are inadequate, as they do not account for such events. Methods that account for both hybridization and deep coalescence in computing the probability of a gene tree topology currently exist for very limited cases. However, no such methods exist for general cases, owing primarily to the fact that it is currently unknown how to compute the probability of a gene tree topology within the branches of a phylogenetic network. Here we present a novel method for computing the probability of gene tree topologies on phylogenetic networks and demonstrate its application to the inference of hybridization in the presence of incomplete lineage sorting. We reanalyze a Saccharomyces species data set for which multiple analyses had converged on a species tree candidate. Using our method, though, we show that an evolutionary hypothesis involving hybridization in this group has better support than one of strict divergence. A similar reanalysis on a group of three Drosophila species shows that the data is consistent with hybridization. Further, using extensive simulation studies, we demonstrate the power of gene tree topologies at obtaining accurate estimates of branch lengths and hybridization probabilities of a given phylogenetic network. Finally, we discuss identifiability issues with detecting hybridization, particularly in cases that involve extinction or incomplete sampling of taxa.
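As a rough illustration of what "the probability of a gene tree topology" means in the simplest setting, the sketch below uses the standard three-taxon result under the multispecies coalescent: with one allele per species and an internal branch of length t in coalescent units, the topology matching the species tree has probability 1 - (2/3)e^(-t), and each of the two discordant topologies has probability (1/3)e^(-t). The second function treats a single hybridization as a weighted mixture of two parental trees. That is a simplifying assumption that applies only to the limited case in which a single lineage passes through the hybrid node; it is not the general network method presented in the paper. The function names, the taxon labels A, B, C, and the parameters `gamma`, `t_ab`, and `t_bc` are hypothetical and chosen for illustration.

```python
import math


def three_taxon_probs(cherry, outgroup, t):
    """Gene tree topology probabilities on a three-taxon species tree.

    cherry   -- tuple naming the two sister species, e.g. ("A", "B")
    outgroup -- name of the third species
    t        -- internal branch length in coalescent units (must be >= 0)

    Each topology is keyed by the frozenset of its two sister taxa.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    x, y = cherry
    # Probability that the two sister lineages fail to coalesce
    # along the internal branch.
    no_coal = math.exp(-t)
    return {
        frozenset((x, y)): 1.0 - 2.0 * no_coal / 3.0,  # concordant
        frozenset((x, outgroup)): no_coal / 3.0,       # discordant
        frozenset((y, outgroup)): no_coal / 3.0,       # discordant
    }


def hybrid_mixture_probs(gamma, t_ab, t_bc):
    """Illustrative mixture for a hybrid species B (an assumption, see text).

    With probability gamma the B lineage follows the parent sister to A,
    giving the tree ((A,B),C) with internal branch t_ab; otherwise it
    follows the parent sister to C, giving ((B,C),A) with branch t_bc.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    tree_ab = three_taxon_probs(("A", "B"), "C", t_ab)
    tree_bc = three_taxon_probs(("B", "C"), "A", t_bc)
    return {
        topo: gamma * tree_ab[topo] + (1.0 - gamma) * tree_bc[topo]
        for topo in tree_ab
    }


if __name__ == "__main__":
    for topo, p in hybrid_mixture_probs(0.3, 1.0, 1.0).items():
        print(sorted(topo), round(p, 4))
```

Setting `gamma` to 0 or 1 recovers a strictly tree-like history, so comparing the fit of the mixture against either parental tree gives a rough sense of how gene tree frequencies can carry a signal of hybridization beyond incomplete lineage sorting alone.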

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Phylogenetic networks, MUL trees, and valid allele mappings.
In this example, a single allele was sampled from each of three species, whereas two alleles were sampled from a fourth species. See text and Text S1 for details.
Figure 2. Various hypotheses for the evolutionary history of a yeast data set.
(A) The species tree for the five species Sbay, Skud, Smik, Scer, and Spar, as previously proposed and as inferred using a Bayesian approach and a parsimony approach. (B) A slightly suboptimal tree for the five species, as previously identified. (C–E) The three phylogenetic networks that reconcile both trees in (A) and (B), which we previously reported as equally optimal evolutionary histories under a parsimony criterion. (F) A phylogenetic network that postulates Smik and Skud as two sister taxa whose divergence followed a hybridization event.
Figure 3. Six hypotheses for the evolutionary history of a Drosophila data set.
(A–C) The three possible species tree topologies. (D–F) The three possible single-hybridization species network topologies (excluding extinction events).
Figure 4. Identifiability in detecting hybridization.
(A) A phylogenetic network with two hybridization probabilities, where the second hybridization involves the first hybrid population, and extinction is involved. (B–D) Estimates of formula image and formula image, as a function of the number of gene trees used, when the true values of formula image are assumed in the inference, and for true formula image values of formula image, formula image, and formula image, respectively (insets zoom in on the left parts of the figure). (E) A phylogenetic tree with three taxa, with divergence time formula image between the two speciation events. (F–H) The value of formula image for the tree in (E) that yields the same probability of the data under the scenario depicted in (A) when formula image, as a function of formula image and formula image, for formula image values of formula image, formula image, and formula image, respectively. Since a single allele was sampled per species, the data is uninformative for estimating the value of formula image here.

