Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2010 Jun 15:11:324.
doi: 10.1186/1471-2105-11-324.

Detecting lateral gene transfers by statistical reconciliation of phylogenetic forests

Affiliations

Detecting lateral gene transfers by statistical reconciliation of phylogenetic forests

Sophie S Abby et al. BMC Bioinformatics. .

Abstract

Background: To understand the evolutionary role of Lateral Gene Transfer (LGT), accurate methods are needed to identify transferred genes and infer their timing of acquisition. Phylogenetic methods are particularly promising for this purpose, but the reconciliation of a gene tree with a reference (species) tree is computationally hard. In addition, the application of these methods to real data raises the problem of sorting out real and artifactual phylogenetic conflict.

Results: We present Prunier, a new method for phylogenetic detection of LGT based on the search for a maximum statistical agreement forest (MSAF) between a gene tree and a reference tree. The program is flexible as it can use any definition of "agreement" among trees. We evaluate the performance of Prunier and two other programs (EEEP and RIATA-HGT) for their ability to detect transferred genes in realistic simulations where gene trees are reconstructed from sequences. Prunier proposes a single scenario that compares to the other methods in terms of sensitivity, but shows higher specificity. We show that LGT scenarios carry a strong signal about the position of the root of the species tree and could be used to identify the direction of evolutionary time on the species tree. We use Prunier on a biological dataset of 23 universal proteins and discuss their suitability for inferring the tree of life.

Conclusions: The ability of Prunier to take into account branch support in the process of reconciliation allows a gain in complexity, in comparison to EEEP, and in accuracy in comparison to RIATA-HGT. Prunier's greedy algorithm proposes a single scenario of LGT for a gene family, but its quality always compares to the best solutions provided by the other algorithms. When the root position is uncertain in the species tree, Prunier is able to infer a scenario per root at a limited additional computational cost and can easily run on large datasets.Prunier is implemented in C++, using the Bio++ library and the phylogeny program Treefinder. It is available at: http://pbil.univ-lyon1.fr/software/prunier.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Prunier algorithm: example of a Prunier run. Example of reconciliation of an unrooted gene tree TG with a rooted species tree TS by searching for the maximum statistical agreement forest (MSAF). In the gene tree, blue branches are those common to the two trees whereas red branches indicate conflicting edges, i.e., those found in TG but not in TS. Support values are shown for conflicting edges. In this example, two rounds are needed to reconcile the two trees. The agreement function "Agree" corresponds here to the "fast" version of Prunier: a gene (sub-)tree is considered in statistical agreement with the species (sub-)tree if no conflicting edge above 80 exists. At each round, clades corresponding to common edges (blue branches) are ranked by decreasing scores. This score reflects the conflict (combination of the support values) removed when the edge is cut (symbolized by "Rm{clade}"). The highest scoring subtree candidate to transfer is removed if it is in agreement with the species tree. The output of Prunier is a statistical agreement forest (SAF), composed of a reconciled backbone subtree (non-transferred sequences: {ABFGHIK}) and as many reconciled subtrees (transferred sequences) as lateral gene transfers (LGT). In this example, two LGT events are inferred: {J} and {CDE} have been transferred.
Figure 2
Figure 2
Accuracy of transfer events detection. A: numbers of true positive LGTs (TP). B: numbers of false positive LGTs (FP). LGT events detected in maximum-likelihood gene trees are displayed as a function of the real number of transfers per tree. Results were computed with a support threshold for topological conflict significance of 0.90. As EEEP and RIATA-HGT may propose multiple scenarios, grey areas represent the variability between the worst and the best scenarios. Areas are delimited by blue borders for RIATA-HGT and green borders for EEEP. Prunier (fast version) results are drawn in magenta. The black curves symbolize the reference boundary: those curves were obtained using RIATA-HGT (best scenario selected) on true gene trees (before sequence simulation). The green numbers below the true positives area of EEEP represent the number of unresolved cases (either quoted as "Unsolved" by the program, or resulting from a program crash). The blue numbers above RIATA-HGT TP area indicate the two cases for which the program crashed. A program crash is counted as 0 TP and 0 FP. The grey line shows a relationship 1:1 between the two axes. The estimated number of transfers by each method is the sum of the curves shown in A and B.
Figure 3
Figure 3
Inference of correct complete LGT scenarios. Proportion of correctly inferred LGT scenarios, i.e. scenarios with 100% TP and 0% FP as a function of the true number of LGT. Results were obtained with a support threshold for topological conflict significance of 0.90. As EEEP and RIATA-HGT may propose multiple scenarios, grey areas represent the variability between the worst and the best scenarios. Areas are delimited by blue borders for RIATA-HGT and green borders for EEEP. Prunier (fast version) results are plotted in magenta. The black curve symbolizes the reference boundary was obtained using RIATA-HGT (best scenario selected) on true gene trees.
Figure 4
Figure 4
Accuracy of transferred sequences detection. RIATA-HGT (best scenarios selected) and Prunier (fast version) results are respectively indicated by triangles and circles. The color code corresponds to the different support values tested as a threshold for topological conflict significance. The figure represents the positive predictive value as a function of the negative predictive value (see Material and Methods) of the two methods. Black lines represent the limits given by RIATA-HGT on true trees. These statistics show how often predictions are correct when sequences are classified as transferred (PPV) or not transferred (NPV). A perfect method would have PPV = 1 and NPV = 1.

Similar articles

Cited by

References

    1. Boussau B, Daubin V. Genomes as documents of evolutionary history. Trends Ecol Evol. 2010;25:224–232. doi: 10.1016/j.tree.2009.09.007. - DOI - PubMed
    1. Doolittle WF. Phylogenetic classification and the universal tree. Science. 1999;284:2124–2129. doi: 10.1126/science.284.5423.2124. - DOI - PubMed
    1. Gogarten JP, Doolittle WF, Lawrence JG. Prokaryotic evolution in light of gene transfer. Mol Biol Evol. 2002;19:2226–2238. - PubMed
    1. Philippe H, Lopez P, Brinkmann H, Budin K, Germot A, Laurent J, Moreira D, Müller M, Le Guyader H. Early-branching or fast-evolving eukaryotes? An answer based on slowly evolving positions. Proc Biol Sci. 2000;267:1213–1221. doi: 10.1098/rspb.2000.1130. - DOI - PMC - PubMed
    1. Brinkmann H, van der Giezen M, Zhou Y, Poncelin de Raucourt Poncelin G, Philippe H. An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics. Syst Biol. 2005;54:743–757. doi: 10.1080/10635150500234609. - DOI - PubMed

Publication types