BMC Evol Biol. 2006 Feb 11;6:15.
doi: 10.1186/1471-2148-6-15.

Phylogenetic identification of lateral genetic transfer events

Robert G Beiko et al. BMC Evol Biol. 2006.

Abstract

Background: Lateral genetic transfer can lead to disagreements among phylogenetic trees comprising sequences from the same set of taxa. Where topological discordance is thought to have arisen through genetic transfer events, tree comparisons can be used to identify the lineages that may have shared genetic information. An 'edit path' of one or more transfer events can be represented with a series of subtree prune and regraft (SPR) operations, but finding the optimal such set of operations is NP-hard for comparisons between rooted trees, and may be so for unrooted trees as well.

Results: Efficient Evaluation of Edit Paths (EEEP) is a new tree comparison algorithm that uses evolutionarily reasonable constraints to identify and eliminate many unproductive search avenues, reducing the time required to solve many edit path problems. The performance of EEEP compares favourably to that of other algorithms when applied to strictly bifurcating trees with specified numbers of SPR operations. We also used EEEP to recover edit paths from over 19,000 unrooted, incompletely resolved protein trees containing up to 144 taxa as part of a large phylogenomic study. While inferred protein trees were far more similar to a reference supertree than random trees were to each other, the phylogenetic distance spanned by random versus inferred transfer events was similar, suggesting that real transfer events occur most frequently between closely related organisms, but can span large phylogenetic distances as well. While most of the protein trees examined here were very similar to the reference supertree, requiring zero or one edit operations for reconciliation, some trees implied up to 40 transfer events within a single orthologous set of proteins.

Conclusion: Since sequence trees typically have no implied root and may contain unresolved or multifurcating nodes, the strategy implemented in EEEP is the most appropriate for phylogenomic analyses. The high degree of consistency among inferred protein trees shows that vertical inheritance is the dominant pattern of evolution, at least for the set of organisms considered here. However, the edit paths inferred using EEEP suggest an important role for genetic transfer in the evolution of microbial genomes as well.

Figures

Figure 1
SPR operation on a rooted phylogenetic tree. The tree in panel (a) is subjected to an SPR operation, with participants and direction indicated with the dashed arrow. Edge E4 is the donor edge, which is split by acquisition of the recipient edge E3. Since they are no longer split by E3, edges E0 and E2 are consolidated into a single edge, which implies the same split of taxa as E2. Splitting E4 yields a new parent edge E(3+4), and a child edge that implies the same split as E4. Finally, the bipartitions implied by every other edge up to the common ancestor of the donor and recipient edges (in this case, the root R) are modified by the swapping of subtree t from one partition to the other. Thus, in this case E1 now implies a different set of taxa and is renamed E(1+3). Edges that are not part of the donor/recipient pair or ancestral to these edges are not affected by the inferred transfer event.
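The bookkeeping described for Figure 1 can be illustrated with a short sketch. It is not EEEP's code: it assumes a rooted binary tree stored as a child-to-parent map, and the helpers `spr` and `clusters` and the five-taxon toy tree are hypothetical. Pruning the recipient subtree suppresses its former parent (so its two edges merge, as E0 and E2 do), regrafting adds a new node on the donor edge, and the only clusters that change lie on the path from the donor edge up to the common ancestor.

```python
# Minimal sketch (hypothetical helpers, not EEEP's code): a rooted binary tree is
# a child -> parent map in which the root's parent is None.


def clusters(parent):
    """Map every non-root node to the frozenset of taxa below it, i.e. its edge's cluster."""
    kids = {}
    for node, par in parent.items():
        kids.setdefault(par, []).append(node)

    memo = {}

    def below(node):
        if node not in memo:
            sub = kids.get(node, [])
            # A node with no children is a leaf, so its cluster is the taxon itself.
            memo[node] = frozenset([node]) if not sub else frozenset().union(*(below(k) for k in sub))
        return memo[node]

    return {node: below(node) for node, par in parent.items() if par is not None}


def spr(parent, recipient, donor):
    """Prune the edge above `recipient` and regraft it onto the edge above `donor`.

    Assumes `donor` lies outside the pruned subtree and is neither the recipient's
    parent nor its sibling.
    """
    tree = dict(parent)
    old = tree[recipient]
    sibling = next(n for n, p in tree.items() if p == old and n != recipient)
    # Suppress the now-unary node: its two edges merge (E0 and E2 in Figure 1).
    tree[sibling] = tree[old]
    del tree[old]
    # Split the donor edge with a new node, named after the caption's E(3+4).
    new = f"{recipient}+{donor}"
    tree[new] = tree[donor]
    tree[donor] = new
    tree[recipient] = new
    return tree


# Toy tree ((A,B),(C,(D,E))) with root R and internal nodes u, v, w.
tree = {"R": None, "u": "R", "v": "R", "A": "u", "B": "u",
        "C": "v", "w": "v", "D": "w", "E": "w"}
after = spr(tree, recipient="A", donor="D")
before_c, after_c = clusters(tree), clusters(after)
changed = sorted(n for n in before_c if n in after_c and before_c[n] != after_c[n])
print(changed)  # ['v', 'w']
```

Moving A onto the edge above D leaves the edges outside the donor/recipient path untouched; only `v` and `w`, which lie between the donor edge and the root, now also contain A, mirroring the renaming of E1 to E(1+3) in the caption.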
Figure 2
Phylogenetic tree reconciliations proposed by LatTrans, HorizStory, and EEEP. Different sets of edit operations, indicated by arrows marked A through E, are proposed to reconcile the reference tree (a) with either a rooted (b) or an unrooted (c) test tree. As described in the main text, LatTrans proposes edits A and B, while HorizStory proposes these two as well as the phantom sister edit C. EEEP with time constraints will propose edits A, B, D, and E, while removing the time constraint allows donation of genetic material from ancestor to descendant, which is analogous to edit C.
Figure 3
Effect of a misplaced root on inferred edit paths. Trees (a) and (b) are identical if the rooting of one or the other is ignored, as would be the case in EEEP, and no edit path reconciliation is necessary. However, LatTrans and HorizStory, which both require a rooted test tree, would be adversely affected by misplacement of the root, since four edits are required to reconcile trees (a) and (b) if the rooting is preserved.
Figure 4
Successful recovery of edit paths by LatTrans, HorizStory, and EEEP. The percentage of edit paths recovered from random trees of several different sizes is shown for LatTrans (filled circles) and HorizStory (filled diamonds), and for several different types of EEEP run ('standard' run with time constraints, partitioning and no ratchet – filled triangles; strict test tree ratchet – filled squares; permissive test tree ratchet – open squares; time-unconstrained runs – open triangles with long dashed line; unpartitioned runs – open triangles with short dashed line). Open triangles connected by a solid line indicate the percentage of cases where at least one EEEP run was able to recover an edit path. Reference tree ratchets are not shown because the difference in edit path recovery between reference tree and test tree ratchets was never greater than 5%. The runs summarized in this figure were all limited to a maximum of 4 gigabytes of RAM and 5 hours of running time (see manuscript for details).
Figure 5
Recovery of edit paths from inferred protein trees, grouped by size. Five different EEEP settings were used to recover edit paths from 19,672 inferred protein trees with at least one resolved bipartition, via comparison with the inferred MRP supertree. Five types of bar, ordered from left to right, represent 'standard' EEEP runs with no ratchet, a permissive test tree ratchet, a strict test tree ratchet, a permissive reference tree ratchet, and a strict reference tree ratchet. The final, checkered bar for each category represents the total number of cases where at least one type of EEEP run recovered an edit path solution. The total number of protein trees in each size class (e.g., 11–20) is indicated by horizontal lines with a centered open diamond.
Figure 6
Theoretical maximum and observed mean edit path lengths for random and inferred protein trees of different sizes. Filled circles show the maximum possible most-parsimonious edit path length for trees with n taxa (n − 3). Filled diamonds indicate the mean edit distance recovered from comparisons between random pairs of trees with up to 15 taxa, with each tree size replicated 500 times. Open diamonds show the mean edit distance recovered for protein trees of 4 to 100 taxa, with the linear best-fit relationship for the points in this range shown (y = 0.080x + 0.108, R² = 0.656).
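The two curves in Figure 6 reduce to simple formulas, sketched below for comparison. The function names are hypothetical; the n − 3 ceiling and the coefficients 0.080 and 0.108 are the values quoted in the caption, and the sketch refuses to extrapolate the fit outside the 4 to 100 taxon range it was reported for.

```python
def max_edit_path_length(n_taxa: int) -> int:
    """Ceiling on most-parsimonious edit path length, n - 3 (Figure 6, filled circles).

    Trees with three or fewer taxa have a single unrooted topology, hence zero edits.
    """
    return max(0, n_taxa - 3)


def fitted_mean_edits(n_taxa: float) -> float:
    """Linear best fit quoted for protein trees: y = 0.080x + 0.108 (R^2 = 0.656).

    The fit was reported for trees of 4 to 100 taxa, so other sizes are rejected
    rather than extrapolated.
    """
    if not 4 <= n_taxa <= 100:
        raise ValueError("fit reported only for trees of 4 to 100 taxa")
    return 0.080 * n_taxa + 0.108


# Compare the theoretical ceiling with the fitted mean for a few tree sizes.
for n in (10, 50, 100):
    print(n, max_edit_path_length(n), round(fitted_mean_edits(n), 3))
```

For a 50-taxon protein tree the fit predicts roughly 4.1 edits against a ceiling of 47, which illustrates how far below the theoretical maximum the inferred trees sit.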
Figure 7
Mean normalized reference tree distance for inferred protein trees and random trees with different numbers of taxa and edits. Each pair of vertical bars indicates the mean ± standard deviation of the mean normalized reference tree distance (defined in the text) for 10 pairs of random trees (gray bars) and for more than 5 protein tree/MRP supertree pairs (white bars). Pairs of numbers on the x-axis indicate combinations of tree size and number of inferred edits.
Figure 8
Trails of destruction for two pairs of six-taxon trees. Two pairs of reference and test trees are shown: in both cases, the test tree is the consequence of two edit operations on the corresponding reference tree. The edits are indicated with boldface arrows, with the arrowhead pointing to the recipient lineage. The trails of destruction are composed of the edges in each reference or test tree which imply bipartitionings of taxa that are incompatible with the other tree. These incompatible edges are drawn with dashed lines in the figure, and the endpoints indicated with filled circles. In Figure 8a, there are four endpoints on the trail of destruction, thus identifying the minimum number of edit operations as 2. The trail in Figure 8b has only two endpoints, so in this case the minimal edit distance of 1 suggested by the endpoints is not the correct number of edits.
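The endpoint argument in Figure 8 can be sketched for a single tree's trail. This sketch rests on two assumptions not spelled out in the caption: that an endpoint is a node touched by exactly one incompatible edge, and that each SPR operation contributes at most two endpoints, which agrees with four endpoints implying at least two edits. The function names and the example trails are hypothetical.

```python
from collections import Counter
from math import ceil


def trail_endpoints(incompatible_edges):
    """Nodes touched by exactly one incompatible edge (assumed to be the filled circles)."""
    degree = Counter()
    for u, v in incompatible_edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d == 1)


def edit_lower_bound(incompatible_edges):
    """Lower bound on SPR edits, assuming each edit adds at most two trail endpoints."""
    return ceil(len(trail_endpoints(incompatible_edges)) / 2)


# Hypothetical trails given as node pairs: two separate paths versus one longer path.
two_paths = [("a", "b"), ("b", "c"), ("d", "e")]
one_path = [("a", "b"), ("b", "c"), ("c", "d")]
print(trail_endpoints(two_paths), edit_lower_bound(two_paths))  # ['a', 'c', 'd', 'e'] 2
print(edit_lower_bound(one_path))  # 1
```

The value is only a bound: Figure 8b is a case where two edits leave a trail with just two endpoints, so the true edit distance can exceed what the endpoint count suggests.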
Figure 9
Ratchets based on reference tree distance. Comparison of a rooted reference tree (a) with an unrooted test tree (b) shows that five internal edges in the reference tree imply bipartitionings of taxa that are not consistent with the test tree (note that since they imply the exact same bipartition, the two edges connected to the root of the reference tree only contribute a single count), corresponding to a reference tree distance of 5. Two SPR operations that could be proposed by EEEP are indicated with dashed arrows: the resulting modified reference trees are indicated for arrow number 1 in panel (c), and for arrow number 2 in panel (d). These two trees have reference tree distances of 4 and 3 respectively from the test tree, and are therefore not treated equally if a ratchet is being used. Under a permissive ratchet, both edits would be accepted and used for subsequent SPR moves, because both yielded a decrease in the overall reference tree distance. However, under a strict ratchet, only the edit that yielded a reference tree distance of 3 would be accepted, because it yielded the largest decrease in reference tree distance.
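The reference tree distance and the two ratchets described for Figure 9 can be sketched as follows. This is a minimal illustration, not EEEP's implementation: trees are plain undirected edge lists, the helper names are hypothetical, and the toy five-taxon trees are invented for the example. Storing splits in a set is what makes the two root edges of a rooted reference tree count only once, as the caption notes. The distances passed to `ratchet` are the values quoted in the caption rather than ones computed from trees.

```python
from collections import defaultdict

# Minimal sketch (hypothetical helpers, not EEEP's code). Trees are undirected edge
# lists; leaves are the node names found in `taxa`.


def splits(edges, taxa):
    """Nontrivial bipartitions implied by the edges, each stored as the side that
    excludes a fixed anchor taxon so that equal splits compare equal."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    anchor = min(taxa)
    result = set()
    for u, v in edges:
        # Gather the taxa on v's side of edge (u, v) without crossing back through u.
        side, stack, seen = set(), [v], {u, v}
        while stack:
            node = stack.pop()
            if node in taxa:
                side.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if anchor in side:
            side = set(taxa) - side
        if 2 <= len(side) <= len(taxa) - 2:  # skip trivial (single-leaf) splits
            result.add(frozenset(side))
    return result


def reference_tree_distance(ref_edges, test_edges, taxa):
    """Count distinct reference-tree bipartitions absent from the test tree."""
    return len(splits(ref_edges, taxa) - splits(test_edges, taxa))


def ratchet(candidates, current_distance, strict):
    """Filter (edit, distance) pairs: a permissive ratchet keeps every edit that lowers
    the distance; a strict ratchet keeps only those with the largest decrease."""
    improving = [(edit, d) for edit, d in candidates if d < current_distance]
    if not strict or not improving:
        return [edit for edit, _ in improving]
    best = min(d for _, d in improving)
    return [edit for edit, d in improving if d == best]


taxa = {"A", "B", "C", "D", "E"}
# Rooted reference ((A,B),(C,(D,E))): both root edges imply the split {C,D,E}|{A,B}.
ref = [("R", "u"), ("R", "v"), ("u", "A"), ("u", "B"), ("v", "C"),
       ("v", "w"), ("w", "D"), ("w", "E")]
# Unrooted test tree with A and C as a cherry: splits {A,C}|{B,D,E} and {D,E}|{A,B,C}.
test = [("x", "A"), ("x", "C"), ("x", "y"), ("y", "B"), ("y", "z"),
        ("z", "D"), ("z", "E")]
print(len(splits(ref, taxa)), reference_tree_distance(ref, test, taxa))  # 2 1
# Figure 9 values: current distance 5; edits 1 and 2 reach distances 4 and 3.
print(ratchet([(1, 4), (2, 3)], 5, strict=False))  # [1, 2]
print(ratchet([(1, 4), (2, 3)], 5, strict=True))   # [2]
```

The permissive ratchet keeps both candidate edits because each lowers the distance, while the strict ratchet keeps only edit 2, matching the caption's description.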
