Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Comparative Study
. 2007 May;14(4):517-35.
doi: 10.1089/cmb.2007.A010.

Confounding factors in HGT detection: statistical error, coalescent effects, and multiple solutions

Affiliations
Free article
Comparative Study

Confounding factors in HGT detection: statistical error, coalescent effects, and multiple solutions

Cuong Than et al. J Comput Biol. 2007 May.
Free article

Abstract

Prokaryotic organisms share genetic material across species boundaries by means of a process known as horizontal gene transfer (HGT). This process has great significance for understanding prokaryotic genome diversification and unraveling their complexities. Phylogeny-based detection of HGT is one of the most commonly used methods for this task, and is based on the fundamental fact that HGT may cause gene trees to disagree with one another, as well as with the species phylogeny. Using these methods, we can compare gene and species trees, and infer a set of HGT events to reconcile the differences among these trees. In this paper, we address three factors that confound the detection of the true HGT events, including the donors and recipients of horizontally transferred genes. First, we study experimentally the effects of error in the estimated gene trees (statistical error) on the accuracy of inferred HGT events. Our results indicate that statistical error leads to overestimation of the number of HGT events, and that HGT detection methods should be designed with unresolved gene trees in mind. Second, we demonstrate, both theoretically and empirically, that based on topological comparison alone, the number of HGT scenarios that reconcile a pair of species/gene trees may be exponential. This number may be reduced when branch lengths in both trees are estimated correctly. This set of results implies that in the absence of additional biological information, and/or a biological model of how HGT occurs, multiple HGT scenarios must be sought, and efficient strategies for how to enumerate such solutions must be developed. Third, we address the issue of lineage sorting, how it confounds HGT detection, and how to incorporate it with HGT into a single stochastic framework that distinguishes between the two events by extending population genetics theories. This result is very important, particularly when analyzing closely related organisms, where coalescent effects may not be ignored when reconciling gene trees. In addition to these three confounding factors, we consider the problem of enumerating all valid coalescent scenarios that constitute plausible species/gene tree reconciliations, and develop a polynomial-time dynamic programming algorithm for solving it. This result bears great significance on reducing the search space for heuristics that seek reconciliation scenarios. Finally, we show, empirically, that the locality of incongruence between a pair of trees has an impact on the numbers of HGT and coalescent reconciliation scenarios.

PubMed Disclaimer

Publication types

LinkOut - more resources