2008 Oct 7:9:419.
doi: 10.1186/1471-2105-9-419.

A database of phylogenetically atypical genes in archaeal and bacterial genomes, identified using the DarkHorse algorithm

Sheila Podell et al. BMC Bioinformatics.

Abstract

Background: The process of horizontal gene transfer (HGT) is believed to be widespread in Bacteria and Archaea, but little comparative data is available addressing its occurrence in complete microbial genomes. Collection of high-quality, automated HGT prediction data based on phylogenetic evidence has previously been impractical for large numbers of genomes at once, due to prohibitive computational demands. DarkHorse, a recently described statistical method for discovering phylogenetically atypical genes on a genome-wide basis, provides a means to solve this problem through lineage probability index (LPI) ranking scores. LPI scores inversely reflect phylogenetic distance between a test amino acid sequence and its closest available database matches. Proteins with low LPI scores are good horizontal gene transfer candidates; those with high scores are not.
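The screening rule stated above (low LPI scores mark good HGT candidates, high scores do not) can be illustrated with a minimal sketch. It does not compute LPI scores; it only ranks scores that are already available. The record type, the protein identifiers, and the 0.5 cutoff are all hypothetical and chosen purely for illustration.

```python
from typing import NamedTuple


class ProteinScore(NamedTuple):
    """A protein paired with a precomputed LPI score (hypothetical record type)."""
    protein_id: str
    lpi: float


def rank_hgt_candidates(scores, max_lpi=0.5):
    """Keep proteins whose LPI is at or below max_lpi, lowest LPI first.

    max_lpi=0.5 is an illustrative cutoff, not a value taken from the paper.
    """
    candidates = [s for s in scores if s.lpi <= max_lpi]
    return sorted(candidates, key=lambda s: s.lpi)


# Made-up example scores, for illustration only.
example = [
    ProteinScore("prot_A", 0.98),
    ProteinScore("prot_B", 0.12),
    ProteinScore("prot_C", 0.47),
]
ranked = rank_hgt_candidates(example)
for record in ranked:
    print(record.protein_id, record.lpi)
```

In practice a suitable cutoff would presumably depend on the genome and on how well its relatives are represented in the reference database.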

Description: The DarkHorse algorithm has been applied to 955 microbial genome sequences, and the results organized into a web-searchable relational database, called the DarkHorse HGT Candidate Resource (http://darkhorse.ucsd.edu). Users can select individual genomes or groups of genomes to screen by LPI score, search for protein functions by descriptive annotation or amino acid sequence similarity, or select proteins with unusual G+C composition in their underlying coding sequences. The search engine reports LPI scores for match partners as well as query sequences, providing the opportunity to explore whether potential HGT donor sequences are phylogenetically typical or atypical within their own genomes. This information can be used to predict whether or not sufficient information is available to build a well-supported phylogenetic tree using the potential donor sequence.
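As an illustration of selecting proteins by unusual G+C composition, the sketch below computes the G+C fraction of each coding sequence and flags those that deviate from a genome-wide average. This is one simple way such a filter could work, not a description of how the database implements it. The function names, the input layout, and the 0.10 deviation cutoff are assumptions.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / acgt


def unusual_gc(cds_by_id, genome_gc, max_deviation=0.10):
    """Return IDs of coding sequences whose G+C fraction differs from
    genome_gc by more than max_deviation.

    max_deviation=0.10 is an illustrative cutoff, not a value from the paper.
    """
    return [
        protein_id
        for protein_id, seq in cds_by_id.items()
        if abs(gc_fraction(seq) - genome_gc) > max_deviation
    ]


# Made-up sequences, for illustration only.
cds = {"prot_A": "GGGGCCCC", "prot_B": "ATGCATGC"}
flagged = unusual_gc(cds, genome_gc=0.5)
print(flagged)
```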

Conclusion: The DarkHorse HGT Candidate database provides a powerful, flexible set of tools for identifying phylogenetically atypical proteins, allowing researchers to explore both individual HGT events in single genomes, and large-scale HGT patterns among protein families and genome groups. Although the DarkHorse algorithm cannot, by itself, provide definitive proof of horizontal gene transfer, it is a flexible, powerful tool that can be combined with slower, more rigorous methods in situations where these other methods could not otherwise be applied.

Figures

Figure 1
Threshold filter determination patterns at genus level granularity for organisms whose phylogenetic relatives are represented at different abundances in Genbank nr. The circled point in each panel was chosen as the DarkHorse threshold filter value, a heuristic for calculating bitscore window sizes in that genome. Panel A, typical phylogenetic representation example, Salinispora tropica. Panel B, high representation example, Escherichia coli HS. Panel C, low representation example, Borrelia burgdorferi.
Figure 2
DarkHorse filter threshold values selected for 955 microbial genomes, using strain-level keywords to remove self-matches.
Figure 3
Screen capture of web user interface for simple search.
Figure 4
Screen capture of web search results page.
Figure 5
Screen capture of genome summary page.
Figure 6
LPI score frequency distribution for 955 Bacterial and Archaeal genomes, binned in 0.05 score increments, using strain level self-exclusion terms. Classification categories (kingdom, phylum, class, order, family, genus, species) indicate approximate distance of matches from the original query genome characteristic of each LPI score region. Exact classification distances may vary for microbial species containing either more or fewer taxonomic terms in their lineages.
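The binning described in the Figure 6 caption (0.05 score increments) can be sketched as follows. This is a minimal illustration, assuming LPI scores lie between 0 and 1 inclusive; the function name and the example scores are hypothetical, and a score of exactly 1.0 is placed in the last bin.

```python
import math
from collections import Counter

BIN_WIDTH = 0.05
N_BINS = 20  # covers 0.00 to 1.00 in 0.05 steps


def bin_lpi_scores(scores):
    """Count scores per 0.05-wide bin, keyed by each bin's lower edge."""
    counts = Counter()
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of assumed range [0, 1]: {score}")
        # The small epsilon guards against floating-point error at bin edges
        # (for example, 0.15 / 0.05 evaluates to slightly less than 3).
        index = min(math.floor(score / BIN_WIDTH + 1e-9), N_BINS - 1)
        counts[index] += 1
    return {round(i * BIN_WIDTH, 2): counts.get(i, 0) for i in range(N_BINS)}


# Made-up scores, for illustration only.
histogram = bin_lpi_scores([0.0, 0.04, 0.15, 0.97, 1.0])
for lower_edge, count in histogram.items():
    if count:
        print(f"{lower_edge:.2f}: {count}")
```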
