Hal: an automated pipeline for phylogenetic analyses of genomic data
- PMID: 21327165
- PMCID: PMC3038436
- DOI: 10.1371/currents.RRN1213
Abstract
The rapid increase in genomic and genome-scale data is resulting in unprecedented levels of discrete sequence data available for phylogenetic analyses. Major analytical impasses exist, however, prior to analyzing these data with existing phylogenetic software. Obstacles include the management of large data sets without standardized naming conventions, the identification and filtering of orthologous clusters of proteins or genes, and the assembly of alignments of orthologous sequence data into individual and concatenated super alignments. Here we report the production of an automated pipeline, Hal, that produces multiple alignments and trees from genomic data. These alignments can be produced by a choice of four alignment programs and analyzed by a variety of phylogenetic programs. In short, the Hal pipeline connects the programs BLASTP, MCL, user-specified alignment programs, GBlocks, ProtTest and user-specified phylogenetic programs to produce species trees. The script is available at SourceForge (http://sourceforge.net/projects/bio-hal/). The results from an example analysis of Kingdom Fungi are briefly discussed.
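The step of assembling individual alignments into a concatenated super alignment can be illustrated with a minimal Python sketch. This is not taken from the Hal script: the function name `concatenate_alignments`, the dictionary-based input format, and the choice to pad taxa absent from a cluster with gap characters are all assumptions made for illustration.

```python
def concatenate_alignments(alignments, missing="-"):
    """Join per-cluster alignments into one super alignment.

    alignments: list of dicts mapping taxon name -> aligned sequence.
    Taxa absent from a cluster are padded with `missing` characters
    (an assumed convention, not necessarily the one Hal uses).
    """
    # Collect every taxon seen in any cluster, in a stable order.
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}

    for aln in alignments:
        # All sequences within one alignment must share the same length.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * width))

    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy input: two ortholog clusters over three taxa.
cluster_1 = {"A": "MK-", "B": "MKL"}
cluster_2 = {"A": "QQ", "C": "QE"}
super_alignment = concatenate_alignments([cluster_1, cluster_2])
```

In this toy case every taxon ends up with a sequence of length five, with gaps standing in for the cluster in which it has no ortholog.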
