Automatic identification of large collections of protein-coding or rRNA sequences
- PMID: 17920750
- DOI: 10.1016/j.biochi.2007.08.006
Abstract
The number of available genomic sequences is growing rapidly, owing to the development of massive sequencing techniques. These sequences must be identified, and their identification contributes to assessing the evolutionary relationships among genes and among species. Automated bioinformatics tools are therefore needed to carry out these identifications quickly and accurately. We developed HoSeqI (Homologous Sequence Identification), a software environment that performs this kind of automated sequence identification using homologous gene family databases. HoSeqI is accessible through a Web interface (http://pbil.univ-lyon1.fr/software/HoSeqI/) that lets users identify one or several sequences and visualize the resulting alignments and phylogenetic trees. We also implemented a second application, MultiHoSeqI, which quickly adds a large set of sequences to a family database in order to identify them, to update the database, or to assist automatic genome annotation. More recently, we developed a third application, ChiSeqI (Chimeric Sequence Identification), which automates the identification of bacterial 16S ribosomal RNA sequences and the detection of chimeric sequences.
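To make the notion of a chimeric sequence concrete, the following is a minimal Python sketch of one simple detection heuristic: split a query in two, find the closest reference for each half, and flag the query when the two halves point to different references. It is not ChiSeqI's actual method, which the abstract does not describe. The function names (`kmers`, `jaccard`, `best_match`, `looks_chimeric`), the k-mer Jaccard similarity, and the toy reference sequences are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def kmers(seq: str, k: int = 4) -> Set[str]:
    """Return the set of overlapping k-mers found in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(fragment: str, references: Dict[str, str], k: int = 4) -> Tuple[str, float]:
    """Return (name, score) of the reference most similar to fragment."""
    frag = kmers(fragment, k)
    scores = {name: jaccard(frag, kmers(ref, k)) for name, ref in references.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


def looks_chimeric(query: str, references: Dict[str, str], k: int = 4) -> bool:
    """Flag a query whose two halves are closest to different references.

    Illustrative heuristic only (an assumption, not the published tool):
    real detectors use alignments and try many breakpoints.
    """
    mid = len(query) // 2
    left_name, _ = best_match(query[:mid], references, k)
    right_name, _ = best_match(query[mid:], references, k)
    return left_name != right_name


# Hypothetical toy references; real 16S rRNA sequences are about 1,500 nt long.
REFERENCES = {
    "taxonA": "ACGTACGGACTTACGA",
    "taxonB": "TTGGCCAATTGGCCTT",
}

if __name__ == "__main__":
    chimera = REFERENCES["taxonA"][:12] + REFERENCES["taxonB"][4:]
    print(looks_chimeric(chimera, REFERENCES))
    print(looks_chimeric(REFERENCES["taxonA"], REFERENCES))
```

A fixed midpoint split is the simplest possible choice; a chimera whose breakpoint lies far from the middle could be missed, which is one reason practical tools scan several candidate breakpoints.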