Efficient implementation of a generalized pair hidden Markov model for comparative gene finding
- PMID: 15691859
- DOI: 10.1093/bioinformatics/bti297
Abstract
Motivation: The increased availability of genome sequences from closely related organisms has generated much interest in using homology to improve the accuracy of gene prediction programs. Generalized pair hidden Markov models (GPHMMs) have been proposed as one means of meeting this need. However, every currently available GPHMM implementation is either closed-source or incompletely described in the literature, which poses a significant hurdle for others wishing to advance the state of the art in GPHMM design.
Results: We have developed an open-source GPHMM gene finder, TWAIN, which performs very well on two related Aspergillus species, A. fumigatus and A. nidulans. On a test set of 147 conserved gene pairs, it finds 89% of the exons and predicts 74% of the gene models exactly. We describe the implementation of this GPHMM and explicitly address the assumptions and limitations of the system. We also suggest ways of relaxing those assumptions to improve the system's utility without sacrificing efficiency beyond what is practical.
Availability: Available at http://www.tigr.org/software/pirate/twain/twain.html under the open-source Artistic License.
Similar articles
- JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics. 2005 Sep 15;21(18):3596-603. doi: 10.1093/bioinformatics/bti609. Epub 2005 Aug 2. PMID: 16076884
- TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004 Nov 1;20(16):2878-9. doi: 10.1093/bioinformatics/bth315. Epub 2004 May 14. PMID: 15145805
- ExonHunter: a comprehensive approach to gene finding. Bioinformatics. 2005 Jun;21 Suppl 1:i57-65. doi: 10.1093/bioinformatics/bti1040. PMID: 15961499
- [DIGIT: a novel gene finding program of human genome]. Tanpakushitsu Kakusan Koso. 2001 Dec;46(16 Suppl):2580-5. PMID: 11802433. Review. In Japanese.
- GGtools: analysis of genetics of gene expression in bioconductor. Bioinformatics. 2007 Feb 15;23(4):522-3. doi: 10.1093/bioinformatics/btl628. Epub 2006 Dec 8. PMID: 17158513. Review.
Cited by
- Improving model construction of profile HMMs for remote homology detection through structural alignment. BMC Bioinformatics. 2007 Nov 9;8:435. doi: 10.1186/1471-2105-8-435. PMID: 17999748. Free PMC article.
- Predicting gene structure changes resulting from genetic variants via exon definition features. Bioinformatics. 2018 Nov 1;34(21):3616-3623. doi: 10.1093/bioinformatics/bty324. PMID: 29701825. Free PMC article.
- Novel insights into the unfolded protein response using Pichia pastoris specific DNA microarrays. BMC Genomics. 2008 Aug 19;9:390. doi: 10.1186/1471-2164-9-390. PMID: 18713468. Free PMC article.
- Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008 Jan 11;9(1):R7. doi: 10.1186/gb-2008-9-1-r7. PMID: 18190707. Free PMC article.
- Approaches to Fungal Genome Annotation. Mycology. 2011 Oct 3;2(3):118-141. doi: 10.1080/21501203.2011.606851. PMID: 22059117. Free PMC article.