Evaluating bacterial gene-finding HMM structures as probabilistic logic programs
- PMID: 22215819
- PMCID: PMC3289911
- DOI: 10.1093/bioinformatics/btr698
Abstract
Motivation: Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog.
Results: We evaluate hidden Markov model (HMM) structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as acyclic discrete phase-type (ADPH) length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two currently most used model structures is the best performing in terms of statistical information criteria or prediction performance, suggesting that better-fitting models might be achievable.
Availability: The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm.
Supplementary information: Supplementary data are available at Bioinformatics online.
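The information-criterion comparison mentioned in the Results can be illustrated with a minimal sketch. The abstract does not say which criteria were used, so AIC and BIC are assumed here as two common choices; the model names, log-likelihoods, parameter counts and sample size are hypothetical values chosen for illustration, not results from the paper.

```python
import math


def aic(log_likelihood, num_params):
    """Akaike information criterion: lower is better."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood, num_params, num_observations):
    """Bayesian information criterion: lower is better."""
    return num_params * math.log(num_observations) - 2 * log_likelihood


def rank_models(candidates, num_observations, criterion="bic"):
    """Return model names sorted from best (lowest score) to worst."""
    scores = {}
    for name, (log_likelihood, num_params) in candidates.items():
        if criterion == "aic":
            scores[name] = aic(log_likelihood, num_params)
        else:
            scores[name] = bic(log_likelihood, num_params, num_observations)
    return sorted(scores, key=scores.get)


# Hypothetical candidates: name -> (maximized log-likelihood, free parameters).
candidates = {
    "null": (-1500.0, 3),
    "three_state": (-1400.0, 12),
    "novel": (-1390.0, 40),
}
num_observations = 1000  # hypothetical number of training observations

bic_ranking = rank_models(candidates, num_observations, criterion="bic")
aic_ranking = rank_models(candidates, num_observations, criterion="aic")
print("BIC ranking:", bic_ranking)
print("AIC ranking:", aic_ranking)
```

With these made-up numbers the two criteria agree on the best model but order the remaining two differently, because BIC penalizes the 40-parameter structure more heavily than AIC does. A better likelihood alone therefore does not guarantee a better score.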