Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment
- PMID: 16925837
- PMCID: PMC1810552
- DOI: 10.1186/gb-2006-7-s1-s3
Abstract
Background: This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptual strategies used to correlate produced promoter predictions with the manually annotated 5' gene ends.
Results: The predictions were assessed relative to the manual HAVANA annotation of the 5' gene ends. These 5' gene ends were used as the estimated reference transcription start sites. With the maximum allowed distance for predictions of 1,000 nucleotides from the reference transcription start sites, the sensitivity of predictors was in the range 32% to 56%, while the positive predictive value was in the range 79% to 93%. The average distance mismatch of predictions from the reference transcription start sites was in the range 259 to 305 nucleotides. At the same time, using transcription start site estimates from DBTSS and H-Invitational databases as promoter predictions, we obtained a sensitivity of 58%, a positive predictive value of 92%, and an average distance from the annotated transcription start sites of 117 nucleotides. In this experiment, the best performing promoter predictors were those that combined promoter prediction with gene prediction. The main reason for this is the reduced promoter search space that resulted in smaller numbers of false positive predictions.
Conclusion: The main finding, now supported by comprehensive data, is that the accuracy of human promoter predictors for high-throughput annotation purposes can be significantly improved if promoter prediction is combined with gene prediction. Based on the lessons learned in this experiment, we propose a framework for the preparation of the next similar promoter prediction assessment.
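The three measures reported in the Results (sensitivity, positive predictive value, and average distance mismatch under a maximum allowed distance) can be illustrated with a minimal sketch. The abstract does not spell out the exact counting rules, so the definitions here are assumptions: a prediction counts as a true positive if it lies within `max_dist` nucleotides of any reference transcription start site, a reference site counts as recovered if any prediction lies within `max_dist` of it, and the mean distance is taken over true positive predictions only. The function names `nearest_distance` and `assess` are hypothetical, and positions are treated as integers on a single strand of a single sequence.

```python
from bisect import bisect_left


def nearest_distance(pos, sorted_sites):
    """Distance from pos to the closest entry of a non-empty sorted list."""
    i = bisect_left(sorted_sites, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_sites[max(i - 1, 0):i + 1]
    return min(abs(pos - s) for s in candidates)


def assess(predictions, reference_tss, max_dist=1000):
    """Score predictions against reference TSS positions (assumed definitions)."""
    preds = sorted(predictions)
    refs = sorted(reference_tss)

    # Distance of every prediction to its nearest reference TSS.
    pred_dists = [nearest_distance(p, refs) for p in preds]
    hits = [d for d in pred_dists if d <= max_dist]

    # Reference sites that have at least one prediction within max_dist.
    found = sum(1 for r in refs if nearest_distance(r, preds) <= max_dist)

    return {
        "sensitivity": found / len(refs),
        "ppv": len(hits) / len(preds),
        "mean_distance": sum(hits) / len(hits) if hits else None,
    }


if __name__ == "__main__":
    # Toy coordinates, not data from the study.
    result = assess(
        predictions=[1200, 4900, 9000, 30000],
        reference_tss=[1000, 5000, 20000],
    )
    print(result)
```

In this toy case two of the four predictions fall within 1,000 nucleotides of a reference site, and two of the three reference sites are recovered. Restricting the search space, as the combined gene and promoter predictors do, would remove predictions such as the two distant ones and so raise the positive predictive value without changing sensitivity.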
Similar articles
- EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol. 2006;7 Suppl 1(Suppl 1):S2.1-31. doi: 10.1186/gb-2006-7-s1-s2. Epub 2006 Aug 7. PMID: 16925836. Free PMC article. Review.
- Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 2006;7 Suppl 1(Suppl 1):S10.1-12. doi: 10.1186/gb-2006-7-s1-s10. Epub 2006 Aug 7. PMID: 16925832. Free PMC article.
- GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1(Suppl 1):S4.1-9. doi: 10.1186/gb-2006-7-s1-s4. Epub 2006 Aug 7. PMID: 16925838. Free PMC article.
- AceView: a comprehensive cDNA-supported gene and transcripts annotation. Genome Biol. 2006;7 Suppl 1(Suppl 1):S12.1-14. doi: 10.1186/gb-2006-7-s1-s12. Epub 2006 Aug 7. PMID: 16925834. Free PMC article.
- Current bioinformatics tools in genomic biomedical research (Review). Int J Mol Med. 2006 Jun;17(6):967-73. PMID: 16685403. Review.
Cited by
- Toward a gold standard for promoter prediction evaluation. Bioinformatics. 2009 Jun 15;25(12):i313-20. doi: 10.1093/bioinformatics/btp191. PMID: 19478005. Free PMC article.
- Identifying regulatory elements in eukaryotic genomes. Brief Funct Genomic Proteomic. 2009 Jul;8(4):215-30. doi: 10.1093/bfgp/elp014. Epub 2009 Jun 4. PMID: 19498043. Free PMC article. Review.
- Genetic variants of the HSD11B1 gene promoter may be protective against polycystic ovary syndrome. Mol Biol Rep. 2014 Sep;41(9):5961-9. doi: 10.1007/s11033-014-3473-2. Epub 2014 Jun 27. PMID: 24969481.
- Boosting with stumps for predicting transcription start sites. Genome Biol. 2007;8(2):R17. doi: 10.1186/gb-2007-8-2-r17. PMID: 17274821. Free PMC article.
- High DNA melting temperature predicts transcription start site location in human and mouse. Nucleic Acids Res. 2009 Dec;37(22):7360-7. doi: 10.1093/nar/gkp821. PMID: 19820114. Free PMC article.