Incorporation of splice site probability models for non-canonical introns improves gene structure prediction in plants
- PMID: 16306388
- DOI: 10.1093/bioinformatics/bti1205
Abstract
Motivation: The vast majority of introns in protein-coding genes of higher eukaryotes have a GT dinucleotide at their 5'-terminus and an AG dinucleotide at their 3' end. About 1-2% of introns are non-canonical, with the most abundant subtype of non-canonical introns being characterized by GC and AG dinucleotides at their 5'- and 3'-termini, respectively. Most current gene prediction software, whether based on ab initio or spliced alignment approaches, does not include explicit models for non-canonical introns or may exclude their prediction altogether. With present amounts of genome and transcript data, it is now possible to apply statistical methodology to non-canonical splice site prediction. We pursued one such approach and describe the training and implementation of GC-donor splice site models for Arabidopsis and rice, with the goal of exploring whether specific modeling of non-canonical introns can enhance gene structure prediction accuracy.
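The dinucleotide classes named above can be illustrated with a short sketch. It is not part of GeneSeqer or SplicePredictor; the function name `classify_intron` and the label strings are hypothetical, and the input is assumed to be the intron sequence alone, written 5' to 3' on the sense strand. It only inspects the terminal dinucleotides and does not score splice sites statistically.

```python
def classify_intron(seq: str) -> str:
    """Label an intron by its 5'- and 3'-terminal dinucleotides.

    Hypothetical helper for illustration only; it checks the two bases at
    each end of the intron and nothing else.
    """
    s = seq.upper().replace("U", "T")  # accept RNA or lower-case input
    if len(s) < 4:
        raise ValueError("intron sequence must be at least 4 nt long")
    donor, acceptor = s[:2], s[-2:]
    pair = f"{donor}-{acceptor}"
    if pair == "GT-AG":
        return "canonical (GT-AG)"
    if pair in ("GC-AG", "AT-AC"):
        # The two non-canonical classes mentioned in the abstract.
        return f"non-canonical ({pair})"
    return f"other ({pair})"


if __name__ == "__main__":
    # Made-up toy sequences, not real introns.
    for intron in ("GTAAGTTTTTCAG", "GCAAGTACAG", "ATATCCTTTTAC"):
        print(intron, "->", classify_intron(intron))
```

A predictor restricted to the first class would never report introns of the second kind, which is the gap that explicit GC-donor models are meant to close.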
Results: Our results indicate that the incorporation of non-canonical splice site models yields dramatic improvements in annotating genes containing GC-AG and AT-AC non-canonical introns. Comparison of models shows differences between monocot and dicot species, but also suggests GC intron-specific biases independent of taxonomic clade. We also present evidence that GC-AG introns occur preferentially in genes with atypically high exon counts.
Availability: Source code for the updated versions of GeneSeqer and SplicePredictor (distributed with the GeneSeqer code) is available at http://bioinformatics.iastate.edu/bioinformatics2go/gs/download.html. Web servers for Arabidopsis, rice and other plant species are accessible at http://www.plantgdb.org/PlantGDB-cgi/GeneSeqer/AtGDBgs.cgi, http://www.plantgdb.org/PlantGDB-cgi/GeneSeqer/OsGDBgs.cgi and http://www.plantgdb.org/PlantGDB-cgi/GeneSeqer/PlantGDBgs.cgi, respectively. A SplicePredictor web server is available at http://bioinformatics.iastate.edu/cgi-bin/sp.cgi. Software to generate training data and parameterizations for Bayesian splice site models is available at http://gremlin1.gdcb.iastate.edu/~volker/SB05B/BSSM4GSQ/