SpliceDB: database of canonical and non-canonical mammalian splice sites
- PMID: 11125105
- PMCID: PMC29840
- DOI: 10.1093/nar/29.1.255
Abstract
A database (SpliceDB) of known mammalian splice site sequences has been developed. We extracted 43 337 splice pairs from the mammalian divisions of the gene-centered Infogene database, including sites from incomplete or alternatively spliced genes. Known EST sequences supported 22 815 of them. After discarding sequences with putative errors or ambiguous splice junction locations, the verified dataset includes 22 489 entries. Of these, 98.71% contain canonical GT-AG junctions (22 199 entries) and 0.56% have non-canonical GC-AG splice site pairs. The remainder (0.73%) is spread across many small groups, the largest of which accounts for 0.05%. We paid particular attention to non-canonical splice sites, which comprise 3.73% of GenBank-annotated splice pairs. EST alignments allowed us to verify only the exonic part of splice sites. To check the conserved dinucleotides, we compared sequences of human non-canonical splice sites with sequences from the high throughput genome sequencing project (HTG). Out of 171 human non-canonical, EST-supported splice pairs, 156 (91.23%) had a clear match in the human HTG. After sequence analysis they can be classified as: 79 GC-AG pairs (of which one was an error corrected to GC-AG), 61 errors corrected to canonical GT-AG pairs, six AT-AC pairs (of which two were errors corrected to AT-AC), one case produced from a non-existent intron, seven cases found in HTG that were deposited to GenBank, and only two remaining cases of supported non-canonical splice pairs. The information about verified splice site sequences for canonical and non-canonical sites is presented in SpliceDB with the supporting evidence. We also built weight matrices for the major splice groups, which can be incorporated into gene prediction programs. SpliceDB is available at the computational genomic Web server of the Sanger Centre: http://genomic.sanger.ac.uk/spldb/SpliceDB.html and at http://www.softberry.com/spldb/SpliceDB.html.
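The grouping by donor-acceptor dinucleotides and the weight matrices mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy intron sequences, and the choice of plain per-position frequencies (rather than any particular weighting or log-odds scheme) are assumptions made for illustration only.

```python
from collections import Counter


def classify_splice_pair(intron: str) -> str:
    """Label an intron by its first and last two bases, e.g. 'GT-AG'."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron must be at least 4 nt long")
    return f"{seq[:2]}-{seq[-2:]}"


def group_percentages(introns):
    """Percentage of introns falling into each donor-acceptor group."""
    counts = Counter(classify_splice_pair(s) for s in introns)
    total = sum(counts.values())
    return {pair: 100.0 * n / total for pair, n in counts.items()}


def position_frequency_matrix(sites, alphabet="ACGT"):
    """Per-position base frequencies for equal-length, aligned site sequences.

    This is a plain frequency matrix (an assumption); a real weight matrix
    might add pseudocounts or convert frequencies to log-odds scores.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(length):
        column = Counter(s[i].upper() for s in sites)
        matrix.append({base: column[base] / len(sites) for base in alphabet})
    return matrix


if __name__ == "__main__":
    # Hypothetical toy introns, not taken from SpliceDB.
    toy_introns = ["GTAAGTCCAG", "GTGAGTTTAG", "GCAAGTCTAG", "ATATCCTTAC"]
    for pair, pct in sorted(group_percentages(toy_introns).items()):
        print(f"{pair}: {pct:.2f}%")

    # First three intronic bases of hypothetical donor sites.
    toy_donors = ["GTA", "GTG", "GCA", "GTA"]
    for pos, freqs in enumerate(position_frequency_matrix(toy_donors), start=1):
        print(pos, freqs)
```

Applied to a verified set of introns, `group_percentages` would yield the kind of breakdown reported above (GT-AG, GC-AG, AT-AC and minor groups), and a matrix built separately for each major group could serve as a scoring table in a gene prediction program.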
