A clean data set of EST-confirmed splice sites from Homo sapiens and standards for clean-up procedures
- PMID: 10373578
- PMCID: PMC148470
- DOI: 10.1093/nar/27.13.2627
Abstract
A clean data set of verified splice sites from Homo sapiens is reported, together with the standards used for the clean-up procedure. The sites were validated by: (i) standard cleaning procedures such as requiring consistency in the annotation of the gene structural elements, completeness of the coding regions and elimination of redundant sequences; (ii) clustering by decision trees coupled with analysis of ClustalW alignments of the translated protein sequence with homologous proteins from SWISS-PROT; (iii) matching against human EST sequences. The sites are categorised as: (i) donor sites, a set of 619 EST-confirmed donor sites, for 138 of which either the site itself or the region around it is involved in alternative splice events; (ii) acceptor sites, a set of 623 EST-confirmed acceptor sites, for 144 of which either the site itself or the region around it is involved in alternative splice events; (iii) genuine splice sites, a set of 392 splice sites wherein both the donor and acceptor sites had EST confirmation and were not involved in any alternative splicing; (iv) alternative splice sites, a set of 209 splice sites wherein both the donor and acceptor sites had EST confirmation and the sites or the regions around them were involved in alternative splicing. A set of nucleotide regions that can be used to generate a control set of false splice sites with a high confidence of being non-functional is also reported.
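The distinction between categories (iii) and (iv) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the record layout (`SpliceSitePair`), the field names and the function `classify_pair` are hypothetical, and the sketch assumes that a pair counts as "alternative" when either its donor or its acceptor (or the region around it) is flagged as involved in alternative splicing, a point the abstract does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpliceSitePair:
    """Donor/acceptor pair of one intron (hypothetical record layout)."""
    donor_est_confirmed: bool
    acceptor_est_confirmed: bool
    # True when the site, or the region around it, is involved in an
    # alternative splice event (assumed to be annotated upstream).
    donor_alternative: bool
    acceptor_alternative: bool


def classify_pair(pair: SpliceSitePair) -> Optional[str]:
    """Return 'genuine', 'alternative', or None if the pair is not fully EST-confirmed."""
    # Both categories require EST confirmation of donor AND acceptor.
    if not (pair.donor_est_confirmed and pair.acceptor_est_confirmed):
        return None
    # Assumption: one flagged side is enough to call the pair alternative.
    if pair.donor_alternative or pair.acceptor_alternative:
        return "alternative"
    return "genuine"


if __name__ == "__main__":
    example = SpliceSitePair(True, True, False, True)
    print(classify_pair(example))
```

Under this reading, a pair with only one EST-confirmed site falls into neither category, although the confirmed site could still be counted in the donor or acceptor set.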