Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames
- PMID: 7816600
- PMCID: PMC332054
- DOI: 10.1093/nar/22.24.5156
Abstract
A new method that predicts internal exon sequences in human DNA has been developed. The method is based on a splice site prediction algorithm that uses a linear discriminant function to combine information about significant triplet frequencies in various functional parts of splice site regions with preferences of oligonucleotides in protein coding and intron regions. The accuracy of our splice site recognition function is 97% for donor splice sites and 96% for acceptor splice sites. For exon prediction, we combine in a discriminant function the characteristics describing the 5'-intron region, donor splice site, coding region, acceptor splice site and 3'-intron region for each open reading frame flanked by GT and AG dinucleotides. The accuracy of precise internal exon recognition on a test set of 451 exon and 246,693 pseudoexon sequences is 77%, with a specificity of 79%. The recognition quality computed at the level of individual nucleotides is 89% for exon sequences and 98% for intron sequences. This corresponds to a correlation coefficient for exon prediction of 0.87. The precision of this approach is better than that of other methods, and it has been tested on a larger data set. We have also developed a means of predicting exon-exon junctions in cDNA sequences, which can be useful for selecting optimal PCR primers.
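The candidate-generation step described in the abstract (open reading frames flanked by splice-site dinucleotides) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `min_len` threshold and the half-open coordinate convention are assumptions made for illustration, and the discriminant-function scoring that the method applies to each candidate is left out. The sketch treats a candidate internal exon as a segment that starts right after an AG, ends right before a GT, and is free of stop codons in at least one reading frame.

```python
# Hypothetical sketch: enumerate "spliceable open reading frames",
# i.e. candidate internal exons bounded by AG (upstream) and GT (downstream).
# Names, threshold, and coordinates are illustrative assumptions.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_open_frame(segment: str) -> bool:
    """Return True if at least one of the three frames has no stop codon."""
    for frame in range(3):
        codons = (segment[i:i + 3] for i in range(frame, len(segment) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            return True
    return False


def candidate_internal_exons(seq: str, min_len: int = 30):
    """Return half-open (start, end) intervals of candidate internal exons.

    A candidate starts immediately after an AG dinucleotide and ends
    immediately before a GT dinucleotide. min_len is an assumed cutoff.
    """
    seq = seq.upper()
    # Exon would begin two bases past the start of each AG.
    starts = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    # Exon would end at the first base of each GT.
    ends = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    candidates = []
    for start in starts:
        for end in ends:
            if end - start >= min_len and has_open_frame(seq[start:end]):
                candidates.append((start, end))
    return candidates


if __name__ == "__main__":
    toy = "CCAG" + "ATGGCCAAA" + "GTCC"
    print(candidate_internal_exons(toy, min_len=9))
```

Because every AG/GT pair enclosing an open frame qualifies, the number of candidates grows roughly quadratically with sequence length, which is consistent with the test set containing far more pseudoexons than true exons. Each candidate would then need to be scored, for example by a discriminant function over splice-site and composition features, to separate the two classes.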