An improved system for exon recognition and gene modeling in human DNA sequences
- PMID: 7584416
Abstract
A new version of the GRAIL system (Uberbacher and Mural, 1991; Mural et al., 1992; Uberbacher et al., 1993), called GRAIL II, has recently been developed (Xu et al., 1994). GRAIL II is a hybrid AI system that supports a number of DNA sequence analysis tools, including protein-coding region recognition, PolyA site and transcription promoter recognition, gene model construction, translation to protein, and DNA/protein database searching. This paper presents the core of GRAIL II: the coding exon recognition and gene model construction algorithms. The exon recognition algorithm recognizes coding exons by combining coding feature analysis with edge signal (acceptor/donor/translation-start site) detection. Unlike the original GRAIL system (Uberbacher and Mural, 1991; Mural et al., 1992), this algorithm uses variable-length windows tailored to each potential exon candidate, making its performance almost independent of exon length. In this algorithm, the recognition process is divided into four steps. Initially, a large number of possible coding exon candidates are generated. Then a rule-based prescreening algorithm eliminates the majority of the improbable candidates. As the kernel of the recognition algorithm, three neural networks are trained to evaluate the remaining candidates. The outputs of the neural networks are then divided into clusters of candidates, corresponding to presumed exons. The algorithm makes its final prediction by picking the best candidate from each cluster. The gene construction algorithm (Xu, Mural and Uberbacher, 1994) uses a dynamic programming approach to build gene models, taking as input the clusters predicted by the exon recognition algorithm. Extensive testing has been done on these two algorithms. (ABSTRACT TRUNCATED AT 250 WORDS)
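The clustering, best-candidate selection, and dynamic programming steps described above can be illustrated with a minimal Python sketch. It is not the GRAIL II implementation: the abstract does not specify the clustering rule, the scoring scale, or the dynamic programming objective, so the names `Candidate`, `cluster_candidates`, `best_per_cluster`, and `best_chain` are hypothetical, clusters are assumed to be groups of overlapping candidates, scores stand in for neural network outputs, and the dynamic program is ordinary weighted interval scheduling that ignores reading-frame and splice-site compatibility.

```python
from bisect import bisect_left
from typing import List, NamedTuple


class Candidate(NamedTuple):
    start: int    # first base of the candidate exon (inclusive)
    end: int      # last base of the candidate exon (inclusive)
    score: float  # hypothetical evaluator output; higher means more exon-like


def cluster_candidates(cands: List[Candidate]) -> List[List[Candidate]]:
    """Group candidates whose intervals overlap (an assumed clustering rule)."""
    clusters: List[List[Candidate]] = []
    cluster_end = -1
    for c in sorted(cands, key=lambda c: c.start):
        if clusters and c.start <= cluster_end:
            clusters[-1].append(c)
            cluster_end = max(cluster_end, c.end)
        else:
            clusters.append([c])
            cluster_end = c.end
    return clusters


def best_per_cluster(clusters: List[List[Candidate]]) -> List[Candidate]:
    """Pick the highest-scoring candidate from each cluster."""
    return [max(cluster, key=lambda c: c.score) for cluster in clusters]


def best_chain(cands: List[Candidate]) -> List[Candidate]:
    """Choose non-overlapping candidates with maximal total score.

    Weighted interval scheduling, used here as a simplified stand-in for
    gene model construction by dynamic programming.
    """
    ordered = sorted(cands, key=lambda c: c.end)
    ends = [c.end for c in ordered]
    best = [0.0] * (len(ordered) + 1)  # best[i]: optimum over ordered[:i]
    take = [False] * len(ordered)
    prev = [0] * len(ordered)
    for i, c in enumerate(ordered):
        # Number of earlier candidates that end strictly before c starts.
        p = bisect_left(ends, c.start, 0, i)
        prev[i] = p
        with_c = best[p] + c.score
        take[i] = with_c > best[i]
        best[i + 1] = max(with_c, best[i])
    # Trace back through the table to recover the chosen chain.
    chain: List[Candidate] = []
    i = len(ordered)
    while i > 0:
        if take[i - 1]:
            chain.append(ordered[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    return chain[::-1]


# Made-up coordinates and scores, for illustration only.
cands = [
    Candidate(100, 200, 0.9),
    Candidate(120, 200, 0.6),
    Candidate(150, 260, 0.4),
    Candidate(500, 650, 0.8),
    Candidate(510, 650, 0.95),
]
clusters = cluster_candidates(cands)
predicted_exons = best_per_cluster(clusters)
gene_model = best_chain(cands)
```

In this toy input the two approaches agree, because the highest-scoring candidate in each cluster also belongs to the best non-overlapping chain. In general, a chain-level optimization may prefer a lower-scoring candidate when it fits better with its neighbors.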