Identification of human gene structure using linear discriminant functions and dynamic programming
- PMID: 7584460
Abstract
Developing advanced techniques for identifying gene structure is one of the main challenges of the Human Genome Project. Discriminant analysis was applied to the construction of recognition functions for various components of gene structure. Linear discriminant functions for splice sites, 5'-coding, internal exon, and 3'-coding region recognition have been developed. A gene structure prediction system, FGENE, has been developed based on the exon recognition functions. We compute a graph of mutual compatibility of different exons and present gene structure models as paths in this directed acyclic graph. To select an optimal model, we apply a variant of the dynamic programming algorithm to search for the path in the graph with the maximal value of the corresponding discriminant functions. Prediction by FGENE for 185 complete human gene sequences has 81% exact exon recognition accuracy and 91% accuracy at the level of individual exon nucleotides, with a correlation coefficient (C) of 0.90. Testing FGENE on 35 genes not used in the development of the discriminant functions shows 71% accuracy of exact exon prediction and 89% at the nucleotide level (C = 0.86). FGENE compares very favorably with the other programs currently used to predict protein-coding regions. Analysis of uncharacterized human sequences based on our methods for splice sites (HSPL, RNASPL), internal exons (HEXON), all types of exons (FEXH), human (FGENEH) and bacterial (CDSB) gene structure prediction, and recognition of human and bacterial sequences (HBR) (to test a library for E. coli contamination) is available through the University of Houston, Weizmann Institute of Science network server and a WWW page of the Human Genome Center at Baylor College of Medicine.
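The path search described in the abstract can be illustrated with a minimal sketch. This is not the FGENE implementation. It assumes that each candidate exon already carries a single numeric score standing in for its discriminant-function value, and that two exons are compatible whenever the second starts after the first ends. The abstract does not spell out the compatibility rules (reading-frame consistency and exon type would presumably also matter), so the `Exon` type, `compatible`, and `best_gene_model` are hypothetical names introduced here for illustration.

```python
from typing import List, NamedTuple, Tuple


class Exon(NamedTuple):
    start: int    # first nucleotide position (inclusive)
    end: int      # last nucleotide position (inclusive)
    score: float  # hypothetical stand-in for a discriminant-function value


def compatible(a: Exon, b: Exon) -> bool:
    """Assumed rule: b may follow a only if it starts after a ends."""
    return a.end < b.start


def best_gene_model(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Return the maximum-score path through the exon compatibility DAG."""
    if not exons:
        return 0.0, []
    # Sorting by start yields a topological order of the DAG:
    # any edge a -> b satisfies a.start <= a.end < b.start.
    order = sorted(exons, key=lambda e: (e.start, e.end))
    best = [e.score for e in order]  # best[i]: top score of a path ending at i
    prev = [-1] * len(order)         # predecessor index for traceback
    for i, cur in enumerate(order):
        for j in range(i):
            candidate = best[j] + cur.score
            if compatible(order[j], cur) and candidate > best[i]:
                best[i] = candidate
                prev[i] = j
    # Pick the best path end point, then walk the predecessors backwards.
    i = max(range(len(order)), key=best.__getitem__)
    total = best[i]
    path = []
    while i != -1:
        path.append(order[i])
        i = prev[i]
    return total, path[::-1]
```

Calling `best_gene_model` on a list of overlapping candidate exons returns the total score together with the chosen non-overlapping exons in sequence order. The double loop is quadratic in the number of candidates, which keeps the sketch short; whether FGENE uses a faster variant is not stated in the abstract.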