Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach
- PMID: 1763041
- PMCID: PMC53114
- DOI: 10.1073/pnas.88.24.11261
Abstract
Genes in higher eukaryotes may span tens or hundreds of kilobases with the protein-coding regions accounting for only a few percent of the total sequence. Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. We describe a reliable computational approach for locating protein-coding portions of genes in anonymous DNA sequence. Using a concept suggested by robotic environmental sensing, our method combines a set of sensor algorithms and a neural network to localize the coding regions. Several algorithms that report local characteristics of the DNA sequence, and therefore act as sensors, are also described. In its current configuration the "coding recognition module" identifies 90% of coding exons of length 100 bases or greater with less than one false positive coding exon indicated per five coding exons indicated. This is a significantly lower false positive rate than any method of which we are aware. This module demonstrates a method with general applicability to sequence-pattern recognition problems and is available for current research efforts.
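The abstract does not say which sensor algorithms are used or how the network is built, so the following is only a minimal sketch of the general idea: several functions each report one local property of a sequence window, and a small feed-forward network combines those readings into a single coding score. The two sensors (`gc_fraction`, `position_asymmetry`), the window size, and all weights are hypothetical placeholders chosen for illustration. They are not the paper's sensors or trained parameters.

```python
import math

def gc_fraction(window):
    """Hypothetical sensor 1: fraction of G or C bases in the window."""
    return sum(base in "GC" for base in window) / len(window)

def position_asymmetry(window):
    """Hypothetical sensor 2: how unevenly each base is used across the
    three codon positions (0 = perfectly even, 1 = one position only)."""
    score = 0.0
    for base in "ACGT":
        counts = [window[offset::3].count(base) for offset in range(3)]
        total = sum(counts)
        if total:
            score += (max(counts) - min(counts)) / total
    return score / 4.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Placeholder weights for a 2-input, 2-hidden-unit, 1-output network.
# A real system would learn these from annotated sequences.
HIDDEN_WEIGHTS = [[4.0, 2.0], [-1.0, 6.0]]
HIDDEN_BIAS = [-2.5, -1.5]
OUTPUT_WEIGHTS = [2.0, 3.0]
OUTPUT_BIAS = -2.0

def coding_score(window):
    """Combine the sensor readings into one score between 0 and 1."""
    sensors = [gc_fraction(window), position_asymmetry(window)]
    hidden = [
        sigmoid(sum(w * s for w, s in zip(row, sensors)) + bias)
        for row, bias in zip(HIDDEN_WEIGHTS, HIDDEN_BIAS)
    ]
    return sigmoid(sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS)

def scan(sequence, window_size=99, step=10):
    """Slide a window along the sequence and score each position."""
    sequence = sequence.upper()
    last_start = len(sequence) - window_size
    return [
        (start, coding_score(sequence[start:start + window_size]))
        for start in range(0, last_start + 1, step)
    ]
```

Calling `scan(dna)` on a string of A, C, G, and T returns a list of `(start, score)` pairs, one per window. Thresholding those scores to call exons is left out, since the abstract gives no detail on that step.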
Similar articles
- Locating protein coding regions in human DNA using a decision tree algorithm. J Comput Biol. 1995 Fall;2(3):473-85. doi: 10.1089/cmb.1995.2.473. PMID: 8521276
- Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks. Nucleic Acids Res. 1993 Feb 11;21(3):607-13. doi: 10.1093/nar/21.3.607. PMID: 8441672 Free PMC article.
- Recognizing exons in genomic sequence using GRAIL II. Genet Eng (N Y). 1994;16:241-53. PMID: 7765200
- Prediction of function in DNA sequence analysis. J Comput Biol. 1995 Spring;2(1):87-115. doi: 10.1089/cmb.1995.2.87. PMID: 7497122 Review.
- Exploring the genesis and functions of Human Accelerated Regions sheds light on their role in human evolution. Curr Opin Genet Dev. 2014 Dec;29:15-21. doi: 10.1016/j.gde.2014.07.005. Epub 2014 Aug 23. PMID: 25156517 Review.
Cited by
- Genomic anatomy of a premier major histocompatibility complex paralogous region on chromosome 1q21-q22. Genome Res. 2001 May;11(5):789-802. doi: 10.1101/gr.175801. PMID: 11337475 Free PMC article.
- Characterization of a meiotic crossover in maize identified by a restriction fragment length polymorphism-based method. Genetics. 1996 Aug;143(4):1771-83. doi: 10.1093/genetics/143.4.1771. PMID: 8844163 Free PMC article.
- Repetitive DNA sequences in the common vole: cloning, characterization and chromosome localization of two novel complex repeats MS3 and MS4 from the genome of the East European vole Microtus rossiaemeridionalis. Chromosome Res. 1998 Aug;6(5):351-60. doi: 10.1023/a:1009284031287. PMID: 9872664
- A frameshift error detection algorithm for DNA sequencing projects. Nucleic Acids Res. 1995 Aug 11;23(15):2900-8. doi: 10.1093/nar/23.15.2900. PMID: 7659513 Free PMC article.
- The mouse Aire gene: comparative genomic sequencing, gene organization, and expression. Genome Res. 1999 Feb;9(2):158-66. PMID: 10022980 Free PMC article.