Comparative Study
Nucleic Acids Res. 1993 Feb 11;21(3):607-13.
doi: 10.1093/nar/21.3.607.

Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks

Free PMC article

E E Snyder et al. Nucleic Acids Res.

Abstract

Dynamic programming (DP) is applied to the problem of precisely identifying internal exons and introns in genomic DNA sequences. The program GeneParser first scores the sequence of interest for splice sites and for these intron- and exon-specific content measures: codon usage, local compositional complexity, 6-tuple frequency, length distribution and periodic asymmetry. This information is then organized for interpretation by DP. GeneParser employs the DP algorithm to enforce the constraints that introns and exons must be adjacent and non-overlapping and finds the highest scoring combination of introns and exons subject to these constraints. Weights for the various classification procedures are determined by training a simple feed-forward neural network to maximize the number of correct predictions. In a pilot study, the system has been trained on a set of 56 human gene fragments containing 150 internal exons in a total of 158,691 bps of genomic sequence. When tested against the training data, GeneParser precisely identifies 75% of the exons and correctly predicts 86% of coding nucleotides as coding while only 13% of non-exon bps were predicted to be coding. This corresponds to a correlation coefficient for exon prediction of 0.85. Because of the simplicity of the network weighting scheme, generalization performance is nearly as good as with the training set.
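
The constraint that introns and exons must be adjacent and non-overlapping can be illustrated with a small two-state dynamic program. The sketch below is not GeneParser itself: it assumes, for illustration only, that the content measures have already been combined into one exon score and one intron score per base, and that splice-site evidence is given as per-position donor and acceptor scores. The function name `parse_exons_introns` and all four input arrays are hypothetical. The recurrence keeps the best-scoring labeling ending in each state at every position, then traces back to recover the highest-scoring alternation of exons and introns.

```python
from typing import List, Tuple

EXON, INTRON = 0, 1


def parse_exons_introns(
    exon_score: List[float],
    intron_score: List[float],
    donor_score: List[float],
    acceptor_score: List[float],
) -> Tuple[float, str]:
    """Return (best total score, per-base labels as a string of 'E'/'I').

    Hypothetical inputs: per-base content scores for each state, plus the
    score of placing a donor (exon -> intron) or acceptor (intron -> exon)
    boundary at each position.
    """
    n = len(exon_score)
    if n == 0:
        return 0.0, ""

    content = (exon_score, intron_score)
    # Entering EXON at i implies an acceptor site at i;
    # entering INTRON at i implies a donor site at i.
    switch = (acceptor_score, donor_score)

    best = [[0.0, 0.0] for _ in range(n)]  # best[i][s]: best score ending in s at i
    back = [[EXON, INTRON] for _ in range(n)]  # previous state on the best path
    best[0][EXON] = exon_score[0]
    best[0][INTRON] = intron_score[0]

    for i in range(1, n):
        for s in (EXON, INTRON):
            other = 1 - s
            stay = best[i - 1][s]
            change = best[i - 1][other] + switch[s][i]
            if stay >= change:
                best[i][s] = stay + content[s][i]
                back[i][s] = s
            else:
                best[i][s] = change + content[s][i]
                back[i][s] = other

    # Trace back from the better final state.
    state = EXON if best[n - 1][EXON] >= best[n - 1][INTRON] else INTRON
    total = best[n - 1][state]
    labels = []
    for i in range(n - 1, -1, -1):
        labels.append("E" if state == EXON else "I")
        state = back[i][state]
    return total, "".join(reversed(labels))
```

Because each base receives exactly one label, the resulting segments are adjacent and non-overlapping by construction. The abstract also lists a length-distribution measure, which scores whole segments rather than single bases, so the program it describes presumably needs a richer recurrence than this per-base simplification.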
