Comparative Study
Proc Natl Acad Sci U S A. 1996 Aug 20;93(17):9061-6.
doi: 10.1073/pnas.93.17.9061.

Gene recognition via spliced sequence alignment


M S Gelfand et al. Proc Natl Acad Sci U S A. 1996.

Abstract

Gene recognition is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics, and applications of combinatorial methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open a way toward a new approach to gene recognition that uses previously sequenced genes as a clue for recognition of newly sequenced genes. This paper describes a spliced alignment algorithm and software tool that explores all possible exon assemblies in polynomial time and finds the multiexon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; we also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives, the average correlation between the predicted and actual proteins was 99%. The algorithm correctly reconstructed 87% of genes and the rare discrepancies between the predicted and real exon-intron structures were caused either by short (less than 5 amino acids) initial/terminal exons or by alternative splicing. Moreover, the algorithm predicts human genes reasonably well when the homologous protein is nonvertebrate or even prokaryotic. The surprisingly good performance of the method was confirmed by extensive simulations: in particular, with target proteins at 160 accepted point mutations (PAM) (25% similarity), the correlation between the predicted and actual genes was still as high as 95%.
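The core idea of the abstract, choosing the chain of candidate exons whose concatenation best fits a related sequence without enumerating every chain, can be illustrated with a small dynamic program. The sketch below is a simplified illustration and not the authors' implementation: it aligns at the nucleotide level against a cDNA-like target rather than a protein, it assumes the candidate blocks are already given as half-open intervals, and the function name `spliced_alignment` and the scoring parameters (`match`, `mismatch`, `gap`) are hypothetical choices made for the example.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # half-open interval [start, end) in the genomic sequence


def spliced_alignment(genomic: str, blocks: List[Block], target: str,
                      match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (best_score, chain) for the best chain of non-overlapping blocks.

    The chain's concatenated sequence is globally aligned to `target`.
    Each DP cell holds (score, chain) so the chain is recovered directly.
    """
    n = len(target)
    by_score = lambda cell: cell[0]
    final_rows = {}                      # block -> last DP row of that block
    best = (float("-inf"), ())

    # Processing blocks by end position guarantees every possible
    # predecessor (end <= start of current block) is already finished.
    for b in sorted(blocks, key=lambda blk: blk[1]):
        start, end = b
        # Entry row: either this block opens the chain (target prefix
        # aligned to gaps) or it continues the best compatible predecessor.
        row = [(j * gap, ()) for j in range(n + 1)]
        for p, prow in final_rows.items():
            if p[1] <= start:
                row = [max(x, y, key=by_score) for x, y in zip(row, prow)]
        row = [(s, chain + (b,)) for s, chain in row]

        # Standard global-alignment recurrence over the block's characters.
        for ch in genomic[start:end]:
            new = [(row[0][0] + gap, row[0][1])]
            for j in range(1, n + 1):
                sub = match if ch == target[j - 1] else mismatch
                new.append(max(
                    (row[j - 1][0] + sub, row[j - 1][1]),  # align ch to target[j-1]
                    (row[j][0] + gap, row[j][1]),          # ch against a gap
                    (new[j - 1][0] + gap, new[j - 1][1]),  # target[j-1] against a gap
                    key=by_score))
            row = new

        final_rows[b] = row
        if row[n][0] > best[0]:
            best = row[n]

    return best[0], list(best[1])


# Toy example: two true exons, one decoy block inside the intron.
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "TTTGAC"
blocks = [(0, 6), (8, 14), (18, 24)]
score, chain = spliced_alignment(genomic, blocks, "ATGGCCTTTGAC")
# score == 12, chain == [(0, 6), (18, 24)]
```

With B blocks of total length L and a target of length n, this sketch runs in roughly O(B²·n + L·n) time, which is polynomial even though the number of possible block chains grows exponentially with B. Storing the chain in every cell keeps the code short at the cost of memory; a production version would more likely use traceback pointers.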

