Genome Res. 2003 Aug;13(8):1930-7.
doi: 10.1101/gr.1261703. Epub 2003 Jul 17.

Gene prediction by spectral rotation measure: a new method for identifying protein-coding regions

Daniel Kotlar et al. Genome Res. 2003 Aug.

Abstract

A new measure for gene prediction in eukaryotes is presented. The measure is based on the Discrete Fourier Transform (DFT) phase at a frequency of 1/3, computed for the four binary sequences for A, T, C, and G. Analysis of all the experimental genes of S. cerevisiae revealed distribution of the phase in a bell-like curve around a central value, in all four nucleotides, whereas the distribution of the phase in the noncoding regions was found to be close to uniform. Similar findings were obtained for other organisms. Several measures based on the phase property are proposed. The measures are computed by clockwise rotation of the vectors, obtained by DFT for each analysis frame, by an angle equal to the corresponding central value. In protein coding regions, this rotation is assumed to closely align all vectors in the complex plane, thereby amplifying the magnitude of the vector sum. In noncoding regions, this operation does not significantly change this magnitude. Computing the measures with one chromosome and applying them on sequences of others reveals improved performance compared with other algorithms that use the 1/3 frequency feature, especially in short exons. The phase property is also used to find the reading frame of the sequence.
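The rotation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the function names (`dft_third`, `central_angles`, `rotation_measure`), the use of a circular mean to estimate each central value, and the unweighted sum of the four rotated vectors are all assumptions made here for illustration.

```python
import cmath
import math

BASES = "ACGT"
OMEGA = -2j * math.pi / 3  # exponent step for frequency 1/3


def dft_third(frame, base):
    """DFT at frequency 1/3 of the 0/1 indicator sequence of `base` in `frame`."""
    return sum(cmath.exp(OMEGA * n) for n, ch in enumerate(frame) if ch == base)


def central_angles(coding_frames):
    """Estimate a central phase per base as a circular mean (an assumption)."""
    mu = {}
    for base in BASES:
        total = 0j
        for frame in coding_frames:
            u = dft_third(frame, base)
            if abs(u) > 1e-12:       # phase is undefined for a zero vector
                total += u / abs(u)  # unit vector carrying only the phase
        mu[base] = cmath.phase(total)
    return mu


def rotation_measure(frame, mu):
    """Rotate each base's vector clockwise by mu[base], sum, square the magnitude."""
    v = sum(dft_third(frame, b) * cmath.exp(-1j * mu[b]) for b in BASES)
    return abs(v) ** 2


# Toy frame with perfect period-3 structure (hypothetical data, not from the paper).
frame = "ATG" * 20
mu = central_angles([frame])
aligned = rotation_measure(frame, mu)
unrotated = rotation_measure(frame, {b: 0.0 for b in BASES})
print(aligned, unrotated)
```

In this toy frame the A, T, and G vectors each have magnitude 20 but point 120 degrees apart, so their unrotated sum nearly cancels. Rotating each by its own central angle aligns them, and the squared magnitude of the sum rises to about 3600. This mirrors the amplification the abstract describes for coding regions.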

Figures

Figure 1
Computing Ub(N/3) in the case fmin = f(b,1).
Figure 2
Argument distributions for all experimental genes in all chromosomes in S. cerevisiae.
Figure 3
Argument distribution for noncoding regions in all chromosomes in S. cerevisiae.
Figure 4
(A) Argument distribution for all experimental genes in all chromosomes of S. cerevisiae. (B) Argument distribution for all genes in chromosomes 2 and 3 of S. pombe. (C) Argument distribution for all genes in chromosome 1 of Guillardia theta.
Figure 5
(A) Rotation and alignment of the vectors G(s) and T(s), when arg(T[s]) ≈ μT and arg(G[s]) ≈ μG. (B) Rotation and alignment of the vectors G(s) and T(s), when arg(T[s]) and arg(G[s]) are any random values.
Figure 6
Probability density functions for Spectral Rotation (bold) and Spectral Content (fine) measures (solid lines represent exons and dashed lines represent noncoding regions).
Figure 7
Argument distribution of coding DNA strands of length 120 bp in S. cerevisiae.
Figure 8
Graphs of gene prediction applied on the gene SPBC582.08 in chromosome 2 of S. pombe, using a sliding window of 180 bp: (A) TG-Rotation measure; (B) Codon Usage measure. The horizontal segments represent the actual location of the three exons. To get the actual base location in the chromosome, add 300,000 bp to the numbers on the horizontal axis.
Figure 9
Graphs of the SR measure on the gene SPBC1685.08 in chromosome 2 of S. pombe, using a sliding window of 351 bp. (A) The measure; (B) arg(V). The horizontal segments represent the actual location of the exons. To get the actual base location in the chromosome, add 400,000 bp to the numbers on the horizontal axis.
Figure 10
Graphs of the SR measure on the gene SPBC1709.08 in chromosome 2 of S. pombe, using a sliding window of 351 bp. (A) The measure; (B) arg(V). The horizontal segment represents the actual location of the gene. To get the actual base location in the chromosome, add 1,000,000 bp to the numbers on the horizontal axis.
