Nucleic Acids Res. 2011 Jan;39(1):e3.
doi: 10.1093/nar/gkq891. Epub 2010 Oct 14.

Visualization of the protein-coding regions with a self adaptive spectral rotation approach

Bo Chen et al. Nucleic Acids Res. 2011 Jan.

Abstract

Identifying protein-coding regions in DNA sequences is an active problem in computational biology. In this study, we present a self adaptive spectral rotation (SASR) approach, which visualizes coding regions in DNA sequences based on the Triplet Periodicity (TP) property and requires no prior training. It is intended to support rough prediction of coding regions when the extra information needed to train other leading methods is unavailable. In this approach, at each position in the DNA sequence, a Fourier spectrum is calculated from the posterior subsequence. From these spectra, a random walk in the complex plane is generated as the SASR's graphic output. Applications of the SASR to real DNA data show that patterns in the graphic output reveal the locations of the coding regions and the frame shifts between them: arcs indicate coding regions, stable points indicate non-coding regions, and the shapes of corners reveal frame shifts. Tests on a genomic data set from Saccharomyces cerevisiae show that the graphic patterns for coding and non-coding regions differ greatly, so that the coding regions can be distinguished visually. In addition, a time cost test shows that the SASR can be easily implemented with a computational complexity of O(N).
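The Triplet Periodicity signal that the abstract relies on can be illustrated with a generic period-3 Fourier measure. The sketch below is not the authors' SASR algorithm and does not reproduce the spectral rotation or the complex-plane walk. It only shows, under our own assumptions, how the period-3 spectral component of each window can be obtained in O(N) total time using prefix sums. The function names `period3_prefix_sums` and `period3_power`, the fixed `window` parameter, and the four-base indicator encoding are all hypothetical choices made for illustration.

```python
import cmath

# Assumption: the three cube roots of unity serve as the period-3 Fourier kernel.
ROOTS = [cmath.exp(-2j * cmath.pi * k / 3) for k in range(3)]


def period3_prefix_sums(seq, base):
    """Prefix sums of the indicator sequence of `base`, weighted by the
    period-3 Fourier kernel exp(-2*pi*i*t/3)."""
    sums = [0j]
    for t, ch in enumerate(seq):
        term = ROOTS[t % 3] if ch == base else 0j
        sums.append(sums[-1] + term)
    return sums


def period3_power(seq, window):
    """Period-3 spectral power of every length-`window` subsequence.

    Each window costs O(1) thanks to the prefix sums, so the whole
    scan is O(N) in the sequence length.
    """
    prefix = {b: period3_prefix_sums(seq, b) for b in "ACGT"}
    powers = []
    for start in range(len(seq) - window + 1):
        total = 0.0
        for b in "ACGT":
            component = prefix[b][start + window] - prefix[b][start]
            total += abs(component) ** 2
        powers.append(total)
    return powers


if __name__ == "__main__":
    periodic = "ATG" * 30      # perfectly triplet-periodic toy sequence
    flat = "A" * 90            # no triplet structure
    print(max(period3_power(periodic, 30)))
    print(max(period3_power(flat, 30)))
```

On the toy inputs, the perfectly periodic sequence gives a large period-3 power in every window, while the homopolymer gives a value near zero. Real coding regions fall between these extremes, which is why a visual or statistical criterion is still needed on top of the raw spectrum.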

Figures

Figure 1. Transforming the DNA sequence into the TP sequence.
Figure 2. The TP sequence generation algorithm.
Figure 3. The TP walk of the first coding region (3307–4260) of the H. sapiens (human) mitochondrial DNA sequence. (a) Walk trace in the complex plane. (b) Real part (black) and imaginary part (gray) of the points in the trace, plotted against position t.
Figure 4. The TP walk of the sequence before the first coding region (1–3306, a non-coding region without the TP property) of the H. sapiens (human) mitochondrial DNA. (a) Walk trace in the complex plane. (b) Real part (black) and imaginary part (gray) of the points in the trace, plotted against position t.
Figure 5. C0-I-C1 chain. N* is the length of sub-sequence *.
Figure 6. A sketch of the TP walk trace of the C0-I-C1 chain when the two coding regions are in the same reading direction. (a) Δ = 0. (b) Δ = 1. (c) Δ = 2.
Figure 7. The TP walk of positions 4050–6281 of the Gallus gallus (chicken) mitochondrial DNA sequence (Δ = 0). (a) Walk trace in the complex plane. (b) Real part (black) and imaginary part (gray) of the points in the trace, plotted against position t. Dark areas mark the coding regions.
Figure 8. The TP walk of positions 3654–5862 of the Halichoerus grypus (gray seal) mitochondrial DNA sequence (Δ = 1). (a) Walk trace in the complex plane. (b) Real part (black) and imaginary part (gray) of the points in the trace, plotted against position t. Dark areas mark the coding regions.
Figure 9. The TP walk of positions 3307–5510 of the H. sapiens (human) mitochondrial DNA sequence (Δ = 2). (a) Walk trace in the complex plane. (b) Real part (black) and imaginary part (gray) of the points in the trace, plotted against position t. Dark areas mark the coding regions.
Figure 10. The TP walk trace of the complete H. sapiens (human) mitochondrial DNA sequence in the complex plane, with coding regions marked in different colors. The top-right inset plots the real part (red) and imaginary part (blue) against position t. Dark areas mark the coding regions.
Figure 11. Time cost plotted against the sequence length N. The horizontal axis is the sequence length and the vertical axis is the time cost in milliseconds.
Figure 12. The RR distributions in the coding set (black) and the non-coding set (gray). (a) The CDF. (b) The PDF.
Figure 13. Accuracy in classifying sequences. (a) Sensitivity (black) and specificity (gray) of the classification using the RR measure. (b) Averages of the sensitivity and specificity of the classification using the OSCM (red), the SRM (green) and the RR measure (blue).
