Nucleic Acids Res. 2013 Feb 1;41(3):1395-405.
doi: 10.1093/nar/gks1261. Epub 2012 Dec 14.

Categorical spectral analysis of periodicity in human and viral genomes

Elizabeth D Howe et al. Nucleic Acids Res.

Abstract

Periodicity in nucleotide sequences arises from regularly repeating patterns, which may reflect important structure and function. Although three-base periodicity in coding regions has been known for some time and has provided the basis for powerful gene prediction algorithms, its origins are still not fully understood. Here, we show that, contrary to common belief, amino acid (AA) bias and codon usage bias are insufficient to create three-base periodicity. This article applies the rigorous spectral envelope method to systematically characterize the contributions of codon bias, AA bias and protein structural motifs to the three-base periodicity of coding sequences. The method is also used to classify CpG islands in the human genome. In addition, we show how the spectral envelope can be used to trace the evolution of viral genomes and monitor global sequence changes without aligning to previously known genomes. This approach also detects reassortment events, such as those that led to the 2009 pandemic H1N1 virus.
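
As an illustration of the kind of quantity the abstract refers to, the following is a minimal sketch of a spectral envelope calculation at a single frequency. It is not the authors' code. It assumes the commonly used definition in which the spectral envelope at frequency ω is the largest eigenvalue of the real part of the periodogram matrix of base-indicator vectors, taken relative to their sample covariance matrix. The sketch uses a raw (unsmoothed) periodogram, and the function name `spectral_envelope`, the toy sequences and the position-specific base probabilities are all hypothetical.

```python
import numpy as np

ALPHABET = "ACGT"

def spectral_envelope(seq, freq):
    """Unsmoothed spectral envelope of a DNA string at `freq` (cycles per base).

    Assumes the sequence contains only A, C, G, T and that all four bases
    occur, so that the indicator covariance matrix is invertible.
    """
    seq = seq.upper()
    n = len(seq)
    idx = np.array([ALPHABET.index(b) for b in seq])
    # Indicator columns for A, C, G; T is the reference category.
    Y = np.stack([(idx == j).astype(float) for j in range(3)], axis=1)
    Y -= Y.mean(axis=0)
    V = Y.T @ Y / n                      # sample covariance of the indicators
    t = np.arange(n)
    # Discrete Fourier transform of each indicator column at `freq`.
    d = (Y * np.exp(-2j * np.pi * freq * t)[:, None]).sum(axis=0) / np.sqrt(n)
    P = np.real(np.outer(d, d.conj()))   # real part of the periodogram matrix
    # Largest generalized eigenvalue of (P, V), via V = L L^T.
    L_inv = np.linalg.inv(np.linalg.cholesky(V))
    M = L_inv @ P @ L_inv.T
    return float(np.linalg.eigvalsh(M)[-1])

# Toy data (hypothetical): base probabilities depend on codon position,
# which mimics a coding-like sequence; shuffling destroys the positional signal.
rng = np.random.default_rng(0)
probs = [[0.5, 0.2, 0.2, 0.1], [0.1, 0.5, 0.2, 0.2], [0.2, 0.1, 0.2, 0.5]]
periodic = "".join(rng.choice(list(ALPHABET), p=probs[t % 3]) for t in range(900))
shuffled = "".join(rng.permutation(list(periodic)))

print("periodic:", spectral_envelope(periodic, 1 / 3))
print("shuffled:", spectral_envelope(shuffled, 1 / 3))
```

In this toy setting the value at frequency 1/3 should be much larger for the position-dependent sequence than for its shuffled counterpart, which has the same base composition but no codon-position structure. A practical analysis would normally smooth the periodogram and scan a grid of frequencies rather than evaluate a single one.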

Figures

Figure 1.
(a) Histogram of the N-statistic (quantification of the contribution of codon bias to periodicity) for protein domains with significant three-nucleotide periodicity. (b) Histogram of the A-statistic (quantification of the contribution of AA position to periodicity) for protein domains with significant three-nucleotide periodicity.
Figure 2.
Area-preserving 2D projection of the scaling function of protein domains with significant period-3 property.
Figure 3.
The ratio of the observed to the expected spectral envelope at frequency 1/3 is highly correlated with the length of the protein.
Figure 4.
Spectral envelope calculations for the protein motif PS00108 which has a significant period-3 property. Black line is the spectral envelope of the original DNA sequence; blue, spectral envelope of synonymous DNA sequence found by MCMC minimization at frequency 1/3; red, spectral envelope of the synonymous DNA sequence found by MCMC maximization; dashed green, average spectral envelopes of 100 random synonymous sequences and solid green, average spectral envelope of 100 random permutations.
Figure 5.
Spectral envelope of intronic sequences. (a) Period 3 is not present in 1000 random introns of length 100 bp, 1 kb, 2 kb and 3 kb. (b) Removing the stop codons (TAA, TGA and TAG) in a fixed frame from the sequences in (a) increases the spectrum at period 3.
Figure 6.
Spectral envelope for CpG islands at (a) chr4:3565311-3567185 consisting of 78-mer tandem repeats and (b) chr11:89713070-89713801 having several segmental duplications on chromosome 11. The coordinates are in HG19.
Figure 7.
Evolutionary trace of the scaling function for the human H1N1 (a) PA protein and (b) NP protein. Each dot corresponds to the median projected scaling functions of all sequences from the indicated year. (c) SVM decision boundaries and training data points are shown. Support vectors are indicated as x and other training data points are indicated as circles. Human data points are in black and avian or swine data points are in red.
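
The stop-codon removal described in the caption of Figure 5(b) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes that each in-frame TAA, TGA or TAG triplet is deleted whole, that bases before the chosen frame offset are dropped, and that any trailing partial codon is dropped. The function name `remove_stop_codons` is hypothetical.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}

def remove_stop_codons(seq, frame=0):
    """Delete every stop codon that occurs in the given reading frame.

    Bases before `frame` and any incomplete trailing codon are discarded.
    """
    seq = seq.upper()
    # Split into complete, non-overlapping codons starting at `frame`.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return "".join(c for c in codons if c not in STOP_CODONS)

# ATG TAA CCC TGA GGG -> ATG CCC GGG
print(remove_stop_codons("ATGTAACCCTGAGGG"))  # prints "ATGCCCGGG"
```

Stop codons that appear only in the other two frames are left untouched, which is what "in a fixed frame" is taken to mean here.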
