Bioinformatics. 2009 Jun 15;25(12):i289-95.
doi: 10.1093/bioinformatics/btp232.

REPETITA: detection and discrimination of the periodicity of protein solenoid repeats by discrete Fourier transform

Luca Marsella et al. Bioinformatics.

Abstract

Motivation: Proteins with solenoid repeats evolve more quickly than non-repetitive ones, and their periodicity may rapidly become hidden at the sequence level while remaining evident in the structure. To identify these repeats, we propose a novel method based on a metric that characterizes amino-acid properties (polarity, secondary structure, molecular volume, codon diversity, electric charge) using five previously derived numerical functions.

Results: The five spectra of the candidate sequences coding for structural repeats, obtained by Discrete Fourier Transform (DFT), show common features that allow the repeat periodicity to be determined with excellent results. Moreover, it is possible to introduce a phase space parameterized by two quantities related to the Fourier spectra, which allows a clear distinction between a non-homologous set of globular proteins and proteins with solenoid repeats. The DFT method is shown to be competitive with other state-of-the-art methods in the detection of solenoid structures, and it improves on them especially in the identification of periodicities, since it recognizes the actual repeat length in most cases. Moreover, it highlights the relevance of local structural propensities in determining solenoid repeats.
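
The core idea of reading a repeat period off a Fourier spectrum can be illustrated with a minimal Python sketch. It is not the REPETITA implementation: the function names are hypothetical, the input is a synthetic numeric profile rather than one of the five amino-acid property functions, and the conversion from frequency rank n to period (T = N / n) is an assumption that is consistent with the values quoted in the caption of Fig. 2.

```python
import cmath
import math


def dft_amplitudes(profile):
    """Return {n: |X_n|} for frequency ranks n = 1 .. N//2 of a real profile."""
    n_points = len(profile)
    mean = sum(profile) / n_points
    centered = [v - mean for v in profile]  # drop the zero-frequency offset
    amplitudes = {}
    for n in range(1, n_points // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * n * k / n_points)
            for k, v in enumerate(centered)
        )
        amplitudes[n] = abs(coeff)
    return amplitudes


def dominant_period(profile):
    """Return (rank of the largest amplitude, period N / rank). Assumed rule."""
    amplitudes = dft_amplitudes(profile)
    best_rank = max(amplitudes, key=amplitudes.get)
    return best_rank, len(profile) / best_rank


# Synthetic profile with an exact 12-residue period over 84 positions.
profile = [math.cos(2 * math.pi * k / 12) for k in range(84)]
rank, period = dominant_period(profile)
print(rank, round(period))  # 7 12
```

A real sequence would first have to be converted into numeric profiles (one per property function), and the abstract indicates that the method combines information from all five spectra rather than relying on a single one.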

Availability: A web tool implementing the algorithm presented in the article (REPETITA) is available with additional details on the data sets at the URL: http://protein.bio.unipd.it/repetita/.


Figures

Fig. 1.
Cartoon representation of sample solenoid structures. Rainbow coloring from blue to red shows the topology from the N- to the C-terminus. (A) Antifreeze protein (PDB 1EZG), (B) Pectate Lyase (PDB 1AIR), (C) Leucine Rich Repeat (LRR) variant (PDB 1JL5), (D) LRR (PDB 1YRG) and (E) Armadillo (PDB 2BCT). All pictures were drawn using PyMOL (URL: http://pymol.sourceforge.net/).
Fig. 2.
Fourier spectral amplitudes of Atchley's functions of the 3-solenoid domain of the antifreeze protein with sequence length N = 82 (PDB identifier: 1EZG). The peaks around frequencies n = 14, 21, 28, 35 belong to the harmonic series of the fundamental frequency rank n = 7, which appears as global maximum in the spectrum of Atchley's function 2 (top right) and as local maximum in the others. It corresponds to a periodic repeat T = 12 [computed using Equation (3)], in agreement with the actual structural repeat.
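
The numbers in this caption can be checked with a few lines of Python. Equation (3) is not reproduced on this page, so the sketch assumes the usual relation T = N / n between sequence length, frequency rank and period; the helper names are hypothetical.

```python
def period_from_rank(seq_len, rank):
    """Period associated with a frequency rank, assuming T = N / n."""
    return seq_len / rank


def harmonic_ranks(fundamental, max_rank):
    """Integer multiples of the fundamental rank (from 2x) up to max_rank."""
    return [fundamental * k for k in range(2, max_rank // fundamental + 1)]


N = 82  # sequence length of the 1EZG domain, from the caption
n = 7   # fundamental frequency rank, from the caption

T = period_from_rank(N, n)
print(round(T, 2), round(T))       # 11.71 12
print(harmonic_ranks(n, N // 2))   # [14, 21, 28, 35]
```

Under this assumption, 82 / 7 rounds to the period of 12 given in the caption, and the multiples of 7 below N / 2 reproduce the listed harmonic peaks.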
Fig. 3.
Maximum z-score of the amplitudes (zmax, x-axis) and optimal θ-ratio (ρθ, y-axis) are shown in the scatter plot for the joint training and test set of sequences. The separation between the region containing mainly non-solenoids (green crosses, bottom left) and the region containing mainly solenoid repeat sequences (red crosses, top right) is remarkable, even though a few proteins lie on the opposite side, in the vicinity of the optimal line separating the two sets. The result corresponding to the 3-solenoid domain of the antifreeze protein (PDB identifier: 1EZG) is shown as a blue square.
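
Two ingredients of this plot lend themselves to a short Python sketch: a z-score of the largest spectral amplitude, and the signed distance of a point from a separating line. The exact definitions of zmax and ρθ are not given on this page, so the z-score below is the generic (value - mean) / standard deviation form, and the line coefficients a, b, c are hypothetical placeholders rather than the fitted values from the article.

```python
import math
import statistics


def max_z_score(amplitudes):
    """z-score of the largest amplitude relative to the whole spectrum."""
    mean = statistics.mean(amplitudes)
    std = statistics.pstdev(amplitudes)
    if std == 0:
        return 0.0  # flat spectrum: no peak stands out
    return (max(amplitudes) - mean) / std


def signed_distance(x, y, a, b, c):
    """Signed distance of (x, y) from the line a*x + b*y + c = 0."""
    return (a * x + b * y + c) / math.hypot(a, b)


print(max_z_score([1, 1, 1, 1, 6]))                  # 2.0
print(signed_distance(3.0, 4.0, 3.0, 4.0, 0.0))     # 5.0
print(signed_distance(0.0, 0.0, 3.0, 4.0, -5.0))    # -1.0
```

With the line oriented so that the solenoid region lies on the positive side (an assumed convention), the sign of the distance gives the predicted class and its magnitude a measure of reliability.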
Fig. 4.
Comparison of REPETITA, RADAR and TRUST on the total set of sequences: the number of false positives (x-axis) is plotted against the number of true positives (y-axis). Predictions are ranked according to the values of the parameter measuring the reliability of the methods (for REPETITA, it is the signed distance from the optimal line of Fig. 3). Two black circles are drawn to highlight REPETITA predictions with signed distance thresholds at +1 and 0, respectively. Note that the first 25 predictions of REPETITA are all true positives.
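
A curve of this kind can be built by sorting predictions by their reliability score and accumulating true and false positives. The following Python sketch shows the bookkeeping under that assumption; the scores and labels are invented toy values, not data from the article, and the function names are hypothetical.

```python
def ranked_curve(predictions):
    """Cumulative (false positives, true positives) after each ranked prediction.

    predictions: iterable of (score, is_true_repeat) pairs.
    """
    ordered = sorted(predictions, key=lambda p: p[0], reverse=True)
    tp = fp = 0
    curve = []
    for _score, is_true_repeat in ordered:
        if is_true_repeat:
            tp += 1
        else:
            fp += 1
        curve.append((fp, tp))
    return curve


def leading_true_positives(curve):
    """Number of top-ranked predictions before the first false positive."""
    count = 0
    for fp, tp in curve:
        if fp > 0:
            break
        count = tp
    return count


# Toy predictions: (signed distance, whether the protein is a true solenoid).
toy = [(2.1, True), (1.4, True), (0.3, False), (0.9, True), (-0.5, False)]
curve = ranked_curve(toy)
print(curve)                          # [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]
print(leading_true_positives(curve))  # 3
```

Cutting the ranked list at a score threshold (such as +1 or 0 in the caption) corresponds to picking a single point on this curve.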
Fig. 5.
Detection of periodicity of repeats: comparison of REPETITA, RADAR and TRUST. Predictions were counted as correct if they were within one residue of, respectively, the full, half or double structural repeat length. REPETITA outperforms both RADAR and TRUST.
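
The correctness criterion described in this caption can be written as a small Python check. This is a sketch of the stated rule only; the function name and the default tolerance parameter are hypothetical, and the caption appears to score the three cases (full, half, double) separately, whereas the helper below reports whether any of them matches.

```python
def period_is_correct(predicted, structural, tolerance=1.0):
    """True if predicted is within tolerance of the full, half or double period."""
    targets = (structural, structural / 2, 2 * structural)
    return any(abs(predicted - target) <= tolerance for target in targets)


# Example with a structural repeat of 12 residues.
for predicted in (12, 6, 25, 18):
    print(predicted, period_is_correct(predicted, 12))
# 12 True
# 6 True
# 25 True
# 18 False
```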
