BMC Bioinformatics. 2015 Jul 2;16:207.
doi: 10.1186/s12859-015-0648-3.

Capturing coevolutionary signals in repeat proteins


Rocío Espada et al. BMC Bioinformatics.

Abstract

Background: The analysis of correlations of amino acid occurrences in globular domains has led to the development of statistical tools that can identify native contacts, i.e. portions of the chain that come into close proximity in folded structural ensembles. Here we introduce a direct coupling analysis for repeat proteins, natural systems for which the identification of folding domains remains challenging.
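Direct coupling analysis starts from the co-occurrence statistics of amino acids in pairs of alignment columns. As a point of reference, the sketch below computes plain mutual information (MI) between two columns of a multiple sequence alignment (MSA). This is a swapped-in, simpler measure, not the direct information (DI) metric used in the paper: DI is derived from the couplings of a global statistical model precisely to discount the indirect correlations that MI also picks up. The function name `column_mi` and the list-of-strings input format are assumptions for illustration.

```python
import math
from collections import Counter


def column_mi(msa, i, j):
    """Mutual information (in nats) between columns i and j of an MSA.

    msa: list of equal-length aligned sequences (strings).
    Gaps are treated as an ordinary symbol; no pseudocounts or
    sequence reweighting are applied (illustrative sketch only).
    """
    n = len(msa)
    f_i = Counter(seq[i] for seq in msa)               # single-column counts
    f_j = Counter(seq[j] for seq in msa)
    f_ij = Counter((seq[i], seq[j]) for seq in msa)    # pair counts
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        p_a = f_i[a] / n
        p_b = f_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi
```

For example, two perfectly covarying columns (`["AC", "DE", "AC", "DE"]`, columns 0 and 1) give ln 2, while statistically independent columns give 0.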

Results: We show that the inherent translational symmetry of repeat protein sequences introduces a strong bias in the pair correlations at precisely the length scale of the repeat-unit. Equalizing for this bias in an objective way reveals true co-evolutionary signals from which local native contacts can be identified. Importantly, parameter values obtained for all other interactions are not significantly affected by the equalization. We quantify the robustness of the procedure and assign confidence levels to the interactions, identifying the minimum number of sequences needed to extract evolutionary information in several repeat protein families.
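The abstract does not detail how the identity bias is equalized. Purely as an illustration of what "equalizing" a distribution can mean, the sketch below assigns each natural repeat pair a weight so that the weighted histogram of pair identities matches a reference histogram (for example, that of randomized repeat pairs). This histogram-matching scheme, the function name `equalization_weights`, and the bin count are hypothetical; they are not presented as the authors' procedure.

```python
from collections import Counter


def equalization_weights(natural_ids, reference_ids, n_bins=10):
    """Per-pair weights that reshape the natural identity histogram
    to match a reference histogram (HYPOTHETICAL scheme).

    natural_ids: identity in [0, 1] of each natural repeat pair.
    reference_ids: identities of reference (e.g. randomized) pairs.
    """
    def bin_of(x):
        # Map an identity in [0, 1] to a bin; x == 1.0 falls in the last bin.
        return min(int(x * n_bins), n_bins - 1)

    nat = Counter(bin_of(x) for x in natural_ids)
    ref = Counter(bin_of(x) for x in reference_ids)
    n_nat, n_ref = len(natural_ids), len(reference_ids)
    weights = []
    for x in natural_ids:
        b = bin_of(x)
        # Reference-to-natural frequency ratio for this bin; pairs whose
        # bin never occurs in the reference receive weight 0.
        weights.append((ref[b] / n_ref) / (nat[b] / n_nat))
    return weights
```

Pairs from over-represented high-identity bins are down-weighted and pairs from under-represented bins are up-weighted; such weights could then enter the pair statistics in place of uniform counts.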

Conclusions: The overall procedure can be used to reconstruct the interactions at distances larger than repeat-pairs, identifying the characteristics of the strongest couplings in each family, and can be applied to any system that appears translationally symmetric.


Figures

Fig. 1
Repeat proteins are formed by tandem arrays of repeats. The crystal structures of members of different repeat protein families are shown, with the backbone colored according to the repeated units. The molecular surface of the repeat array is drawn in transparent gray. a ANK family (PDB:1IKN, chain D), b WD40 family (PDB:1ERJ, chain A), c TPR family (PDB:4GCO), d LRR family (PDB:4NKH, chain A), e ANEX family (PDB:2ZOC, chain A), f PUF family (PDB:2YJY, chain A), g HEAT family (PDB:4G3A, chain A), and h ARM family (PDB:2BCT)
Fig. 2
The sequence identity between repeated units can bias the inference of evolutionary couplings. Repeat sequences of the ANK family were concatenated into an MSA of 2L_0 = 66 positions and ≈73,000 sequences, and co-variations were measured with the direct information metric. a Sequence identity distributions between consecutive ANK repeats found in (x) natural proteins and (o) randomized pairs of repeats. b Direct information matrices between positions obtained without correction (DI, upper half) or with proper equalization for repeat identity (DI id, lower half)
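The caption compares identities between consecutive repeats and between randomized repeat pairs, and builds an alignment whose rows are two consecutive repeats (2L_0 = 66 columns for ANK, so L_0 = 33). A minimal sketch of those three steps follows, assuming each protein is given as a list of repeat strings already aligned to a common length L_0. The input format and all function names are hypothetical, and the random pairing here simply draws two distinct repeats from the pooled set.

```python
import random


def identity(a, b):
    """Fraction of aligned positions at which two equal-length repeats agree."""
    if len(a) != len(b):
        raise ValueError("repeats must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def consecutive_identities(arrays):
    """Identity between each repeat and the next one in the same protein."""
    return [identity(r[k], r[k + 1]) for r in arrays for k in range(len(r) - 1)]


def random_pair_identities(arrays, n_pairs, seed=0):
    """Identity between two randomly drawn repeats pooled from all proteins."""
    pool = [rep for r in arrays for rep in r]
    rng = random.Random(seed)
    return [identity(*rng.sample(pool, 2)) for _ in range(n_pairs)]


def concatenate_pairs(arrays):
    """Rows of length 2*L_0 built from consecutive repeats of each protein."""
    return [r[k] + r[k + 1] for r in arrays for k in range(len(r) - 1)]
```

Comparing histograms of the first two outputs reproduces the kind of contrast shown in panel a; the concatenated rows are the input on which pair statistics such as DI would be computed.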
Fig. 3
Native contacts can be predicted from the identity-equalized direct information DI id. The center panels show, as gray shading, the contact map (closest atoms at a distance below 8 Å) of representative family members: a ANEX (PDB:2ZOC, chain A), b ANK (PDB:1N11, chain A), c TPR (PDB:4GCO), d PENTAPEPTIDE (PDB:3DU1, chain X). In the upper triangle, DI hits are marked with green circles when they match a contact and with red crosses when they do not; the lower triangle shows DI id hits with the same marking. Alongside, the structure used is shown with the backbone as gray ribbons and the first 20 predicted contacts along multiple repeat pairs in red. On the right, the true positive rate obtained using DI (black triangles) and DI id (red squares) as predictors of contacts on the selected structure is compared
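The contact criterion in this caption (closest atoms within 8 Å) and a true positive rate over ranked predictions can be sketched as below. The sketch assumes coordinates are already parsed into one list of (x, y, z) atom positions per residue, and it reads "true positive rate" as the fraction of the top-k predictions that are native contacts, a common convention in contact prediction; the paper's exact definition, and any minimum sequence separation it applies, may differ. Function names are hypothetical.

```python
import math


def contact_map(residues, cutoff=8.0):
    """Residue pairs (i, j), i < j, whose closest atoms are within cutoff (Å).

    residues: list where residues[i] is a list of (x, y, z) coordinates.
    """
    contacts = set()
    n = len(residues)
    for i in range(n):
        for j in range(i + 1, n):
            # Minimum heavy-atom (or all-atom) distance between the residues.
            d = min(math.dist(a, b) for a in residues[i] for b in residues[j])
            if d < cutoff:
                contacts.add((i, j))
    return contacts


def true_positive_rate(ranked_pairs, contacts, k):
    """Fraction of the top-k ranked predicted pairs that are native contacts."""
    top = ranked_pairs[:k]
    if not top:
        return 0.0
    hits = sum((min(p), max(p)) in contacts for p in top)
    return hits / len(top)
```

Evaluating `true_positive_rate` for increasing k, once with pairs ranked by DI and once by DI id, gives curves of the kind compared in the right-hand panels.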
Fig. 4
Correlations along ANK repeat arrays. a First 50 direct information hits over a contact map (PDB:1N11, chain A, residues 436 to 534), calculated for three consecutive ANK repeats without (upper triangle) or with (lower triangle) the DI id equalization. b Proportion of DI (black diamonds) and DI id (red circles) hits between repeated units for alignments of n-th neighbours. The red line is a non-linear fit of the DI id data to an exponential decay
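Panel b fits the proportion of hits against neighbour order n to an exponential decay. The sketch below fits p(n) ≈ A·exp(−n/λ), assuming that parametrization, by linear least squares on log p. Note that this log-linear fit is a swapped-in technique: it weights residuals differently from the non-linear fit named in the caption, so its parameters can differ slightly on noisy data. The function name is hypothetical and all proportions must be strictly positive.

```python
import math


def fit_exponential_decay(n_values, proportions):
    """Fit p(n) ~ A * exp(-n / lam) via least squares on log p.

    Returns (A, lam). Requires every proportion > 0 and a decaying trend.
    """
    xs = list(n_values)
    ys = [math.log(p) for p in proportions]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # equals -1 / lam
    intercept = mean_y - slope * mean_x  # equals log A
    return math.exp(intercept), -1.0 / slope
```

On synthetic, noise-free values p(n) = 0.5·exp(−n/2) for n = 1..4, the fit recovers A = 0.5 and λ = 2; these numbers are illustrative, not data from the paper.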
Fig. 5
Robustness of the DI id procedure. Subsets of alignments were constructed by repeatedly removing random groups of sequences from each dataset of repeat pairs. M_eff is the number of effective sequences used in the alignment. a Examples of the stability of DI id assignments as sampling changes in the ANK family. The gray shading delimits the 1 % fluctuation interval set as the convergence criterion. b Overall stability of the DI id assignments in several repeat protein families
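The caption does not define how M_eff is computed. A widely used convention in direct coupling analysis weights each sequence by the inverse of the number of sequences (itself included) sharing at least 80 % identity with it, and sums those weights; the sketch below follows that convention together with a random subsampling step of the kind described above. The 0.8 threshold, the function names, and the subsampling details are assumptions, not taken from the paper. The identity loop is quadratic in the number of sequences, so it is meant for illustration rather than for alignments of tens of thousands of rows.

```python
import random


def effective_sequences(msa, threshold=0.8):
    """M_eff = sum over sequences s of 1 / n_s, where n_s counts the
    sequences (including s) with fractional identity >= threshold to s."""
    def ident(a, b):
        return sum(x == y for x, y in zip(a, b)) / len(a)

    total = 0.0
    for s in msa:
        n_s = sum(ident(s, t) >= threshold for t in msa)
        total += 1.0 / n_s
    return total


def random_subset(msa, fraction, seed=0):
    """Keep a random fraction of the sequences (at least one)."""
    rng = random.Random(seed)
    k = max(1, int(round(fraction * len(msa))))
    return rng.sample(msa, k)
```

Repeating `random_subset` at decreasing fractions, recomputing M_eff and the coupling scores on each subset, and checking when the scores stay within a chosen tolerance is one way to estimate the minimum alignment size at which assignments stabilize.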
