Sci Rep. 2019 Jun 10;9(1):8426.
doi: 10.1038/s41598-019-44928-3.

Sequence and structural patterns detected in entangled proteins reveal the importance of co-translational folding

Marco Baiesi et al. Sci Rep. 2019.

Abstract

Proteins must fold quickly to acquire their biologically functional three-dimensional native structures. Hence, these are mainly stabilized by local contacts, while intricate topologies such as knots are rare. Here, we reveal the existence of specific patterns adopted by protein sequences and structures to deal with backbone self-entanglement. A large-scale analysis of the Protein Data Bank shows that loops significantly intertwined with another chain portion are typically closed by weakly bound amino acids. Why is this energetic frustration maintained? A possible picture is that entangled loops are formed only toward the end of the folding process to avoid kinetic traps. Consistently, these loops are more frequently found to be wrapped around a portion of the chain on their N-terminal side, the one translated earlier at the ribosome. Finally, these motifs are less abundant in natural native states than in simulated protein-like structures, yet they appear in 32% of proteins, which in some cases display an amazingly complex intertwining.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Sketches of proteins with (a) a knot, (b) two linked loops with cysteine closures (magenta dots), (c) two linked loops with virtual non-covalent closures (yellow and green dots form two different contacts), and (d) a loop (red) intertwined with an open chain portion (blue) - a “thread”. (e) A configuration with the loop (γi) closer to the C-terminus than the thread (γj), and (f) one with the loop closer to the N-terminus. In the two latter pictorial representations of non-structured proteins, we also show the loop-thread sequence separation s and the loop length m.
Figure 2
Some examples of protein domains with nontrivial entanglement, in which one notes a looped portion (red, with yellow ends, following the color code of Fig. 1d) intertwined with another portion of the protein, a thread (blue). (a) protein 2bjuA02, with L′ = 1.145 (also the blue chain contains a loop) and Gc = 1.285 of similar magnitude; (b) protein 3thtA01, with significant entanglement (Gc = −1.56) but without two linked loops (L′ = −0.34); (c) protein 3tnxA00, with large Gc = −3.07, while L′ = −1.31 is much smaller and with the same sign; (d) protein 2i06A01, with L′ = 0.741, partitioned in the two corresponding linked loops and thus following the color code of Fig. 1c (green ends of the blue loop); (e) again 2i06A01, with Gc = −1.26, highlighting the related loop-thread partition. In these last two panels the sign of L′ is opposite to the sign of Gc, an example of the coexistence of different forms of entanglement in the same protein domain. (f) protein 1otjA00, one of the protein domains with largest (absolute) Gaussian Entanglement, with Gc = −3.24 and L′ = −3.02. The red loop, with yellow ends, is extremely entangled with the blue portion (which in this case also contains a loop).
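The captions quote values of Gc and L′ without giving a formula. As a rough illustration only, the sketch below assumes that both quantities are built on the usual discretized Gauss double integral, G = (1/4π) Σᵢ Σⱼ Rᵢⱼ · (ΔRᵢ × ΔRⱼ) / |Rᵢⱼ|³, evaluated at bond midpoints. The function name `gauss_entanglement` and the input layout (arrays of 3D coordinates, for example Cα positions) are hypothetical, and how the paper selects the loop-thread pair that defines Gc is not stated on this page.

```python
import numpy as np

def gauss_entanglement(curve_a, curve_b):
    """Discretized Gauss double integral between two polylines.

    curve_a, curve_b: arrays of shape (n, 3) with consecutive 3D points.
    Assumption: midpoint rule on each bond, as in the formula quoted above.
    """
    a = np.asarray(curve_a, dtype=float)
    b = np.asarray(curve_b, dtype=float)
    # Bond vectors and bond midpoints of each curve.
    d_a, d_b = a[1:] - a[:-1], b[1:] - b[:-1]
    mid_a, mid_b = 0.5 * (a[1:] + a[:-1]), 0.5 * (b[1:] + b[:-1])
    # Pairwise separation vectors R_ij, shape (n_a, n_b, 3).
    r = mid_a[:, None, :] - mid_b[None, :, :]
    # Pairwise cross products of bond vectors, same shape.
    cross = np.cross(d_a[:, None, :], d_b[None, :, :])
    dist3 = np.linalg.norm(r, axis=-1) ** 3
    integrand = np.einsum("ijk,ijk->ij", r, cross) / dist3
    return float(integrand.sum() / (4.0 * np.pi))
```

For two closed, singly linked rings this sum approaches ±1, while for an open loop-thread pair it takes non-integer values, which is consistent with the real-valued Gc and L′ quoted in the caption.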
Figure 3
(A) Plot of L′ vs Gc for each protein in the CATH database; the five proteins shown in Fig. 2a–f are highlighted with the corresponding letter. (B) Smoothed histogram of data with significant linking (|L′| > 1/2). The highest probability is around Gc ≈ L′ ≈ 1. The data with the values of L′ and Gc computed for each protein in the CATH database are available at http://researchdata.cab.unipd.it/id/eprint/123.
Figure 4
(a) For four cases (see legend), distributions of the loop-thread sequence separation s. (b) For the separate cases of N- and C-terminal threads (see legend), tails of the distributions of the loop entanglement for |Gc(i)| > 1/2. In both panels, error bars are based on the effectively independent counts determined through the clustering procedure.
Figure 5
For both natural protein domains of length n in the range 55 ≤ n ≤ 64 from the CATH database, and the VAL60 ensemble of homopolypeptides, we plot the normalized histogram of Gc(i) for loops of length m in the intervals 20 ≤ m ≤ 24 (a), 30 ≤ m ≤ 34 (b), and 40 ≤ m ≤ 44 (c). (d) For natural protein domains and the VAL60 ensemble, root mean squared Gc(i) as a function of the loop length m.
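The two statistics in this caption, a normalized histogram of Gc(i) and its root mean square as a function of loop length m, can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the bin count, and the value range are assumptions chosen for the example.

```python
import numpy as np

def normalized_histogram(gc_values, bins=20, value_range=(-2.0, 2.0)):
    """Histogram of Gc(i) normalized to unit area (hypothetical binning)."""
    density, edges = np.histogram(
        gc_values, bins=bins, range=value_range, density=True
    )
    return density, edges

def rms_by_loop_length(loop_lengths, gc_values):
    """Root mean squared Gc(i) for each distinct loop length m."""
    m = np.asarray(loop_lengths)
    g = np.asarray(gc_values, dtype=float)
    return {
        int(k): float(np.sqrt(np.mean(g[m == k] ** 2)))
        for k in np.unique(m)
    }
```

Applying both functions separately to the natural domains and to the VAL60 ensemble would give the two curves compared in each panel.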
Figure 6
Scatter plot of the enrichment score ΔEenr(a, b) vs normal contact potential Enorm(a, b). Each point is for an amino acid pair (a, b) and is colored according to amino acid types: black for pairs of aromatic residues (HIS, PHE, TRP, TYR); magenta for CYS-CYS; green for the rest. The dashed line is a linear fit with slope −0.12. Error bars are computed with a bootstrapping procedure and we plot only errors for ΔEenr as those for Enorm are smaller than the symbol size.
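The caption mentions a linear fit and bootstrapped error bars but does not describe either procedure. The sketch below shows one generic way to do both, assuming an ordinary least-squares line and a plain resampling-with-replacement bootstrap. The function names and the default of 1000 resamples are hypothetical and are not taken from the paper.

```python
import numpy as np

def linear_fit(x, y):
    """Least-squares straight line; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, 1)
    return float(slope), float(intercept)

def bootstrap_std(samples, statistic=np.mean, n_resamples=1000, seed=0):
    """Bootstrap estimate of the standard error of `statistic`.

    Each resample draws len(samples) items with replacement.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(samples, dtype=float)
    n = len(data)
    stats = [
        statistic(data[rng.integers(0, n, size=n)])
        for _ in range(n_resamples)
    ]
    return float(np.std(stats, ddof=1))
```

In this picture each point's error bar would come from `bootstrap_std` applied to the underlying contact data for that amino acid pair, and the dashed line from `linear_fit` applied to the (Enorm, ΔEenr) points.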
Figure 7
(a) Normal contact potential Enorm; amino acids are ranked from left to right (top to bottom) with increasing average Enorm (over row/column). (b) Enrichment score ΔEenr for entangled contacts. Different backgrounds are used for highlighting negative and positive values: blue for E < −E0, light blue for −E0 ≤ E ≤ 0, pink for 0 < E ≤ E0, and red for E > E0 with E0 = 35. White is used for scores that differ from zero less than the corresponding statistical uncertainty, computed by means of a bootstrapping procedure.
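The background-color rule in this caption can be written as a small function. The thresholds and the default E0 = 35 follow the caption; the function name `score_background` and the string labels are hypothetical, and "differs from zero less than the uncertainty" is read here as |E| smaller than the bootstrap uncertainty.

```python
def score_background(score, uncertainty, e0=35.0):
    """Return the background color used for a matrix entry.

    White marks scores not distinguishable from zero; otherwise the
    color encodes sign and magnitude relative to the threshold e0.
    """
    if abs(score) < uncertainty:
        return "white"
    if score < -e0:
        return "blue"
    if score <= 0:
        return "light blue"
    if score <= e0:
        return "pink"
    return "red"
```

Checking the uncertainty first means a small score is rendered white regardless of its sign, matching the order in which the caption states the rule.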
