Cell. 2016 Sep 22;167(1):158-170.e12.
doi: 10.1016/j.cell.2016.09.010.

Structured States of Disordered Proteins from Genomic Sequences

Agnes Toth-Petroczy et al. Cell.

Abstract

Protein flexibility ranges from simple hinge movements to functional disorder. Around half of all human proteins contain apparently disordered regions with little 3D or functional information, and many of these proteins are associated with disease. Building on the evolutionary couplings approach previously successful in predicting 3D states of ordered proteins and RNA, we developed a method to predict the potential for ordered states for all apparently disordered proteins with sufficiently rich evolutionary information. The approach is highly accurate (79%) for residue interactions as tested in more than 60 known disordered regions captured in a bound or specific condition. Assessing the potential for structure of more than 1,000 apparently disordered regions of human proteins reveals a continuum of structural order with at least 50% with clear propensity for three- or two-dimensional states. Co-evolutionary constraints reveal hitherto unseen structures of functional importance in apparently disordered proteins.

Keywords: EVfold; Evolutionary couplings; bioinformatics; computational biology; conformational flexibility; disorder; maximum entropy; statistical physics; structure prediction.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Co-evolutionary analysis of disordered segments in the human proteome
First, we identify contiguous regions of disorder. Second, we search for similar sequences and select robust alignments. Third, we calculate evolutionary couplings (ECs) for each alignment using an updated algorithm, computing significant long-range ECs and deriving secondary structure propensity from short-range ECs. Finally, we assess these predictions to reveal the likelihood of secondary and tertiary structure (Experimental Procedures).
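The third step separates couplings by how far apart the two residues are in the chain. The sketch below illustrates that split only; the function name `split_couplings`, the `(i, j, score)` tuple layout, and the `min_separation=5` cutoff are assumptions for illustration and are not taken from the paper.

```python
def split_couplings(couplings, min_separation=5):
    """Split EC pairs into long-range and short-range sets.

    couplings: iterable of (i, j, score) tuples, where i and j are
    residue indices. min_separation is a hypothetical cutoff; the
    caption does not state the value the authors used.
    """
    long_range, short_range = [], []
    for i, j, score in couplings:
        if abs(i - j) > min_separation:
            long_range.append((i, j, score))   # candidate tertiary contacts
        else:
            short_range.append((i, j, score))  # input to secondary structure propensity
    return long_range, short_range
```

For example, `split_couplings([(1, 3, 0.9), (2, 40, 0.8)])` places the first pair in the short-range list and the second in the long-range list.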
Figure 2. Experimentally determined states of flexible proteins are captured by evolutionary couplings
(A) Overall performance in predicting experimental contacts from significant long-range ECs for a set of 83 flexible and disordered proteins with known structures (left panel), and precision of the secondary structure propensity scores on a per-residue basis for a set of over 3800 PFAM families and our validation set of 83 flexible and disordered proteins with known structures (right panel). Residues with a propensity score suggesting both α-helix and β-strand were taken to be α-helical given the stronger evolutionary constraint; red dashed lines show the decreased precision when these calls are also counted as β-strand. (B) Peptidyl carrier protein (PCP) undergoes large conformational changes, including the repacking of its helices, upon cofactor binding (left: apo form, 2gdy; right: holo form, 2gdx). ECs reflect interactions between helix1 and helix2 (magenta circle, only in apo 3D structure) as well as helix1 and helix4 (blue circle, only in holo 3D structure). Many residue-residue distances change substantially between the two conformations. For example, there is strong coupling between residues K18 and E58, which form a salt bridge in the apo form but are >20 Å apart in the holo form. Our secondary structure propensity score predicts all 4 helices of PCP, the third being present only in the intermediate state between the apo and holo forms (2gdw). (C) ECs agree with the known redox-dependent conformational switch of the chloride ion channel protein (CLIC1), which includes α-helix to β-strand transitions.
Figure 3. Evolutionary couplings predict close residues in known ordered states of disordered proteins
(A) ECs (pink circles) perfectly recapitulate the experimental contacts (grey circles; residue-residue distance <5 Å) of the folded, DNA-bound state of Lef-1, which is partially unstructured in the absence of DNA (2lef, Precision=1.00). (B) ECs predict the overall contact map of the Calnexin chaperone, including the disordered luminal domain, which folds only when binding unfolded glycoprotein (1jhn, Precision=0.58). (C) phoA has been captured experimentally in the folded state (1aja) and in the unfolded state when bound to a chaperone (2mlz) (left panel). ECs capture contacts that are unique to the folded state (pink circles) and some that are unique to the unfolded state (blue circles) (middle panel). Specifically, two ECs predict residue pairs that are close only in the “unfolded” state (416D-423S and 406P-411A, which are ~16 and ~13 Å apart in the folded state but 3.8 and 2 Å apart in the unfolded state) (right panel).
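The precision values quoted here are the fraction of predicted EC pairs that are true contacts in the experimental structure. A minimal sketch of that calculation follows, using the <5 Å cutoff from the caption. Measuring distance as the minimum over all atom pairs, and the names `min_distance`, `contact_precision`, and `residue_atoms`, are assumptions for illustration rather than the authors' implementation.

```python
from math import dist

def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom of A and any atom of B."""
    return min(dist(a, b) for a in atoms_a for b in atoms_b)

def contact_precision(ec_pairs, residue_atoms, cutoff=5.0):
    """Fraction of predicted pairs whose residues lie closer than cutoff (in Å).

    ec_pairs: list of (i, j) residue index pairs predicted by ECs.
    residue_atoms: dict mapping residue index -> list of (x, y, z) coordinates.
    Using the minimum atom-atom distance is an assumption of this sketch.
    """
    if not ec_pairs:
        return 0.0
    hits = sum(
        1
        for i, j in ec_pairs
        if min_distance(residue_atoms[i], residue_atoms[j]) < cutoff
    )
    return hits / len(ec_pairs)
```

Under this definition, a precision of 1.00 means every predicted pair is within the cutoff in the reference structure, and 0.58 means 58% of them are.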
Figure 4. EVFold predictions of novel states of disordered proteins
(A) Protein phosphatase 1 inhibitor 2 (IPP2_MOUSE), a disordered regulator, binds to its complex partner via three anchoring regions (residues 12–17, 44–56, and 148–151), while the rest of the molecule remains invisible in the crystal structure (blue shading). ECs predict the existence of the helical anchors, as well as the long-range interactions between these regions. (B) The predicted contact map of p27 (CDN1B_HUMAN) reveals alternative states that are not compatible with the bound structure and might form when the protein is free or bound to another partner. (C) The Rev protein of HIV (REV_HV1H3) was captured experimentally in a dimeric state that is corroborated by the EC map, which has contacts for the helix-helix packing and the dimer contacts of the experimental structure (3lph). Additionally, possible multimer contacts may explain the higher-order oligomerization of Rev. The C-terminus, invisible in experiments thus far, shows signal for helical structure and has long-range evolutionary constraints indicative of a folded state (predicted 3D model, C, right).
Figure 5. Accurate prediction of structure without long-range contacts
(A) The experimental 3D structure of PSMD4 in complex with di-ubiquitin (2kdf) has no long-range contacts between the helices, ensuring the separation of the two ubiquitin interacting motifs (UIMs) (top). Consistent with this, there are no ECs between residues distant in the chain, but local ECs nevertheless identify the helices formed when bound to ubiquitin, as well as a weaker signal for possible β-strands. (B) ECs (pink circles) match the known contacts (grey circles) in the structure of the N-terminal end of histone H1.1 (1ghc; P08287) but do not predict contacts between residues distant in the chain in the C-terminal tail, consistent with observations that the histone tail is flexible in vivo. Secondary structure prediction for the C-terminal region suggests β-strands.
Figure 6. Human proteome wide prediction of structural states of disordered proteins
(A) Out of the 4543 disordered segments analyzed, 21% (965) had alignments with sufficient sequences that also met our convergence criteria. Of these, 381 (40%) have long-range ECs (globularity score >0.1), another 52% have predicted 2D constraints (secondary structure propensity score >0.1) but very few 3D constraints, and the remaining 8% show almost no signal for any structural constraints. Almost 10% also have EC patterns suggestive of repeats. (B) Distribution of long-range predicted contacts (left) and the propensity for secondary structure (right) across the proteins. (C) Four examples of proteins with a high proportion of long-range ECs (yellow) that have no known structure and are considered disordered. Secondary structure predictions (yellow along axes) correspond well to the tertiary structure packing indicated by the long-range ECs. (D) Four examples of proteins without evidence of 3D contacts but with predicted secondary structure elements. All of our predictions and data files are available on the web at marks.hms.harvard.edu/disorder/.
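The three-way split in panel (A) can be read as a simple threshold rule on two scores. The sketch below is one plausible reading of the caption, not the authors' code: it applies the >0.1 cutoffs stated above, checks the globularity score first, and uses hypothetical names (`classify_segment`, `tally`) and category labels.

```python
from collections import Counter

def classify_segment(globularity, ss_propensity, threshold=0.1):
    """Assign a segment to one of three classes using the caption's cutoffs.

    Checking globularity before secondary structure propensity is an
    assumption of this sketch; the labels are illustrative.
    """
    if globularity > threshold:
        return "3D"         # long-range ECs suggest tertiary structure
    if ss_propensity > threshold:
        return "2D"         # secondary structure signal, few 3D constraints
    return "no signal"      # almost no structural constraints

def tally(segments, threshold=0.1):
    """Count classes over an iterable of (globularity, ss_propensity) pairs."""
    return Counter(classify_segment(g, s, threshold) for g, s in segments)
```

Dividing each count from `tally` by the number of segments gives class fractions of the kind reported in the caption.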
