2023 Oct 31;120(44):e2304302120.
doi: 10.1073/pnas.2304302120. Epub 2023 Oct 25.

Systematic identification of conditionally folded intrinsically disordered regions by AlphaFold2


T Reid Alderson et al. Proc Natl Acad Sci U S A.

Abstract

The AlphaFold Protein Structure Database contains predicted structures for millions of proteins. For the majority of human proteins that contain intrinsically disordered regions (IDRs), which do not adopt a stable structure, it is generally assumed that these regions have low AlphaFold2 confidence scores that reflect low-confidence structural predictions. Here, we show that AlphaFold2 assigns confident structures to nearly 15% of human IDRs. By comparison to experimental NMR data for a subset of IDRs that are known to conditionally fold (i.e., upon binding or under other specific conditions), we find that AlphaFold2 often predicts the structure of the conditionally folded state. Based on databases of IDRs that are known to conditionally fold, we estimate that AlphaFold2 can identify conditionally folding IDRs at a precision as high as 88% at a 10% false positive rate, which is remarkable considering that conditionally folded IDR structures were minimally represented in its training data. We find that human disease mutations are nearly fivefold enriched in conditionally folded IDRs over IDRs in general and that up to 80% of IDRs in prokaryotes are predicted to conditionally fold, compared to less than 20% of eukaryotic IDRs. These results indicate that a large majority of IDRs in the proteomes of human and other eukaryotes function in the absence of conditional folding, but the regions that do acquire folds are more sensitive to mutations. We emphasize that the AlphaFold2 predictions do not reveal functionally relevant structural plasticity within IDRs and cannot offer realistic ensemble representations of conditionally folded IDRs.
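The abstract quotes a precision and a false positive rate, which are computed from different cells of the confusion matrix. A minimal Python sketch of the standard definitions follows. The function names and the counts in the usage lines are hypothetical, chosen only so the arithmetic lands on the two quoted percentages; they are not data from the paper.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted conditional folders that are true ones: TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("no positive predictions, precision is undefined")
    return tp / (tp + fp)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of true non-folders that were wrongly flagged: FP / (FP + TN)."""
    if fp + tn == 0:
        raise ValueError("no negative examples, false positive rate is undefined")
    return fp / (fp + tn)


# Invented counts (NOT from the paper): 88 true positives, 12 false positives,
# and 108 true negatives.
example_precision = precision(tp=88, fp=12)
example_fpr = false_positive_rate(fp=12, tn=108)
```

Because precision depends on how many positives and negatives are in the benchmark while the false positive rate does not, the same classifier can show a different precision on a differently balanced dataset.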

Keywords: AlphaFold2; NMR spectroscopy; conditional folding; intrinsically disordered proteins; structural biology.


Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Predicted IDRs in the human proteome that have confident structures in the AFDB. (A) Histogram of per-residue pLDDT scores in the human proteome (black) compared with the predicted disordered (orange) or ordered (blue) regions. The inset shows an expansion of the predicted disordered regions between pLDDT scores of 70 and 100. The cumulative percentages of predicted disordered residues with scores greater than or equal to 70 and 90 are indicated in the lower right. (B) Flowchart outlining the analysis presented in (A). (C) Stacked bar graph showing the percentage of residues in the human proteome (black) that have very low (<50; dotted lines), low (50 ≤ x < 70; horizontal lines), confident (70 ≤ x < 90; empty), and very confident (≥90; filled) pLDDT scores. The corresponding plots are included for SPOT-Disorder-predicted disordered residues (orange) and ordered residues (blue). (D) Example structures in the AFDB for SPOT-Disorder-predicted IDRs, with the percentage of predicted disordered residues of the total listed. The AFDB structures have been color-coded by pLDDT scores as indicated. (E) DSSP-determined secondary structure content of the predicted disordered (orange) and ordered (blue) regions as a function of pLDDT thresholds.
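The four pLDDT bins of panel C can be sketched in Python as follows. This is an illustration only: it reads the confident bin as 70 ≤ x < 90 and the very confident bin as ≥ 90, and the function names and the example scores are hypothetical rather than taken from the authors' analysis code.

```python
from collections import Counter
from typing import Dict, Sequence

# Bin labels in increasing order of confidence, as named in the caption.
CATEGORIES = ("very low", "low", "confident", "very confident")


def plddt_category(score: float) -> str:
    """Map one per-residue pLDDT score (0 to 100) to its confidence bin."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must lie in [0, 100], got {score}")
    if score < 50:
        return "very low"
    if score < 70:
        return "low"
    if score < 90:
        return "confident"
    return "very confident"


def category_percentages(scores: Sequence[float]) -> Dict[str, float]:
    """Percentage of residues falling in each bin (the stacked-bar heights)."""
    if not scores:
        raise ValueError("at least one score is required")
    counts = Counter(plddt_category(s) for s in scores)
    return {label: 100.0 * counts[label] / len(scores) for label in CATEGORIES}


# Made-up scores for six residues, two of which sit exactly on bin edges.
example = category_percentages([10, 55, 70, 89.9, 90, 100])
```

Running the same function separately on the residues predicted to be disordered and on those predicted to be ordered would give the orange and blue bars, assuming a per-residue disorder label is available from a predictor such as SPOT-Disorder.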
Fig. 2.
Examples of three IDRs with high pLDDT scores that conditionally fold and have been characterized by NMR spectroscopy. (A) AlphaFold2-predicted structures of three IDPs/IDRs that are color-coded by pLDDT scores. From left to right: α-synuclein, 4E-BP2, and ACTR. The N and C termini of each protein are indicated. (B) Sequence-based prediction of disorder for the three IDPs/IDRs. Four different programs were used: IUPred2A, DISOPRED3, metapredict, and SPOT-Disorder. Only SPOT-Disorder correctly predicts the disordered nature of all three IDPs/IDRs. (C) Per-residue pLDDT confidence scores derived from the AlphaFold2 structures. (D) NMR ¹³Cα and ¹³Cβ chemical shift-derived secondary structure propensity (SSP) (67). For the AlphaFold2 structures (blue) and the 4E-BP2 peptide bound to eIF4E (PDB ID: 3am7), NMR chemical shifts were back-calculated from the structure using SPARTA+ software (68). The unbound/unmodified IDPs/IDRs (orange) show very little preferential secondary structure (α-synuclein) or modest populations of helix (4E-BP2, ACTR). By contrast, the binding to SDS micelles (α-synuclein), phosphorylation (4E-BP2), binding to eIF4E (4E-BP2), or the binding to CBP (ACTR) induces the formation of stable secondary structure (purple) that is in better agreement with the AlphaFold2 structures.
Fig. 3.
Structures of IDRs in the AFDB correlate with experimentally determined structures of the IDRs bound to interaction partners. (A–D) Experimental structures for the listed IDRs/IDPs (red) bound to an interacting folded domain (grey surface representation). The PDB ID codes are 1jsu, 1kil, 1p4q, and 1l8c, respectively. (E–H) The predicted structures in the AFDB for the listed IDRs/IDPs in panels A–D. These structures have been color-coded by pLDDT scores, with blue, purple, and orange respectively corresponding to very confident (≥90), confident (70 to 90), and low (<70) scores. (I–L) Comparison of the experimental structures from panels A–D with the predicted structures in the AFDB from panels E–H. Experimental structures are colored red and AFDB structures blue. The heavy-atom RMSD upon alignment of secondary structure elements is indicated.
Fig. 4.
Bioinformatics analysis of predicted IDRs in the AFDB with high pLDDT scores. (A) Amino-acid percentages in the regions of predicted order and disorder, with the disordered regions further separated into those with confident pLDDT scores greater than or equal to 70 (IDRhigh pLDDT) and those below 50 (IDRlow pLDDT). Shown here is the percent change in the relative amino-acid percentages for IDRlow pLDDT and either ordered regions (ΔOrder, empty blue bars) or IDRhigh pLDDT (ΔIDR, orange bars). Positive values indicate that a given amino acid is fractionally enriched in IDRlow pLDDT, whereas negative values indicate depletion. (B) The difference between ΔOrder and ΔIDR high reports on the relative difference in amino-acid usage between ordered regions and IDRhigh pLDDT regions as compared to IDRlow pLDDT regions. Positive values reflect an increased usage of a given amino acid in IDRhigh pLDDT regions, whereas negative values reflect enrichment in ordered regions, as compared to IDRlow pLDDT regions. (C) BLAST results from querying amino-acid sequences in the PDB (Methods) for predicted IDRs in the AFDB that are longer than 10 residues. Percentage of predicted IDRs (hits/total) that were identified in the PDB as a function of the E-value and the pLDDT score, with <50 in orange, ≥70 in cyan, and ≥90 in blue. Box plots of the number of aligned sequences (D), average alignment depth (E), and average positional conservation (F). Panel D is not significant (ns), whereas panels E and F have P-values (Mann-Whitney) < 0.0001 when comparing pLDDT < 50 and the other groups (***).
Fig. 5.
Systematic identification of conditionally folded IDRs in archaea, bacteria, and eukaryotes. (A) ROC curve for AlphaFold2 pLDDT-based classification of conditionally folded IDRs based on databases of known examples (MFIB, FuzDB, DisProt, MoRF, DIBS). The AlphaFold2 performance on the binary classification task (conditional folder/non-conditional folder) is displayed, with TPR (FPR) corresponding to true (false) positive rate. All five databases were merged (All, black). The AUC is 0.76 for the combined dataset. The black dot on each curve represents the threshold at which the TPR-to-FPR ratio is largest. (B) Correlation between the mean positional amino-acid conservation of IDR sequences from the databases listed in panel A and the AUC values from the ROC curves in panel A. The best-fit line is shown in black and has a Pearson’s R² of 0.78. Note that gaps in the sequence alignments were ignored for the calculation of positional conservation. (C) IDRs with pLDDT scores greater than or equal to 70 for continuous regions of 10 or 30 or more amino acids are shown in blue and white with blue lines, respectively. For comparison, the number of conditionally folded IDRs in DisProt (black) and the PDB (white) are shown. (D) For each species listed, the percentage of disordered residues in the proteome (predicted by IUPred2A) is shown in orange on the left y axis. The percentage of predicted disordered residues with pLDDT scores ≥ 70 (i.e., conditionally folded IDRs) is shown in blue on the right y axis. (E) The percentage of predicted disordered residues in the proteome of each organism from panel D plotted against the predicted percentage of residues in predicted IDRs with pLDDT scores greater than or equal to 70, conditionally folded (CF) IDRs.
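The ROC construction in panel A, including the choice of the threshold with the largest TPR-to-FPR ratio, can be sketched in plain Python. Several details here are assumptions rather than the authors' procedure: each IDR is summarized by a single score (for instance its mean pLDDT), an IDR is called a conditional folder when its score is at or above the threshold, points with FPR = 0 are skipped because the ratio is undefined there, and the AUC uses the trapezoidal rule. The toy scores and labels are invented.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (threshold, FPR, TPR)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """One ROC point per distinct score, from the strictest threshold down."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((threshold, fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Point]) -> float:
    """Trapezoidal area under the curve, starting from the origin (0, 0)."""
    xs = [0.0] + [p[1] for p in points]
    ys = [0.0] + [p[2] for p in points]
    return sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
               for i in range(1, len(xs)))


def best_ratio_point(points: Sequence[Point]) -> Point:
    """Point with the largest TPR/FPR, ignoring points where FPR is zero."""
    candidates = [p for p in points if p[1] > 0]
    if not candidates:
        raise ValueError("every point has FPR = 0, so the ratio is undefined")
    return max(candidates, key=lambda p: p[2] / p[1])


# Invented toy data: label 1 marks a known conditional folder, 0 a non-folder.
toy_scores = [95, 85, 80, 60, 40, 30]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_points = roc_points(toy_scores, toy_labels)
toy_auc = auc(toy_points)
toy_best = best_ratio_point(toy_points)
```

For real datasets, an established routine such as scikit-learn's `roc_curve` would normally replace the hand-written loop; the explicit version is shown only to make the threshold-selection rule visible.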
Fig. 6.
Using AlphaFold2 to understand the basis of disease-causing mutations in conditionally folded IDRs. (A) The per-residue mutational burden (the number of mutations divided by the total number of residues) for IDRs is shown as a function of AlphaFold2 pLDDT scores (<50, ≥70, ≥90). Disease-associated mutations from OMIM are shown in solid bars on the left, and presumably non-pathogenic mutations that are present in the general human population (1000GP) are shown in empty bars on the right. *** indicates a P value < 0.0001 from a Fisher Exact Test. (B) The L168V mutation in ALX3 causes frontonasal dysplasia, but the mutation is predicted to be likely benign by CADD and REVEL. The high-confidence AlphaFold2 model shows that L168V creates a large cavity (orange spheres). In combination with FoldX, the AlphaFold2 model yields the prediction that L168V is severely destabilizing, with a ΔΔG value of 7.3 ± 0.1 kcal mol⁻¹ relative to wild-type ALX3 (ΔΔG = ΔG_WT − ΔG_mutant). Hydrophobic interactions between the L168 side chain (blue) and other atoms in the ALX3 homeodomain that are within a 4.5-Å distance threshold are shown (green lines). A total of 26 interactions were identified. The residues involved in these interactions are indicated. In silico mutagenesis of L168 to Val was performed by FoldX. Hydrophobic interactions involving V168 and other atoms in the ALX3 homeodomain are indicated. Only 14 interactions are identified. Arpeggio was used for this analysis (92).
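The per-residue mutational burden of panel A, and a Fisher exact comparison between two pLDDT groups, can be sketched with the Python standard library. The caption does not say how the contingency table was built or whether the test was one- or two-sided, so the following is an assumption: every residue is counted as either mutated or not, and a one-sided P value is computed. All names and counts are hypothetical.

```python
from math import comb


def mutational_burden(n_mutations: int, n_residues: int) -> float:
    """Number of mutations divided by the total number of residues."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return n_mutations / n_residues


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups (e.g. IDRs with pLDDT >= 70 vs. pLDDT < 50) and
    columns are mutated vs. unmutated residues. The result is the
    hypergeometric probability of a top-left count of at least `a` when the
    row and column totals are held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)
    upper = min(row1, col1)
    favourable = sum(comb(row1, k) * comb(row2, col1 - k)
                     for k in range(a, upper + 1))
    return favourable / total_tables


# Invented counts (NOT from the paper): 3 of 4 residues mutated in one group
# versus 1 of 4 in the other.
toy_burden = mutational_burden(3, 4)
toy_p = fisher_exact_greater(3, 1, 1, 3)
```

With proteome-scale counts the same function applies unchanged, since `math.comb` works on arbitrarily large integers; a library routine such as `scipy.stats.fisher_exact` is the more usual choice when a two-sided test is wanted.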

References

    1. Anfinsen C. B., Principles that govern the folding of protein chains. Science 181, 223–230 (1973).
    2. Baker D., Sali A., Protein structure prediction and structural genomics. Science 294, 93–96 (2001).
    3. Jumper J., et al., Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
    4. Baek M., et al., Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
    5. AlQuraishi M., Machine learning in protein structure prediction. Curr. Opin. Chem. Biol. 65, 1–8 (2021).
