BMC Genomics. 2025 Apr 12;26(1):367.
doi: 10.1186/s12864-025-11534-9.

Deep learning tools predict variants in disordered regions with lower sensitivity

Federica Luppino et al. BMC Genomics. 2025.

Abstract

Background: The recent AI breakthrough of AlphaFold2 has revolutionized 3D protein structural modeling, proving crucial for protein design and variant effect prediction. However, intrinsically disordered regions (IDRs), known for their lack of well-defined structure and lower sequence conservation, often yield low-confidence models. The latest Variant Effect Predictor (VEP), AlphaMissense, leverages AlphaFold2 models, achieving over 90% sensitivity and specificity in predicting variant effects. However, the effectiveness of such tools for variants in disordered regions, which account for 30% of the human proteome, remains unclear.

Results: In this study, we found that predicting pathogenicity for variants in disordered regions is less accurate than in ordered regions, particularly for mutations at the first N-Methionine site. Investigations into the efficacy of variant effect predictors on intrinsically disordered regions (IDRs) indicated that mutations in IDRs are predicted with lower sensitivity and that the gap between sensitivity and specificity is largest in disordered regions, especially for AlphaMissense and VARITY.

Conclusions: The prevalence of IDRs within the human proteome, coupled with the increasing repertoire of biological functions they are known to perform, necessitated an investigation into the efficacy of state-of-the-art VEPs on such regions. This analysis revealed consistently reduced sensitivity and a prediction performance profile that differs from that of ordered regions, indicating that new IDR-specific features and paradigms are needed to accurately classify disease mutations within those regions.

Keywords: AlphaMissense; Benchmarking; Intrinsically disordered regions; Methionine start site; Variant effect predictors.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: N/A. Consent for publication: N/A. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Definitions of disordered regions based on sequence conservation, computational tools and structural information largely agree. a) Multiple sequence alignment (MSA) of the heat-shock beta-1 protein encoded by the HSPB1 gene. The MSA was obtained with the EVcouplings [40] server (https://v2.evcouplings.org/) using a bit score of 0.7. The rectangular bar over the MSA shows the domain organization according to InterPro [50] and the work of Jehle et al. [64]. The alpha-crystallin domain presents a conserved alignment, that is, each homologous sequence (the rows of the MSA) has the same or a similar residue as the heat-shock beta-1 protein, as indicated by the colored vertical lines. Conversely, the N- and C-termini of the protein present a poor alignment, that is, only a few residues are conserved in the homologous sequences and the rest vary or are missing (shown as gaps). b) Disorder scores according to AIUPred, AlphaFold2 (AF2) pLDDT, metapredict, AlphaFold2-RSA and flDPnn (shown on the y-axis) for each residue position (shown on the x-axis). The grey areas highlight the disordered regions according to the specific thresholds for each tool: AIUPred and metapredict 0.5, AlphaFold2 pLDDT 70, AlphaFold2-RSA 0.581 and flDPnn 0.3. There is strong agreement on the disorder composition of the protein’s C-terminal domain, while the N-terminus is predicted to be disordered by AF2 pLDDT, metapredict and AlphaFold2-RSA but not by AIUPred and flDPnn. For instance, the alpha-crystallin domain is predicted mainly as disordered by AIUPred and flDPnn despite having a high-quality X-ray crystal structure (PDB: 3Q9Q). c) The AlphaFold2 model of the heat-shock beta-1 protein is colored according to the domain architecture. The two termini mainly constitute disordered regions/coils (grey) while the alpha-crystallin domain (orange) is composed of beta-sheets
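The per-tool thresholds listed in panel b) can be turned into binary disorder calls. The following is a minimal sketch, not the authors' pipeline. The threshold values come from the caption, but the comparison direction (pLDDT below 70 counts as disordered, all other scores at or above their threshold count as disordered), the inclusive boundary, and the example scores are assumptions made for illustration.

```python
# Thresholds quoted in the Fig. 1 caption. The direction of each comparison
# is an assumption: a low pLDDT is treated as disordered, while a high score
# is treated as disordered for the other tools.
THRESHOLDS = {
    "AIUPred": (0.5, "above"),
    "metapredict": (0.5, "above"),
    "AlphaFold2 pLDDT": (70.0, "below"),
    "AlphaFold2-RSA": (0.581, "above"),
    "flDPnn": (0.3, "above"),
}


def is_disordered(tool: str, score: float) -> bool:
    """Return True if `score` falls on the disordered side of the tool's threshold."""
    threshold, direction = THRESHOLDS[tool]
    if direction == "below":
        return score < threshold
    return score >= threshold


def call_residue(scores: dict) -> dict:
    """Map each tool's raw score for one residue to a binary disorder call."""
    return {tool: is_disordered(tool, score) for tool, score in scores.items()}


# Hypothetical scores for a single residue (not taken from the paper).
example_scores = {
    "AIUPred": 0.42,
    "metapredict": 0.81,
    "AlphaFold2 pLDDT": 55.0,
    "AlphaFold2-RSA": 0.70,
    "flDPnn": 0.12,
}
example_calls = call_residue(example_scores)
print(example_calls)
```

With these made-up scores the tools disagree on the same residue, which is the kind of disagreement the caption describes for the N-terminus.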
Fig. 2
Only 7% of residues are consistently predicted as disordered across all tools. The barplot shows the frequency (%) of the combinations of disorder and order predictions among the five computational tools. Only the combinations with at least 1% frequency are shown, out of the 32 possible combinations of ordered and disordered classes. For example, the combination with the highest frequency (54%) is the one where all tools predict ordered residues
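The 32 combinations follow from five tools each making a binary call (2^5 = 32). A minimal sketch of how such combination frequencies could be tabulated is shown below. The function name, the tuple encoding (True = disordered) and the toy data are assumptions for illustration and do not reproduce the paper's numbers.

```python
from collections import Counter
from itertools import product

TOOLS = ("AIUPred", "AlphaFold2 pLDDT", "metapredict", "AlphaFold2-RSA", "flDPnn")

# Five binary calls give 2**5 = 32 possible order/disorder combinations.
ALL_COMBINATIONS = list(product((False, True), repeat=len(TOOLS)))


def combination_frequencies(calls, min_percent=1.0):
    """Return {combination: percent} for combinations at or above `min_percent`.

    `calls` is an iterable of 5-tuples of bools (True = disordered), one per
    residue, ordered as in TOOLS.
    """
    counts = Counter(tuple(c) for c in calls)
    total = sum(counts.values())
    percents = {combo: 100.0 * n / total for combo, n in counts.items()}
    ranked = sorted(percents.items(), key=lambda item: item[1], reverse=True)
    return {combo: pct for combo, pct in ranked if pct >= min_percent}


# Toy data: 200 hypothetical residues.
all_ordered = (False,) * 5
all_disordered = (True,) * 5
mixed = (False, True, True, True, False)
rare = (True, False, False, False, False)
toy_calls = [all_ordered] * 120 + [all_disordered] * 60 + [mixed] * 19 + [rare]

frequencies = combination_frequencies(toy_calls)
for combo, pct in frequencies.items():
    print(combo, f"{pct:.1f}%")
```

In the toy data the rare combination occurs in 0.5% of residues and is therefore dropped by the 1% cutoff, mirroring the filter described in the caption.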
Fig. 3
ClinVar variants that occur in disordered regions are predominantly benign. The columns correspond to the disorder definitions according to the different tools: AIUPred, AlphaFold2 pLDDT, metapredict, AlphaFold2-RSA and flDPnn. The rows show various descriptive statistical measures of the disorder and ClinVar variants. a) The first row shows the proportion of ClinVar variants in ordered and disordered regions, which confirms previous works reporting 30% of the human proteome as disordered [2, 3]. The exception is flDPnn, which predicts only 9% of residues as disordered. b) The second row shows the proportion of benign and pathogenic variants in ordered and disordered regions respectively. It highlights that more than 80% of variants in disordered regions are benign in ClinVar. c) The last row shows the distribution of disordered and ordered residues for pathogenic and benign variants confirming the frequently reported range of 10–15% of disease-causing mutations occurring in disordered regions [5, 6]
Fig. 4
Pathogenicity enrichment in IDRs depends on the disorder prediction tool. The x-axis shows the phenotypic effect of variants (pathogenic and benign) according to IDR groups (C-terminal, N-terminal, between-domain IDRs and Intrinsically Disordered Proteins (IDP)) and across different definitions of IDRs (metapredict, flDPnn and AIUPred). The y-axis shows the marginal proportion of ClinVar variants for the corresponding category. For example, according to metapredict 34% of pathogenic variants occur in N-terminal IDRs
Fig. 5
Mutations at the N-Methionine site are prevalent and misclassified by VEPs. a) The histogram shows the frequency of ClinVar variants (n = 61878) per residue position for 7459 proteins. The inset histogram highlights protein residues 1-100 and shows the number of pathogenic (red) and benign (blue) variants at each position. The prevalence of mutations at the N-Methionine site stands out, with more than 500 variants associated with this site. b) The distribution of distances from the first Methionine to the second one for proteins with mutations at the N-Methionine site, grouped by phenotypic effect, namely pathogenic (n = 480) and benign (n = 39). In the inset, the same data are represented with a boxplot and the significance of the difference between the two distributions is annotated at the top (one-sided Wilcoxon rank-sum test, with "greater" as the alternative hypothesis). On the right, the same pair of plots is shown for proteins without mutations at the N-Methionine site (pathogenic, n = 22754 and benign, n = 38605). The distance to the second Methionine discriminates between pathogenic and benign variants at the N-Methionine site, but not between those at other sites. c) Sensitivity and specificity of VEPs in predicting the effect of mutations occurring at the N-Methionine site (n = 356 variants, including 331 pathogenic and 25 benign, with predictions available across all VEPs considered in this study). AlphaMissense overpredicts benign variants and reaches only 29% sensitivity, VARITY overpredicts pathogenic variants, reaching only 20% specificity, and REVEL is the most balanced with 65% sensitivity and 84% specificity
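The feature in panel b), the distance from the initiator Methionine to the next Methionine, can be computed directly from a protein sequence. This is a minimal sketch: the function name is hypothetical, and defining the distance as the difference in residue positions is an assumption, since the caption does not state the exact definition. The sequences are made up.

```python
from typing import Optional


def distance_to_second_methionine(sequence: str) -> Optional[int]:
    """Distance in residues from the first Met (position 1) to the next Met.

    Returns None if the sequence does not start with 'M' or contains no
    further Met. Distance is defined here as the difference in residue
    positions (an assumption for illustration).
    """
    if not sequence or sequence[0] != "M":
        return None
    next_met = sequence.find("M", 1)  # 0-based index of the second Met
    return next_met if next_met != -1 else None


# Made-up sequences, not real proteins.
print(distance_to_second_methionine("MKTMAA"))  # second Met at position 4
print(distance_to_second_methionine("MAAAAA"))  # no second Met
```

If SciPy is available, `scipy.stats.mannwhitneyu(pathogenic_distances, benign_distances, alternative="greater")` is one way to run a one-sided rank-sum comparison of two such distance lists; the variable names here are placeholders.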
Fig. 6
State-of-the-art VEP tools have lower sensitivity but higher specificity for variants in disordered regions. a) Performance in terms of sensitivity and specificity (y-axis) of VEPs (x-axis) calculated on ClinVar variants according to the disordered/ordered class as predicted by AIUPred, AlphaFold2 pLDDT, metapredict, AlphaFold2-RSA and flDPnn. The violin plots show the performance of VEPs calculated on 200 bootstrap samples of 12530 variants each, sampled with replacement and in equal proportion from the benign and pathogenic classes and the disorder classes (see Methods). The horizontal line in each violin shows the median value. To ease the comparison, a black horizontal line is set at 90% for both sensitivity and specificity. The comparison with more VEPs is included in Supplementary Fig. 6 and in panel b. The best performing VEPs, namely AlphaMissense, VARITY and REVEL, are associated with higher specificity for variants in disordered regions and higher sensitivity in ordered regions. Interestingly, when disorder is measured with AIUPred, the sensitivity of AlphaMissense does not vary between the disordered and ordered groups. b) The difference between the median specificity and the median sensitivity from the distributions in panel a is plotted (y-axis). Values greater than 0 indicate that the specificity is higher than the sensitivity (a value of 0.1 means that the specificity is 10 percentage points higher than the sensitivity), while values smaller than 0 indicate higher sensitivity. Traditional VEPs such as PolyPhen2 are still biased towards higher sensitivity, even for variants in disordered regions, whereas the performance of state-of-the-art VEPs, like AlphaMissense, is driven by specificity in disordered regions and sensitivity in ordered ones
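The quantity plotted in panel b), median specificity minus median sensitivity over bootstrap samples, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it balances only the pathogenic and benign classes within a single disorder group, the sample size per class is arbitrary, and the toy predictions are invented.

```python
import random
from statistics import median


def sensitivity_specificity(labels, predictions):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    `labels` and `predictions` are sequences of bools (True = pathogenic).
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    return tp / (tp + fn), tn / (tn + fp)


def bootstrap_gap(variants, n_boot=200, per_class=50, seed=0):
    """Median specificity minus median sensitivity over bootstrap samples.

    `variants` is a list of (is_pathogenic, predicted_pathogenic) pairs.
    Each bootstrap sample draws `per_class` pathogenic and `per_class`
    benign variants with replacement, so the two classes are balanced.
    """
    rng = random.Random(seed)
    pathogenic = [v for v in variants if v[0]]
    benign = [v for v in variants if not v[0]]
    sensitivities, specificities = [], []
    for _ in range(n_boot):
        sample = rng.choices(pathogenic, k=per_class) + rng.choices(benign, k=per_class)
        labels, predictions = zip(*sample)
        sens, spec = sensitivity_specificity(labels, predictions)
        sensitivities.append(sens)
        specificities.append(spec)
    return median(specificities) - median(sensitivities)


# Toy predictor: finds 60% of pathogenic variants, clears 95% of benign ones.
toy_variants = (
    [(True, True)] * 60 + [(True, False)] * 40
    + [(False, False)] * 95 + [(False, True)] * 5
)
gap = bootstrap_gap(toy_variants)
print(f"specificity - sensitivity = {gap:.2f}")
```

A positive gap, as in this toy case, corresponds to the specificity-driven profile that the caption reports for state-of-the-art VEPs in disordered regions.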
Fig. 7
Disorder-to-order transitions predicted by AlphaFold3 may aid structure-based variant interpretation. a) The secondary structure annotation according to STRIDE for the AlphaFold3 3D model of the heat-shock beta-1 protein (HSPB1, UniProt ID: P04792) in its monomeric (top), dimeric (middle) and tetrameric form (bottom). The disorder-to-order transition of the N-terminus is supported by the decrease in predicted coil elements in the dimeric and tetrameric organization of the protein with respect to its monomeric form, and by the corresponding increase in alpha-helices. Panels b), c) and d) show the AlphaFold3 models of the P04792 monomer, dimer and tetramer, respectively. Colors correspond to the legend in a) and only one chain is highlighted for the dimer and tetramer. The C-terminus is not displayed in its entirety, as indicated by the dashed black lines in panels b) and c). The position of the variant P39L is highlighted in black. The disorder-to-order transition as predicted by STRIDE corresponds to the new alpha-helices of the dimer and tetramer visible in panels c) and d). e) Motif prediction for HSPB1 (P04792) using SHARK-capture. Conserved motifs of the disordered regions are colored by their SHARK-capture score, and benign and pathogenic variants are marked. Pathogenic mutations fall into SHARK-capture motifs in the N- and C-terminal regions while benign mutations do not. The P39L pathogenic variant is located within a conserved PEEWS motif

References

    1. Chow CFW, Ghosh S, Hadarovich A, Toth-Petroczy A. SHARK enables sensitive detection of evolutionary homologs and functional analogs in unalignable and disordered sequences. Proc Natl Acad Sci U S A. 2024;121:e2401622121.
    1. Pentony MM, Jones DT. Modularity of intrinsic disorder in the human proteome: disorder in the human proteome. Proteins. 2010;78:212–21.
    1. Ruff KM, Pappu RV. AlphaFold and implications for intrinsically disordered proteins. J Mol Biol. 2021;433:167208.
    1. Vacic V, Iakoucheva LM. Disease mutations in disordered regions - Exception to the rule? Mol Biosyst. 2012;8:27–32.
    1. Tesei G, Trolle AI, Jonsson N, Betz J, Knudsen FE, Pesce F, et al. Conformational ensembles of the human intrinsically disordered proteome. Nature. 2024. doi: 10.1038/s41586-023-07004-5.