J Phys Chem B. 2018 Apr 12;122(14):3920-3930.
doi: 10.1021/acs.jpcb.8b01763. Epub 2018 Mar 29.

Accurately Predicting Disordered Regions of Proteins Using Rosetta ResidueDisorder Application

Stephanie S Kim et al. J Phys Chem B.

Abstract

Although many proteins require well-folded structures to carry out their biological functions, a large fraction of functional proteins contain regions, known as intrinsically disordered protein regions, where stable structures are unlikely to form. Intrinsically disordered proteins play notable functional roles in transcriptional regulation, translation, and cellular signal transduction. Moreover, intrinsically disordered protein regions are highly abundant in many proteins associated with various human diseases, so these segments have become attractive drug targets for potential therapeutics. Over the past decades, numerous computational methods have been developed to accurately predict disordered regions of proteins. Here we introduce a user-friendly and reliable approach for predicting disordered protein regions using the structure prediction software Rosetta. Using 245 proteins from a benchmark data set (16 DisProt database proteins) and a test data set (229 proteins with NMR data), we use Rosetta to predict the global protein structures and then show that there is a statistically significant difference between Rosetta scores in disordered and ordered regions, with scores being less favorable in disordered regions. Furthermore, the difference in scores between ordered and disordered protein regions is sufficient to accurately identify disordered protein regions. As a result, our Rosetta ResidueDisorder method (benchmark data set prediction accuracy of 71.77% and independent test data set prediction accuracy of 65.37%) outperformed other established disorder prediction tools and did not exhibit a biased prediction toward either ordered or disordered regions. To facilitate usage, a Rosetta application has been developed for the Rosetta ResidueDisorder method.

Figures

Figure 1. Correlation between degree of disorder and size-normalized per-residue Rosetta score
A positive correlation is observed between the degree of disorder (fraction of disordered residues in each protein) of the 16 benchmark dataset proteins and the Rosetta score per residue of each protein. As the fraction of disordered residues increases, the size-normalized per-residue Rosetta score also increases (i.e., becomes less favorable).
Figure 2. Comparison of individual Rosetta order scores of disordered and ordered residues
This figure compares the distributions of the order score (defined as the window-averaged per-residue Rosetta score) of all disordered and ordered residues in the 16-protein benchmark dataset. The maximum, minimum, mean, and median of each distribution are shown as tick marks.
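The order score defined in the caption above can be sketched as a centered moving average over per-residue scores. This is a minimal illustration, not the Rosetta application's code: the function name `order_scores` is hypothetical, the default window of 11 residues is taken from the Figure 3 caption, and truncating the window at the chain termini is an assumption.

```python
def order_scores(per_residue_scores, window=11):
    """Average each residue's score over a centered window of neighbors.

    Assumption: near the termini the window is truncated to the residues
    that actually exist, so terminal residues are averaged over fewer values.
    """
    half = window // 2
    n = len(per_residue_scores)
    averaged = []
    for i in range(n):
        lo = max(0, i - half)          # first residue in the window
        hi = min(n, i + half + 1)      # one past the last residue in the window
        segment = per_residue_scores[lo:hi]
        averaged.append(sum(segment) / len(segment))
    return averaged
```

For example, `order_scores([0.0, 3.0, 6.0], window=3)` averages the first residue over two values and the middle residue over three.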
Figure 3. Optimization of terminal residues prediction of the benchmark dataset
Rosetta ResidueDisorder disorder predictions of all 16 benchmark dataset proteins are shown. The blue data points are Rosetta order scores calculated at a window size of 11 residues. Residues with order scores above the cutoff line (red line) are predicted as disordered, while residues with order scores below the cutoff line are predicted as ordered. The default cutoff is a flat line at -1.0 REU. For proteins in which the flat cutoff predicts fewer than 60% of residues as disordered, the cutoff line is sloped for terminal residues: the cutoff values are increased over the terminal 13% of the protein sequence, up to a maximum cutoff of -0.3 REU.
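The cutoff rule in the caption above can be sketched as follows. The numbers (-1.0 REU, -0.3 REU, 13%, 60%) come from the caption; everything else is an assumption made for illustration: the function names are hypothetical, the slope is taken to be linear, and the maximum cutoff is placed at the terminal residue itself. The Rosetta application may implement these details differently.

```python
def cutoff_profile(n, flat=-1.0, peak=-0.3, terminal_fraction=0.13):
    """Per-residue cutoffs: flat in the middle, raised toward both termini.

    Assumption: the cutoff falls linearly from `peak` at the terminal
    residue to `flat` at the inner edge of the terminal region.
    """
    n_term = int(round(n * terminal_fraction))
    cutoffs = [flat] * n
    for k in range(n_term):
        value = peak + (flat - peak) * k / n_term
        # max() keeps the higher cutoff if the two terminal regions overlap.
        cutoffs[k] = max(cutoffs[k], value)
        cutoffs[n - 1 - k] = max(cutoffs[n - 1 - k], value)
    return cutoffs


def predict_disorder(scores, flat=-1.0, peak=-0.3,
                     terminal_fraction=0.13, switch=0.60):
    """Return True (disordered) or False (ordered) for each order score."""
    n = len(scores)
    if n == 0:
        return []
    flat_calls = [s > flat for s in scores]
    # Mostly disordered under the flat cutoff: keep the flat cutoff.
    if sum(flat_calls) / n >= switch:
        return flat_calls
    cutoffs = cutoff_profile(n, flat, peak, terminal_fraction)
    return [s > c for s, c in zip(scores, cutoffs)]
```

Under these assumptions, a mostly ordered protein whose terminal residues score slightly above -1.0 REU is no longer flagged as disordered at its ends, which is the correction the sloped cutoff is meant to provide.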
Figure 4. Comparison of 6 prediction tools’ accuracy on 16 benchmark set proteins
The bar graph compares the average percent accuracy of 5 different IDP categories (0% disordered proteins (blue); 30% disordered proteins (yellow); 50% disordered proteins (green); 70% disordered proteins (red); 100% disordered proteins (purple)) for each prediction tool. The error bars represent the standard deviation for each IDP category. The IDP 50% bars have no error bars because that category contained only one protein. A biased prediction accuracy can be observed toward long disordered regions for PONDR VL3-H and Meta-Disorder, and toward ordered regions for PrDOS, IUPred, DISOPRED, and MFDp2. Compared to the other tools, the Rosetta ResidueDisorder method shows consistent prediction accuracy across all levels of disorder.
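The bias discussed in the caption above shows up when accuracy is computed separately for ordered and disordered residues. The sketch below assumes accuracy means the fraction of residues whose predicted label matches the reference label; the function name and the returned dictionary keys are hypothetical, and the source does not spell out its exact accuracy formula.

```python
def accuracy_report(predicted, actual):
    """Overall and per-class accuracy; True means disordered."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    def fraction_correct(pairs):
        # None signals that the protein has no residues of this class.
        if not pairs:
            return None
        return sum(p == a for p, a in pairs) / len(pairs)

    pairs = list(zip(predicted, actual))
    return {
        "overall": fraction_correct(pairs),
        "disordered": fraction_correct([(p, a) for p, a in pairs if a]),
        "ordered": fraction_correct([(p, a) for p, a in pairs if not a]),
    }
```

A tool biased toward order would score high on the `ordered` entry and low on the `disordered` entry, even if its overall accuracy on a mostly ordered protein looks good.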
Figure 5. ROC curve analysis of the benchmark dataset
ROC curves are shown for Rosetta ResidueDisorder (blue) and six other prediction tools: IUPred (orange); PrDOS (green); PONDR VL3-H (red); MFDp2 (purple); Meta-Disorder (brown); DISOPRED (pink). AUCs are shown in the legend.
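For readers unfamiliar with the AUC values mentioned in the caption above, the area under a ROC curve equals the probability that a randomly chosen disordered residue receives a higher score than a randomly chosen ordered residue, with ties counted as one half. The sketch below computes it that way; the function name is hypothetical, and it assumes a higher score means "more likely disordered", as with the order score. It is not the procedure used to produce the published curves.

```python
def roc_auc(scores, is_disordered):
    """AUC via pairwise comparison of disordered and ordered residue scores."""
    positives = [s for s, d in zip(scores, is_disordered) if d]
    negatives = [s for s, d in zip(scores, is_disordered) if not d]
    if not positives or not negatives:
        raise ValueError("need at least one residue of each class")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0      # disordered residue ranked above ordered one
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic in the number of residues, which is acceptable for a single protein; a rank-based implementation would be preferable for large datasets.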
Figure 6. Comparison of 6 prediction tools’ average prediction accuracy of the test dataset
The bar graph compares the average percent accuracy of 5 different IDP categories (0% disordered proteins (blue); 30% disordered proteins (yellow); 50% disordered proteins (green); 70% disordered proteins (red); 100% disordered proteins (purple)) for each prediction tool. The error bars represent the standard deviation for each IDP category. The bar graph shows a biased prediction accuracy toward long disordered regions for PONDR VL3-H and Meta-Disorder, and toward ordered proteins for PrDOS, IUPred, DISOPRED, and MFDp2. Compared to the other tools, the Rosetta ResidueDisorder method shows consistent prediction accuracy across all levels of disorder.
Figure 7. ROC curve analysis of the test dataset
ROC curves are shown for Rosetta ResidueDisorder (blue) and six other prediction tools: IUPred (orange); PrDOS (green); PONDR VL3-H (red); MFDp2 (purple); Meta-Disorder (brown); DISOPRED (pink). AUCs are shown in the legend.
