Int J Mol Sci. 2015 Jun 16;16(6):13829-49.
doi: 10.3390/ijms160613829.

Identifying Similar Patterns of Structural Flexibility in Proteins by Disorder Prediction and Dynamic Programming

Aidan Petrovich et al. Int J Mol Sci.

Abstract

Computational methods are widely used to identify intrinsic disorder in proteins. Predictors typically report their results as per-residue disorder scores. These scores describe the disorder propensity of each amino acid in a protein and can be plotted along the sequence as a disorder curve. Many proteins share similar patterns in their disorder curves, and these similar patterns are often associated with similar functions and evolutionary origins. Therefore, finding and characterizing specific patterns of disorder curves provides a unique and attractive perspective on the function of intrinsically disordered proteins. In this study, we developed a new computational tool, named IDalign, based on dynamic programming. This tool identifies similar patterns among disorder curves and presents the distribution of intrinsic disorder in query proteins. The disorder-based information generated by IDalign is significantly different from the information retrieved from classical sequence alignments. The tool can also be used to infer the functions of disordered regions and disordered proteins. The IDalign web server is available at http://labs.cas.usf.edu/bioinfo/service.html.
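To make "the distribution of intrinsic disorder in query proteins" concrete, the sketch below turns a per-residue disorder curve into contiguous disordered regions. It is a minimal illustration, not IDalign's code: the function name `disordered_regions`, the `min_length` filter, and the strict "score above 0.5" cutoff (borrowed from the Figure 1 convention) are all assumptions.

```python
from typing import List, Sequence, Tuple


def disordered_regions(scores: Sequence[float], threshold: float = 0.5,
                       min_length: int = 1) -> List[Tuple[int, int]]:
    """Hypothetical helper: 1-based, inclusive (start, end) runs scoring above threshold."""
    regions: List[Tuple[int, int]] = []
    start = None  # 1-based position where the current disordered run began
    for pos, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = pos
        elif start is not None:
            # The run ended at the previous residue; keep it if it is long enough.
            if pos - start >= min_length:
                regions.append((start, pos - 1))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


print(disordered_regions([0.2, 0.6, 0.7, 0.4, 0.9]))  # [(2, 3), (5, 5)]
```

Raising `min_length` suppresses isolated residues that barely cross the cutoff, a common clean-up step when summarizing predictor output.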

Keywords: disorder pattern; dynamic programming; dynamic time warping; intrinsic disorder; structural flexibility.

Figures

Figure 1
Abundance of proteins as a function of length and fraction of disordered residues (IDAA%) in the (A) Yeast and (B) DisProt datasets. All protein sequences longer than 1000 residues were merged into the group of proteins of 1000 residues. The per-residue disorder score was calculated with PONDR-FIT. All residues with a disorder score higher than 0.5 were counted as disordered. The fraction of disordered residues is the number of disordered residues divided by the length of the corresponding protein. Colors ranging from purple through blue, green, and yellow to red represent increasing abundance.
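The counting behind Figure 1 can be sketched as follows, assuming each protein is given as a list of PONDR-FIT-style scores. IDAA% is the share of residues scoring above 0.5, and lengths above 1000 are merged into the 1000-residue group as the caption describes. The bin widths (50 residues, 5 percentage points) and the function names are hypothetical; the caption does not state the binning used.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def idaa_percent(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Percentage of residues whose disorder score exceeds threshold (IDAA%)."""
    if not scores:
        raise ValueError("empty score list")
    n_disordered = sum(1 for s in scores if s > threshold)
    return 100.0 * n_disordered / len(scores)


def abundance_grid(proteins: Iterable[Sequence[float]], length_bin: int = 50,
                   idaa_bin: float = 5.0, max_length: int = 1000) -> Counter:
    """Count proteins per (length-bin index, IDAA%-bin index) cell.

    Lengths above max_length are merged into the max_length group, as in
    Figure 1; the bin widths are assumptions for illustration only.
    """
    grid: Counter = Counter()
    for scores in proteins:
        length = min(len(scores), max_length)
        cell: Tuple[int, int] = (length // length_bin,
                                 int(idaa_percent(scores) // idaa_bin))
        grid[cell] += 1
    return grid
```

Rendering the resulting counts on a purple-to-red color scale would give a heat map of the kind the caption describes.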
Figure 2
Influence of the gap parameter on global matches for (A) the Yeast dataset and (B) the DisProt dataset. The solid line shows the normalized alignment score, while the dashed and dotted lines represent the normalized fraction of matches for two types of sequence pairs. Error bars show the standard error. Pairs of the first type have lower similarity between their disorder curves, and their calculated fraction of matches increases with the gap penalty. Pairs of the second type show the opposite behavior: they have higher similarity between their disorder curves, and their calculated fraction of matches decreases with the gap penalty.
Figure 3
Influence of the match parameter on the fraction of identified matches. The average fraction of matches for each sequence pair was calculated and then normalized within each dataset. The solid and dashed lines show the correlation between the match parameter and the fraction of matches for the DisProt and Yeast datasets, respectively.
Figure 4
Identified alignment path for a sequence pair (UniProt IDs: A0A023PXP4 and A0A023PZE6) from the Yeast dataset. (A) Traditional pairwise sequence alignment. “*”, “:”, “.”, and “-” stand for identical amino acids, highly similar amino acids, similar amino acids, and gaps, respectively; (B) Original disorder predictions for A0A023PXP4 (upper panel) and A0A023PZE6 (lower panel). The gray shading behind the disorder curves is the estimated prediction error from PONDR-FIT; (C) Alignment path between the two sequences identified by our newly developed package; (D) Alignment of disorder curves between A0A023PXP4 (pink) and A0A023PZE6 (black) along the alignment path. Many pairs of segments on the pink and black curves overlap. Only pairs whose segments were less than 0.05 apart were highlighted in cyan.
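The cyan highlighting rule can be sketched as a filter over an alignment path. This assumes the path is a list of 0-based index pairs (i, j) into the two curves, that each aligned pair counts as a "segment pair", and that the distance is the absolute difference of disorder scores; all three are assumptions, since the caption does not define them. Consecutive close pairs are grouped into runs, which is how contiguous highlighted stretches would appear in a plot. The name `highlighted_runs` is hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def highlighted_runs(curve_a: Sequence[float], curve_b: Sequence[float],
                     path: Sequence[Pair], cutoff: float = 0.05) -> List[List[Pair]]:
    """Group consecutive aligned pairs whose score difference is below cutoff."""
    runs: List[List[Pair]] = []
    current: List[Pair] = []
    for i, j in path:
        if abs(curve_a[i] - curve_b[j]) < cutoff:
            current.append((i, j))
        elif current:
            runs.append(current)  # a run of close pairs just ended
            current = []
    if current:
        runs.append(current)
    return runs


a = [0.1, 0.5, 0.9]
b = [0.12, 0.8, 0.88]
print(highlighted_runs(a, b, [(0, 0), (1, 1), (2, 2)]))  # [[(0, 0)], [(2, 2)]]
```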
Figure 5
Identified alignment path for a sequence pair (DisProt IDs: DP00270 and DP00710) from the DisProt dataset. (A) Traditional pairwise sequence alignment. “*”, “:”, “.”, and “-” stand for identical amino acids, highly similar amino acids, similar amino acids, and gaps, respectively; (B) Original disorder predictions for DP00270 (upper panel) and DP00710 (lower panel). The gray shading behind the disorder curves is the estimated prediction error; (C) Alignment path between the two sequences identified by our newly developed package; (D) Alignment of disorder curves between DP00710 (pink) and DP00270 (black) along the alignment path. Only overlapping segment pairs less than 0.05 apart were highlighted in cyan.
Figure 6
Evaluation of the functional intrinsic disorder propensity of human p53 (UniProt ID: P04637) by the D2P2 database (http://d2p2.pro/) [71]. In this plot, the top two lines represent annotated disordered regions in the DisProt and IDEAL databases. The next nine colored bars represent the locations of disordered regions predicted by different disorder predictors (Espritz-D, Espritz-N, Espritz-X, IUPred-L, IUPred-S, PV2, PrDOS, PONDR® VSL2b, and PONDR® VLXT; see the key for the corresponding color codes). The green-and-white bar in the middle of the plot shows the predicted disorder agreement among these nine predictors, with green parts corresponding to consensus disordered regions. The yellow bar shows the location of the predicted disorder-based binding site (MoRF region), whereas the colored circles at the bottom of the plot show the locations of various posttranslational modification sites (red: phosphorylation; blue: methylation; yellow: acetylation; orange: glycosylation; violet: ubiquitylation).
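A per-residue consensus like the green-and-white agreement bar can be sketched as a vote over binary predictor outputs. The caption does not give the agreement rule, so the 75% default below is an assumption (check the D2P2 documentation for its exact criterion), as is the input format of one boolean list per predictor. The name `consensus_mask` is hypothetical.

```python
from typing import List, Sequence


def consensus_mask(predictions: Sequence[Sequence[bool]],
                   agreement: float = 0.75) -> List[bool]:
    """Call a residue disordered when at least `agreement` of predictors do."""
    if not predictions:
        raise ValueError("need at least one predictor")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same sequence length")
    n = len(predictions)
    # sum() over booleans counts the predictors that call residue k disordered.
    return [sum(p[k] for p in predictions) / n >= agreement for k in range(length)]


masks = [
    [True, True, False],
    [True, True, False],
    [True, False, False],
    [True, True, True],
]
print(consensus_mask(masks))  # [True, True, False]
```

With nine predictors and a 75% rule, a residue would need at least seven votes, since 7/9 ≈ 0.78 while 6/9 ≈ 0.67.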
Figure 7
Identified alignment paths and alignments for sequence pairs between human p53 (UniProt ID: P04637) and fish p53 (UniProt ID: P79820) in (A,B), and between human p53 and fly p53 (UniProt ID: Q9N6D8) in (C,D). (A,C) Alignment paths (contour maps) between the two sequences of each pair, identified using our newly developed package; (B,D) Alignment of disorder curves along the alignment paths for the two sequence pairs: P79820 (pink) and P04637 (black) in (B); Q9N6D8 (pink) and P04637 (black) in (D). Only overlapping segments less than 0.05 apart were highlighted in cyan.
Figure 8
Identified alignment paths and alignments for sequence pairs between human p53 (UniProt ID: P04637) and human p63 (UniProt ID: Q9H3D4) in (A,B), and between human p63 and human p73 (UniProt ID: O15350) in (C,D). (A,C) Alignment paths (contour maps) between the two sequences of each pair, identified by our newly developed package; (B,D) Alignment of disorder curves along the alignment paths for the two sequence pairs: Q9H3D4 (pink) and P04637 (black) in (B); O15350 (pink) and Q9H3D4 (black) in (D). Only overlapping segment pairs less than 0.05 apart were highlighted in cyan.
Figure 9
Layout of the IDalign web server.
Figure 10
Pseudocode of the algorithm. Dynamic programming was applied to measure the similarity between two disorder curves. When the matrix was initialized, the penalty score P was assigned to its first row and first column, and the data points of the two disorder curves were loaded into its second row and second column. Next, the distance and cost function were calculated, using the formulas described in the Methods section, starting from the first vacant cell. After all cells in the matrix had been filled, the alignment path was traced from the last cell back to the first cell by connecting cells with lower cost-function values.
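The procedure in Figure 10 can be sketched in Python as a penalized dynamic-time-warping alignment. This is a hypothetical implementation, not IDalign's code: the caption defers the distance and cost formulas to the Methods section, so the local distance |a_i - b_j|, the cumulative k·P boundary values, and the recurrence (diagonal steps add only the local distance; vertical or horizontal steps add P on top of it) are assumptions. The curves are held in separate arrays rather than in the matrix's second row and column, and the traceback follows stored predecessor moves, which is equivalent to walking back through the lower-cost neighbors.

```python
from typing import List, Optional, Sequence, Tuple

DIAG, UP, LEFT = 0, 1, 2  # predecessor moves recorded for the traceback


def align_curves(a: Sequence[float], b: Sequence[float],
                 gap_penalty: float = 0.1) -> Tuple[float, List[Tuple[int, int]]]:
    """Penalized DTW between two disorder curves (hypothetical sketch).

    Returns the total cost and the alignment path as 0-based (i, j) pairs.
    """
    if not a or not b:
        raise ValueError("both curves must be non-empty")
    n, m = len(a), len(b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    move: List[List[Optional[int]]] = [[None] * (m + 1) for _ in range(n + 1)]
    # Boundary row and column carry the penalty P (assumed cumulative, k * P).
    for i in range(1, n + 1):
        cost[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        cost[0][j] = j * gap_penalty
    # Fill the matrix starting from the first vacant cell.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = abs(a[i - 1] - b[j - 1])  # assumed local distance
            # Tuples compare by cost first; ties prefer DIAG (lowest code).
            best, step = min(
                (cost[i - 1][j - 1], DIAG),
                (cost[i - 1][j] + gap_penalty, UP),
                (cost[i][j - 1] + gap_penalty, LEFT),
            )
            cost[i][j] = dist + best
            move[i][j] = step
    # Trace back from the last cell toward the first.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = move[i][j]
        if step == DIAG:
            i, j = i - 1, j - 1
        elif step == UP:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return cost[n][m], path


score, path = align_curves([0.1, 0.8, 0.8, 0.2], [0.1, 0.8, 0.2])
print(path)  # [(0, 0), (1, 1), (2, 1), (3, 2)]
```

In this sketch, a larger `gap_penalty` makes off-diagonal (warping) steps more expensive, so the path stays closer to the diagonal. How that maps onto the gap and match parameters studied in Figures 2 and 3 depends on the exact cost function given in the Methods.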

References

    1. Monastyrskyy B., Kryshtafovych A., Moult J., Tramontano A., Fidelis K. Assessment of protein disorder region predictions in CASP10. Proteins. 2014;82(Suppl. 2):127–137. doi: 10.1002/prot.24391.
    2. Ali H., Urolagin S., Gurarslan O., Vihinen M. Performance of protein disorder prediction programs on amino acid substitutions. Hum. Mutat. 2014;35:794–804. doi: 10.1002/humu.22564.
    3. Punta M., Simon I., Dosztanyi Z. Prediction and analysis of intrinsically disordered proteins. Methods Mol. Biol. 2015;1261:35–59.
    4. Wright P.E., Dyson H.J. Intrinsically unstructured proteins: Re-assessing the protein structure-function paradigm. J. Mol. Biol. 1999;293:321–331. doi: 10.1006/jmbi.1999.3110.
    5. Dunker A.K., Lawson J.D., Brown C.J., Williams R.M., Romero P., Oh J.S., Oldfield C.J., Campen A.M., Ratliff C.M., Hipps K.W., et al. Intrinsically disordered protein. J. Mol. Graph. Model. 2001;19:26–59. doi: 10.1016/S1093-3263(00)00138-8.