Bioinformatics. 2014 Jun 1;30(11):1601-8.
doi: 10.1093/bioinformatics/btu074. Epub 2014 Feb 3.

The cleverSuite approach for protein characterization: predictions of structural properties, solubility, chaperone requirements and RNA-binding abilities

Petr Klus et al. Bioinformatics.

Abstract

Motivation: The recent shift towards high-throughput screening is posing new challenges for the interpretation of experimental results. Here we propose the cleverSuite approach for large-scale characterization of protein groups.

Description: The central part of the cleverSuite is the cleverMachine (CM), an algorithm that performs statistics on protein sequences by comparing their physico-chemical propensities. The second element is called cleverClassifier and builds on top of the models generated by the CM to allow classification of new datasets.
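
As a rough illustration of what comparing physico-chemical propensities between two protein sets can look like, the sketch below averages one published per-residue scale (Kyte-Doolittle hydropathy) over each sequence and measures how well that single number separates the sets using the AUC. This is not the cleverMachine implementation: the choice of scale, the function names `mean_propensity` and `auc`, and the toy sequences are all assumptions made for the example.

```python
from statistics import mean

# Kyte-Doolittle hydropathy values, one of many published per-residue scales.
# Picking this scale is an assumption for illustration only.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_propensity(sequence, scale=KYTE_DOOLITTLE):
    """Average a per-residue scale over a sequence, skipping unknown letters."""
    values = [scale[aa] for aa in sequence.upper() if aa in scale]
    if not values:
        raise ValueError("sequence has no residues covered by the scale")
    return mean(values)


def auc(positive_scores, negative_scores):
    """Chance that a random positive scores above a random negative (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical toy sequences, not taken from the paper's datasets.
positive_set = ["DEKKSEDNQR", "KKEEDSGNQE"]   # polar and charged residues
negative_set = ["LLIVFAVLIV", "AVILFFMVLI"]   # hydrophobic residues

pos_scores = [mean_propensity(s) for s in positive_set]
neg_scores = [mean_propensity(s) for s in negative_set]
print(auc(pos_scores, neg_scores))  # 0.0: every toy positive is less hydrophobic
```

An AUC near 0 or near 1 both indicate that the scale discriminates the two sets; the direction only says which set carries the higher values. An AUC near 0.5 means the scale does not separate them.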

Results: We applied the cleverSuite to predict secondary structure properties, solubility, chaperone requirements and RNA-binding abilities. Using cross-validation and independent datasets, the cleverSuite reproduces experimental findings with great accuracy and provides models that can be used for future investigations.

Availability: The intuitive interface for dataset exploration, analysis and prediction is available at http://s.tartaglialab.com/clever_suite.

Figures

Fig. 1. The cleverSuite algorithm. The CM estimates the ability of physico-chemical properties to discriminate two input datasets. The statistical analysis gives information about individual property coverages and strength with respect to randomized sets. An exhaustive property-combination search is performed to assess the significance of the dataset separation. The CC uses the models generated by the CM to assign new datasets to either the positive or the negative set. Individual physico-chemical profiles are reported along with the discrimination statistics.
Fig. 2. Grouped property view. Example of properties grouped by class assignment and color (each property is described by 10 predictors). The E. coli solubility analysis is used as an illustrative case: soluble proteins (positive case) are more disordered and less hydrophobic/aggregation prone. Low-significance properties (Z-score < Zth; P > 0.01; Section 2) are left uncolored. In the webserver, this view is interactive and shows information about each scale after clicking (see also Supplementary Fig. S1).
Fig. 3. Correlation between coverage and AUC. For the five cases presented in this study, AUC and coverage of individual physico-chemical properties show a correlation r > 0.85. In this example, we use human RNA-binding proteins (compared with lysate; r = 0.95).
Fig. 4. Scale combinations and statistics. (A) Relationship between the number of combined scales and the coverages for both positive (blue bars) and negative (green bars) datasets. (B) Statistics for each scale combination and its individual members. In the webserver, clicking through the combination titles reveals the scales contained and detailed statistics (a three-scale combination is shown; the E. coli solubility analysis is used as an example). This view is used to summarize results of both the CM and the CC.
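
The captions for Fig. 1 and Fig. 2 refer to property strength relative to randomized sets and to a Z-score threshold. The exact statistic used by the cleverMachine is not given on this page, so the following is only a generic label-shuffling sketch of that idea: it compares an observed difference in mean property score with the differences obtained after randomly reassigning proteins to the two sets. The function names, the number of shuffles, the seed, and the toy scores are assumptions made for the example.

```python
import random
from statistics import mean, pstdev


def separation(pos_scores, neg_scores):
    """Difference between the mean score of the positive and the negative set."""
    return mean(pos_scores) - mean(neg_scores)


def randomization_z_score(pos_scores, neg_scores, n_shuffles=1000, seed=0):
    """Z-score of the observed separation against label-shuffled sets.

    Generic permutation scheme, assumed for illustration; not the
    procedure documented for the cleverMachine.
    """
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    observed = separation(pos_scores, neg_scores)
    pooled = list(pos_scores) + list(neg_scores)
    n_pos = len(pos_scores)

    null = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)            # randomly reassign proteins to the two sets
        null.append(separation(pooled[:n_pos], pooled[n_pos:]))

    spread = pstdev(null)
    if spread == 0:
        raise ValueError("randomized separations have zero spread")
    return (observed - mean(null)) / spread


# Hypothetical per-protein property scores (for example, mean hydropathy).
soluble = [-3.2, -2.9, -3.5, -2.7, -3.1, -3.0]
insoluble = [3.1, 2.8, 3.6, 2.9, 3.3, 3.0]

z = randomization_z_score(soluble, insoluble)
print(f"Z-score: {z:.2f}")  # strongly negative: the toy soluble set scores lower
```

A property would then be kept or discarded by comparing the absolute Z-score with a chosen threshold, in the spirit of the Zth cutoff mentioned in the Fig. 2 caption; the threshold value itself is not stated here.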
