BMC Genomics. 2015 Dec 16;16:1071.
doi: 10.1186/s12864-015-2280-z.

Protein aggregation, structural disorder and RNA-binding ability: a new approach for physico-chemical and gene ontology classification of multiple datasets

Petr Klus et al. BMC Genomics. 2015.

Abstract

Background: Comparison between multiple protein datasets requires the choice of an appropriate reference system and a number of variables to describe their differences. Here we introduce an innovative approach to discriminate multiple protein datasets (multiCM) and to measure enrichments in gene ontology terms (cleverGO) using semantic similarities.

Results: We illustrate the power of our approach by investigating the links between RNA-binding ability and other protein features, such as structural disorder and aggregation, in S. cerevisiae, C. elegans, M. musculus and H. sapiens. Our results are in striking agreement with available experimental evidence and reveal features that are key to understanding the mechanisms regulating cellular homeostasis.

Conclusions: In an intuitive way, multiCM and cleverGO provide accurate classifications of physico-chemical features and annotations of biological processes, molecular functions and cellular components, which is extremely useful for the discovery and characterization of new trends in protein datasets. multiCM and cleverGO can be freely accessed on the Web at http://www.tartaglialab.com/cs_multi/submission and http://www.tartaglialab.com/GO_analyser/universal. Each of the pages contains links to the corresponding documentation and tutorial.

Figures

Fig. 1
RNA-binding abilities of S. cerevisiae chaperone substrates. (a) RNA-binding ability of yeast chaperone substrates is visualized in a microarray-like table. Hsp90 and Hsp40 are predicted to have the largest number of nucleic-acid binding partners. Positive sets are on the vertical axis and negative sets on the horizontal axis. Green: positive set is enriched with respect to negative set; Red: negative set is enriched with respect to positive set [3]; Yellow: non-significant enrichment; Grey: enrichment not calculable due to strong overlap between the sets. The enrichment is associated with a p-value < 10⁻⁵ calculated with Fisher’s exact test. (b) GO annotations are shown through an innovative interface that allows clustering through semantic similarity. The largest cluster of Hsp90 interactors is related to the molecular function (MF) RNA/DNA binding (red cluster corresponding to a coverage of 372 out of 877 proteins). Full analysis is available at http://www.tartaglialab.com/cs_multi/confirm/286/d67c93dd10/
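The green/red/yellow calls described in this caption can be illustrated with a one-sided Fisher's exact test on a 2×2 table of counts. The sketch below is a minimal standard-library illustration, not the multiCM implementation: the function names `fisher_enrichment_p` and `classify_cell`, the input counts, and the idea of testing both directions against the same threshold are assumptions made for the example. Only the threshold of 10⁻⁵ comes from the caption.

```python
from math import comb


def fisher_enrichment_p(pos_hits, pos_total, neg_hits, neg_total):
    """One-sided Fisher's exact test (hypothetical helper).

    Returns P(X >= pos_hits) under the hypergeometric null, i.e. the
    probability of seeing at least this many hits in the positive set
    if hits were distributed at random between the two sets.
    """
    hits = pos_hits + neg_hits
    total = pos_total + neg_total
    denom = comb(total, pos_total)
    tail = 0
    for k in range(pos_hits, min(hits, pos_total) + 1):
        # comb() returns 0 when the lower argument exceeds the upper one,
        # so impossible tables contribute nothing to the tail.
        tail += comb(hits, k) * comb(total - hits, pos_total - k)
    return tail / denom


def classify_cell(pos_hits, pos_total, neg_hits, neg_total, alpha=1e-5):
    """Map one comparison to a colour; the two-direction logic is assumed."""
    if fisher_enrichment_p(pos_hits, pos_total, neg_hits, neg_total) < alpha:
        return "green"   # positive set enriched
    if fisher_enrichment_p(neg_hits, neg_total, pos_hits, pos_total) < alpha:
        return "red"     # negative set enriched
    return "yellow"      # no significant enrichment


# Hypothetical counts: proteins with a given property in each set.
print(classify_cell(pos_hits=40, pos_total=100, neg_hits=10, neg_total=100))
```

The grey case (enrichment not calculable because the two sets overlap strongly) is left out, since the caption does not state how overlap is measured.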
Fig. 2
Physico-chemical determinants of protein insolubility. Comparing low-solubility (LS) and high-solubility (HS) proteins in three eukaryotic cells [15], we found that (a) LS proteins are structurally disordered in human and mouse (red dots indicate enrichments in LS proteins). (b) The Boxplotter algorithm indicates a significant difference between the aggregation propensities of the HS and LS groups in yeast (p-value = 10⁻¹¹; Mann–Whitney–Wilcoxon test; area under the ROC curve = 0.72), which is (c) inversely related to protein abundance (p-value = 10⁻⁹; Mann–Whitney–Wilcoxon test; area under the ROC curve = 0.70), in agreement with previous evolutionary observations [–32]. In all organisms, we find (d) more nucleic acid binding in LS fractions. (e, f) LS proteins are enriched in nucleic-acid binding ability (Additional file 1: Figure S1), as shown with cleverGO analysis on human and yeast. The links to multiCM, Boxplotter and cleverGO analyses are available at http://www.tartaglialab.com/cs_multi/confirm/737/6065feed14/
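The caption reports a Mann–Whitney–Wilcoxon p-value together with an area under the ROC curve. The two are closely linked: the AUC equals the U statistic divided by the number of pairs. The sketch below shows that relationship under stated assumptions. The function name `mann_whitney_auc` and the example scores are hypothetical, and the p-value uses a plain normal approximation without tie or continuity correction, so it is not a reproduction of the Boxplotter algorithm.

```python
from math import erfc, sqrt


def mann_whitney_auc(x, y):
    """Return (U, AUC, two-sided p) for two groups of scores.

    U counts the (x, y) pairs in which the x value is larger; ties
    count as half a pair. AUC = U / (len(x) * len(y)).
    """
    n1, n2 = len(x), len(y)
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    auc = u / (n1 * n2)
    # Normal approximation to the null distribution of U
    # (assumption: no tie correction, no continuity correction).
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = erfc(abs(z) / sqrt(2))
    return u, auc, p_two_sided


# Hypothetical aggregation-propensity scores for LS and HS proteins.
ls_scores = [0.9, 0.7, 0.8, 0.4, 0.6]
hs_scores = [0.3, 0.5, 0.2, 0.6, 0.1]
u, auc, p = mann_whitney_auc(ls_scores, hs_scores)
print(f"U = {u}, AUC = {auc:.2f}, p = {p:.3g}")
```

An AUC of 0.5 means the two groups are indistinguishable by this score; values such as the 0.72 and 0.70 quoted in the caption indicate moderate separation. The pairwise loop is quadratic, which is fine for a sketch but slow for proteome-sized sets.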
Fig. 3
Protein aggregation and longevity. We used multiCM to analyze insoluble fractions of C. elegans proteins [16]. (a) Analysis of mass-spectrometry data indicates that in the hsf-1 strain (short-lived), highly enriched proteins (class HSF-1 4/4) are more aggregation-prone than less enriched ones (class HSF-1 1/4). (b) In the daf-2 strain (long-lived), highly enriched proteins (DAF-2 4/4) show lower aggregation propensities than poorly enriched ones (DAF-2 1/4). In these calculations, the insoluble fraction of each strain is divided into 4 equal sets containing proteins with fold enrichments > 1 with respect to wild-type worms, ranked from low (1/4) to high (4/4) [green dots indicate row vs column enrichments]. (c) Using the cleverGO algorithm, we analyzed proteins present in the hsf-1 strain (i.e., reported in HSF-1 4/4 and not in DAF-2 4/4) and found enrichments in metabolic pathways, oxidative stress response and mitochondrial function. Links to the analyses are at http://www.tartaglialab.com/cs_multi/confirm/757/9e1710f579/ and http://www.tartaglialab.com/cs_multi/confirm/758/95acfc44da/
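The binning described in this caption (keep proteins with fold enrichment > 1, rank them, split into four equal sets, then take the proteins found in one top set but not the other) can be sketched as follows. This is an illustration under assumptions: the function names `quartile_classes` and `specific_to`, the protein identifiers and the fold-enrichment values are all hypothetical, and the way leftover proteins are distributed when the count is not divisible by four is a choice made for the example.

```python
def quartile_classes(fold_enrichment):
    """Split proteins with fold enrichment > 1 into four ranked sets.

    fold_enrichment: dict mapping protein id -> fold enrichment relative
    to wild type. Returns a dict with keys "1/4" (lowest) to "4/4"
    (highest); set sizes differ by at most one.
    """
    kept = sorted(
        (protein for protein, fe in fold_enrichment.items() if fe > 1),
        key=lambda protein: fold_enrichment[protein],
    )
    n = len(kept)
    bounds = [i * n // 4 for i in range(5)]
    return {f"{i + 1}/4": kept[bounds[i]:bounds[i + 1]] for i in range(4)}


def specific_to(first, second):
    """Proteins listed in `first` but not in `second`, order preserved."""
    excluded = set(second)
    return [protein for protein in first if protein not in excluded]


# Hypothetical fold enrichments in the insoluble fraction of two strains.
hsf1 = {"p1": 1.5, "p2": 3.0, "p3": 0.8, "p4": 2.0, "p5": 10.0,
        "p6": 1.1, "p7": 4.0, "p8": 6.0, "p9": 2.5}
daf2 = {"p1": 9.0, "p2": 1.2, "p4": 1.4, "p5": 8.0,
        "p6": 2.0, "p7": 1.3, "p8": 1.6, "p9": 3.0}

hsf1_top = quartile_classes(hsf1)["4/4"]
daf2_top = quartile_classes(daf2)["4/4"]
print(specific_to(hsf1_top, daf2_top))
```

The resulting list plays the role of the protein set that the caption passes to cleverGO in panel c.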

References

    1. Vizcaíno JA, Côté RG, Csordas A, Dianes JA, Fabregat A, Foster JM, et al. The Proteomics Identifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res. 2013;41:D1063–9. doi: 10.1093/nar/gks1262.
    2. Mi H, Dong Q, Muruganujan A, Gaudet P, Lewis S, Thomas PD. PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium. Nucleic Acids Res. 2010;38(Database issue):D204–10. doi: 10.1093/nar/gkp1019.
    3. Klus P, Bolognesi B, Agostini F, Marchese D, Zanzoni A, Tartaglia GG. The cleverSuite Approach for Protein Characterization: Predictions of Structural Properties, Solubility, Chaperone Requirements and RNA-Binding Abilities. Bioinformatics. 2014;30(11):1601–8. doi: 10.1093/bioinformatics/btu074.
    4. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–9. doi: 10.1038/75556.
    5. Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics. 2009;10:48. doi: 10.1186/1471-2105-10-48.
