Proc Natl Acad Sci U S A. 2022 Jun 14;119(24):e2115369119.
doi: 10.1073/pnas.2115369119. Epub 2022 Jun 10.

Screening membraneless organelle participants with machine-learning models that integrate multimodal features

Zhaoming Chen et al. Proc Natl Acad Sci U S A.

Abstract

Protein self-assembly is one of the formation mechanisms of biomolecular condensates. However, most phase-separating systems (PS) demand multiple partners in biological conditions. In this study, we divided PS proteins into two groups according to the mechanism by which they undergo PS: PS-Self proteins can self-assemble spontaneously to form droplets, while PS-Part proteins interact with partners to undergo PS. Analysis of the amino acid composition revealed differences in the sequence pattern between the two protein groups. Existing PS predictors, when evaluated on two test protein sets, preferentially predicted self-assembling proteins. Thus, a comprehensive predictor is required. Herein, we propose that properties other than sequence composition can provide crucial information in screening PS proteins. By incorporating phosphorylation frequencies and immunofluorescence image-based droplet-forming propensity with other PS-related features, we built two independent machine-learning models to separately predict the two protein categories. Results of independent testing suggested the superiority of integrating multimodal features. We performed experimental verification on the top-scored proteins DHX9, Ki-67, and NIFK. Their PS behavior in vitro revealed the effectiveness of our models in PS prediction. Further validation on the proteome of membraneless organelles confirmed the ability of our models to identify PS-Part proteins. We implemented a web server named PhaSePred (http://predict.phasep.pro/) that incorporates our two models together with representative PS predictors. PhaSePred displays proteome-level quantiles of different features, thus profiling PS propensity and providing crucial information for identification of candidate proteins.

Keywords: metapredictor; partner-dependent; phase separation; phosphorylation; self-assembly.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
PS-Self and PS-Part proteins possess different amino acid patterns. (A) Species distribution of 592 nonredundant proteins that were collected from PhaSepDB. (B) PS proteins collected from PhaSepDB, LLPSDB, and PhaSePro are divided into five nonoverlapping sets. (C) Amino acid frequency fold-changes of the SaPS set and the PdPS set are calculated against the NoPS set. Amino acids are ranked by their propensity to form IDRs. (D) To measure the performance of four representative PS predictors in screening self-assembling proteins, we plotted the ROC curve for each predictor by scoring proteins in the SaPS and NoPS sets. (E) ROC curves of four predictors are plotted for the PdPS and NoPS sets. The AUCs show the poor performance of these tools in screening partner-dependent proteins.
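The AUC comparison described in panels D and E can be illustrated with a minimal sketch. It assumes each predictor yields one score per protein and computes the AUC as the probability that a randomly chosen positive (for example, a SaPS or PdPS protein) outscores a randomly chosen negative (a NoPS protein). The function name and the toy scores are hypothetical and are not taken from the paper.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve from raw scores.

    Equivalent to the fraction of (positive, negative) pairs in which the
    positive scores higher, with ties counted as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores (hypothetical): three PS proteins versus three non-PS proteins.
ps_scores = [0.9, 0.8, 0.4]
nops_scores = [0.5, 0.3, 0.2]
auc = roc_auc(ps_scores, nops_scores)
```

An AUC near 0.5 means the predictor ranks positives no better than chance, which is the sense in which the caption describes poor performance on partner-dependent proteins.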
Fig. 2.
Constructing self-assembling and partner-dependent protein predictors with PS-related features. (A) Comparison of 10 PS-related features between the two PS protein sets and the non-PS set. P value is calculated through the two-sided Mann–Whitney U test (*P < 0.05; **P < 0.01; ***P < 0.001). (B) The hNoPS set is downsampled according to the IDR distribution in the hSaPS set. The Phos frequency of the hSaPS set is still significantly higher than that of the sampled hNoPS set. (C) The Phos frequency of the hPdPS set is significantly higher than that of the sampled hNoPS set with similar IDR distribution. (D) Schematic view of the SaPS and PdPS models. (E) Evaluating model performance using the independent test sets of self-assembling (hSaPS-test, Left), partner-dependent (hPdPS-test, Center), and PS protein sets (hPS-test, Right).
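The IDR-matched downsampling mentioned in panels B and C could look roughly like the sketch below. It assumes each protein has an IDR fraction between 0 and 1, bins those fractions, and samples negatives so that their per-bin counts are proportional to the positives. The binning scheme, function name, and toy values are assumptions for illustration; the paper's actual sampling procedure may differ.

```python
import random
from collections import Counter, defaultdict
from fractions import Fraction


def match_downsample(pos_idr, neg_idr, n_bins=10, seed=0):
    """Return indices of negatives whose IDR-bin distribution mirrors the positives."""
    rng = random.Random(seed)

    def bin_of(x):
        # Map an IDR fraction in [0, 1] to one of n_bins equal-width bins.
        return min(int(x * n_bins), n_bins - 1)

    pos_counts = Counter(bin_of(x) for x in pos_idr)
    neg_by_bin = defaultdict(list)
    for i, x in enumerate(neg_idr):
        neg_by_bin[bin_of(x)].append(i)

    # Largest scale factor k such that k * (positives in bin) negatives
    # are available in every bin occupied by positives.
    k = min(Fraction(len(neg_by_bin[b]), c) for b, c in pos_counts.items())

    chosen = []
    for b, c in pos_counts.items():
        chosen.extend(rng.sample(neg_by_bin[b], int(k * c)))
    return sorted(chosen)


# Toy IDR fractions (hypothetical).
positives = [0.05, 0.15, 0.15]
negatives = [0.01, 0.02, 0.03, 0.11, 0.12, 0.13, 0.14, 0.95]
sampled = match_downsample(positives, negatives)
```

After matching, a feature such as Phos frequency can be compared between the two sets with the two-sided Mann–Whitney U test named in panel A, so that any remaining difference is not explained by IDR content alone.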
Fig. 3.
Comparing the SaPS and PdPS models with another four representative PS predictors. (A) Comparison of six PS predictors on four datasets of proteins that participate in MLOs and two datasets of proteins located in membrane-bound organelles. AUC values are calculated by, respectively, using proteins in these datasets as positive samples and proteins in the human NoPS-test set as negative samples. (B) Comparison of six PS predictors on a BioID interactome with 20 intracellular locations. The value and the color of each block correspond to the AUC value, which is calculated by, respectively, using proteins in these datasets as positive samples and proteins in the human NoPS-test set as negative samples. (C) The averaged SHAP values of the SaPS and PdPS models are calculated on the OpenCell nuclear punctae set. Phos frequency has the highest weight among the 10 incorporated features. (D) Overlap of top-scored proteins from six PS predictors. Only 12 of the 2,667 collected proteins are predicted as PS proteins by all predictors. The red arrows indicate the location of candidate proteins DHX9, Ki-67, and NIFK.
Fig. 4.
Experimental validation of DHX9 isoform2, Ki-67 truncation, and NIFK. (A) Schematic diagram of in vitro PS assay to illustrate the PS capacity of GFP or mCherry fused proteins after MBP removal. N-terminal MBP tags of MBP–GFP–DHX9 Isoform2, MBP–GFP–Ki-67 truncation and MBP–mCherry–NIFK were cleaved with TEV protease overnight before droplet assembly. Further droplet assembly for these proteins was performed on a 384-well confocal plate. (B) Phase diagrams with blow-up images of GFP–DHX9 Isoform2 (Left). Quantitative results for FRAP analyses of the average recovery traces of GFP–DHX9 Isoform2 (Right). (C) Phase diagrams with blow-up images of GFP–Ki-67 truncation with DNA (Left). Quantitative results for FRAP analyses of the average recovery traces of GFP–Ki-67 truncation (Right). (D) Phase diagrams of GFP–Ki-67 truncation with DNA and mCherry–NIFK (Left). Quantitative results for FRAP analyses of the average recovery traces of GFP–Ki-67 truncation and mCherry–NIFK (Right).
Fig. 5.
Functional analysis of self-assembling and partner-dependent candidates in the human proteome. (A) Single-sample GSEA of SaPS, PdPS, and another four representative PS predictors in the human proteome. Thirty-seven representative pathways are shown. If a method enriches any of the 37 pathways, the corresponding block would be colored according to its NES. (B) GSEA plot of Hippo pathway according to the scores of SaPS in the human proteome (Left). Schematic view of the Hippo pathway, in which the core components are shown as an ellipse, and the other regulators are shown as a rectangle. All components are colored according to their SaPS score (Right). (C) Clustering of 1,609 proteins with PdPS score greater than 0.8 into five sets according to the similarity of embedded protein sequences. The distance between clusters is measured by Ward’s minimum variance method. (D) Enriched domains in the five clustered sets of PdPS candidates. Nineteen representative domains are shown. If a cluster enriches any of the 19 domains, the corresponding block would be colored by −log10 P value.
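Panel C groups proteins by Ward's minimum variance method. As a rough illustration, the sketch below runs a naive agglomerative clustering that, at each step, merges the pair of clusters whose union increases the within-cluster sum of squares the least. It assumes each protein is already represented by a numeric embedding vector; the function name and the toy 2D points are hypothetical, and a real analysis of 1,609 embeddings would normally rely on an optimized library routine.

```python
def ward_cluster(points, n_clusters):
    """Naive Ward agglomerative clustering; returns sorted lists of member indices."""
    # Each cluster is (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        na, nb = len(a[0]), len(b[0])
        d2 = sum((x - y) ** 2 for x, y in zip(a[1], b[1]))
        return na * nb / (na + nb) * d2

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        na, nb = len(a[0]), len(b[0])
        # Centroid of the merged cluster is the size-weighted mean.
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(a[1], b[1])]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((a[0] + b[0], centroid))

    return sorted(sorted(members) for members, _ in clusters)


# Toy embeddings (hypothetical): two tight pairs and one outlier.
embeddings = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
groups = ward_cluster(embeddings, 3)
```

The merge cost used here, n_a n_b / (n_a + n_b) times the squared distance between centroids, is the standard Ward criterion for Euclidean data.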
