catGRANULE 2.0: accurate predictions of liquid-liquid phase separating proteins at single amino acid resolution

Michele Monti^#^{1,2}, Jonathan Fiorentino^#^{1,2}, Dimitrios Miltiadis-Vrachnos^{2,3}, Giorgio Bini^{2,4}, Tiziana Cotrufo⁵, Natalia Sanchez de Groot⁶, Alexandros Armaos^{1,2}, Gian Gaetano Tartaglia^{7,8}

Affiliations

¹ Center for Life Nano- & NeuroScience, Fondazione Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161, Rome, Italy.
² RNA Systems Biology Lab, Centre for Human Technologies, Fondazione Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16152, Genoa, Italy.
³ Department of Biology and Biotechnologies, University of Rome Sapienza, Piazzale Aldo Moro 5, 00185, Rome, Italy.
⁴ Physics Department, University of Genoa, Via Dodecaneso 33, 16146, Genoa, Italy.
⁵ Departament de Biologia Cellular, Fisiologia i Immunologia, Universitat de Barcelona, Avenida Diagonal 643, 08028, Barcelona, Spain.
⁶ Department of Biochemistry and Molecular Biology, Universitat Autònoma de Barcelona, Bellaterra (Cerdanyola del Vallès), 08193, Barcelona, Spain.
⁷ Center for Life Nano- & NeuroScience, Fondazione Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161, Rome, Italy. gian.tartaglia@iit.it.
⁸ RNA Systems Biology Lab, Centre for Human Technologies, Fondazione Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16152, Genoa, Italy. gian.tartaglia@iit.it.

^# Contributed equally.

PMID: 39979996
PMCID: PMC11843755
DOI: 10.1186/s13059-025-03497-7

Michele Monti et al. Genome Biol. 2025 Feb 20;26(1):33. doi: 10.1186/s13059-025-03497-7.

Abstract

Liquid-liquid phase separation (LLPS) enables the formation of membraneless organelles, essential for cellular organization and implicated in diseases. We introduce catGRANULE 2.0 ROBOT, an algorithm integrating physicochemical properties and AlphaFold-derived structural features to predict LLPS at single-amino-acid resolution. The method achieves high performance and reliably evaluates mutation effects on LLPS propensity, providing detailed predictions of how specific mutations enhance or inhibit phase separation. Supported by experimental validations, including microscopy data, it predicts LLPS across diverse organisms and cellular compartments, offering valuable insights into LLPS mechanisms and mutational impacts. The tool is freely available at https://tools.tartaglialab.com/catgranule2 and https://doi.org/10.5281/zenodo.14205831.

Keywords: Liquid-liquid phase separation; Machine learning; Mutations; Protein features; Subcellular compartmentalization.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

**Fig. 1**
A Schematic of the catGRANULE 2.0 ROBOT workflow. A training dataset is constructed consisting of 3333 known human LLPS proteins and 3252 non-LLPS proteins. The proteins are then encoded as a set of 128 features, including sequence-based physico-chemical and AlphaFold2-derived structural features. Next, a subset of relevant features is selected using ElasticNet and ten different classifiers are trained on the dataset; MLP is the selected classifier owing to its superior performance on the test dataset. catGRANULE 2.0 ROBOT predictions are then validated on sets of known LLPS-prone proteins from different species [22] and on immunofluorescence microscopy images from the Human Protein Atlas. LLPS propensity profiles are predicted with a sliding window approach and validated on experimentally known LLPS driving regions of proteins belonging to different species, obtained from the PhaSepDB database [19]. Finally, catGRANULE 2.0 ROBOT predicts the effect of single and multiple amino acid mutations on LLPS propensity. B Venn diagram showing the overlap of LLPS-prone proteins collected from different databases. C Composition of the training dataset in terms of Panther protein class categories. Protein classes representing less than 3% have been aggregated in the “Less represented label” category.
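
The sliding window step in the caption above can be illustrated with a minimal sketch. It is not the authors' implementation: the window length, the averaging of overlapping windows, and the `toy_score` function (a stand-in for a trained classifier) are all assumptions made for illustration.

```python
from typing import Callable, List


def sliding_window_profile(
    sequence: str,
    score_window: Callable[[str], float],
    window: int = 21,  # assumed window length, not taken from the paper
) -> List[float]:
    """Per-residue profile: average the scores of all windows covering each residue."""
    n = len(sequence)
    if n == 0:
        return []
    w = min(window, n)  # shrink the window for very short sequences
    totals = [0.0] * n
    counts = [0] * n
    for start in range(n - w + 1):
        s = score_window(sequence[start:start + w])
        for i in range(start, start + w):
            totals[i] += s
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def toy_score(fragment: str) -> float:
    """Hypothetical scorer: fraction of G, S, Y, Q, N residues in the fragment."""
    return sum(fragment.count(a) for a in "GSYQN") / len(fragment)


# Made-up sequence, used only to exercise the function.
profile = sliding_window_profile("MKKLLEEGSGSYGGQNSYGGAAALLKKEE", toy_score, window=5)
```

In practice `score_window` would wrap the trained model applied to the features of each fragment; the profile then has one value per residue and can be compared against annotated LLPS driving regions.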

**Fig. 2**
A Receiver-operating characteristic (ROC) curves obtained from the test dataset, for catGRANULE 2.0 ROBOT and other LLPS prediction algorithms (see Methods section for details). The area under the ROC curve (AUROC) for each algorithm is indicated in the legend. B Bar plot of the fraction of correctly predicted LLPS proteins for different species. The annotation of LLPS proteins was obtained from the DrLLPS database [22]. A star above a bar indicates a p value smaller than 0.05 from a Fisher’s exact test between the fraction of correctly predicted LLPS proteins in catGRANULE 2.0 ROBOT and in PICNIC. C Bar plot showing the Spearman’s correlation coefficient between the 28 features selected during the training step using ElasticNet and the predicted LLPS score, for the proteins belonging to the training dataset. The bar plot on the right shows the −log10(p value) of the correlation coefficient. D Box plot of the permutation importance computed on the training dataset for the 28 features selected during the training step using ElasticNet

**Fig. 3**
A AUROC versus the number of top and bottom proteins, ranked according to the predicted catGRANULE 2.0 ROBOT LLPS propensity score, for the average number of droplets (i.e., green puncta, left), area of the green puncta normalized by the average area of the nuclei (center) and coefficient of variation (CV) of the green intensity over the cell (right), computed from approximately 11k antibody-based images obtained by immunofluorescence (IF) confocal microscopy from the Human Protein Atlas (HPA). Line colors indicate the selection of proteins from different sub-cellular locations. See the Methods section and Additional File 6: Table S5. B Table showing the values of the quantities computed from the IF images for five example proteins, together with the LLPS propensity score predicted by catGRANULE 2.0 ROBOT and whether the protein was previously known to undergo LLPS. C IF images of the proteins reported in B. Note that the edge color matches those in B
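
One plausible reading of panel A is sketched below: proteins are ranked by predicted LLPS score, the top and bottom N are taken as two groups, and the AUROC measures how well an image-derived quantity separates them. This interpretation, the function names, and the toy numbers are assumptions; the exact procedure is described in the paper's Methods section.

```python
from typing import List, Sequence


def auroc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Probability that a random positive exceeds a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_top_bottom(pred_scores: List[float], image_values: List[float], n: int) -> float:
    """AUROC of an image-derived quantity between the top-n and bottom-n ranked proteins."""
    if n < 1 or 2 * n > len(pred_scores):
        raise ValueError("n must satisfy 1 <= n <= len(pred_scores) / 2")
    order = sorted(range(len(pred_scores)), key=lambda i: pred_scores[i], reverse=True)
    top = [image_values[i] for i in order[:n]]
    bottom = [image_values[i] for i in order[-n:]]
    return auroc(top, bottom)


# Toy data (made up): predicted scores and average droplet counts for six proteins.
pred = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
droplets = [12, 9, 3, 5, 2, 1]
curve = [auroc_top_bottom(pred, droplets, n) for n in (1, 2, 3)]
```

Evaluating the function over a range of N gives the kind of curve plotted in the panel: small N keeps only the most confidently ranked proteins, so the separation is expected to be sharper there.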

**Fig. 4**
A Violin plot showing the predicted LLPS score for proteins belonging to different subcellular locations, obtained from UniProt [65], sorted according to descending median LLPS propensity score. The number of proteins for each subcellular location is indicated above each violin. B Cluster map of the average permutation importance for each condensate. Condensates are shown in the rows and clustered; the 28 features selected by our model are ordered by descending permutation importance obtained from the full training dataset (see Fig. 2D). C Box plot showing the predicted LLPS propensity score for different classes of LLPS-prone proteins [66], sorted according to the median.

**Fig. 5**
A AUROC vs number of top and bottom scores for the MLP classifier trained on structural and physico-chemical features (black) and a Random Forest classifier trained only on physico-chemical features (red). Dots and dashes indicate proteins from all organisms or only from human, respectively. B-C-E-F LLPS propensity profiles predicted by the Random Forest classifier trained only on physico-chemical features (black curve) and experimentally annotated LLPS driving regions (blue lines) obtained from the PhaSepDB database [19] for four proteins from different organisms. D Protein structure colored according to the predicted LLPS propensity profile

**Fig. 6**
A LLPS propensity score computed by catGRANULE 2.0 ROBOT, catGRANULE 1.0, and PSPHunter on the WT sequence of 9 proteins for which mutations affecting LLPS were collected. Colored dashed lines indicate the threshold to discriminate LLPS from non-LLPS proteins, with the color matching the algorithm. B Bar plot showing the fraction of correctly predicted mutation scores by catGRANULE 2.0 ROBOT, catGRANULE 1.0, and PSPHunter, for a set of 24 mutations including 20 mutations with a negative effect on LLPS and 4 mutations with a positive effect. C Distributions of the catGRANULE 2.0 ROBOT mutation score for mutations decreasing or increasing LLPS (red and black curves, respectively) from a mutational scanning of TDP-43 [37], at different thresholds on the experimental phase separation score. In the rightmost panel, the colored dashed lines show the predicted mutation score of two selected mutations. D AUROC computed on the catGRANULE 2.0 ROBOT mutation scores for the mutational scanning of TDP-43, as a function of the threshold on the experimental phase separation score. Increasing the threshold corresponds to selecting more restricted sets of mutations, with stronger positive and negative experimental effect on LLPS. The inset shows the LLPS propensity profiles predicted by catGRANULE 2.0 ROBOT for the WT sequence of TDP-43 and the two mutations shown in C. The red line indicates the experimental LLPS region

References

1. Martin EW, Holehouse AS. Intrinsically disordered protein regions and phase separation: sequence determinants of assembly or lack thereof. Emerg Top Life Sci. 2020;4(3):307–29.
2. Shapiro DM, Ney M, Eghtesadi SA, Chilkoti A. Protein phase separation arising from intrinsic disorder: first-principles to bespoke applications. J Phys Chem B. 2021;125(25):6740–59.
3. Boeynaems S, Alberti S, Fawzi NL, Mittag T, Polymenidou M, Rousseau F, et al. Protein phase separation: a new phase in cell biology. Trends Cell Biol. 2018;28(6):420–35.
4. Nandana V, Schrader JM. Roles of liquid-liquid phase separation in bacterial RNA metabolism. Curr Opin Microbiol. 2021;61:91–8.
5. Nesterov SV, Ilyinsky NS, Uversky VN. Liquid-liquid phase separation as a common organizing principle of intracellular space and biomembranes providing dynamic adaptive responses. Biochim Biophys Acta (BBA) Mol Cell Res. 2021;1868(11):119102.
