Nat Commun. 2025 Jan 8;16(1):505.
doi: 10.1038/s41467-024-55707-8.

Predicting cell morphological responses to perturbations using generative modeling

Alessandro Palma et al. Nat Commun. 2025.

Abstract

Advancements in high-throughput screenings enable the exploration of rich phenotypic readouts through high-content microscopy, expediting the development of phenotype-based drug discovery. However, analyzing large and complex high-content imaging screenings remains challenging due to incomplete sampling of perturbations and the presence of technical variations between experiments. To tackle these shortcomings, we present IMage Perturbation Autoencoder (IMPA), a generative style-transfer model predicting morphological changes of perturbations across genetic and chemical interventions. We show that IMPA accurately captures morphological and population-level changes of both seen and unseen perturbations on breast cancer and osteosarcoma cells. Additionally, IMPA accounts for batch effects and can model perturbations across various sources of technical variation, further enhancing its robustness in diverse experimental conditions. With the increasing availability of large-scale high-content imaging screens generated by academic and industrial consortia, we envision that IMPA will facilitate the analysis of microscopy data and enable efficient experimental design via in-silico perturbation prediction.

Conflict of interest statement

Competing interests: A.P. declares no competing interests. M.L. owns interests in Relation Therapeutics and is a scientific cofounder and part-time employee at AIVIVO. F.J.T. consults for Immunai Inc., Singularity Bio B.V., CytoReason Ltd., and Omniscope and has an ownership interest in Dermagnostix GmbH and Cellarity.

Figures

Fig. 1. IMPA enables perturbation effect prediction via style transfer.
a Perturbation prediction with IMPA. A control cell image x_i is encoded into a content representation, while a dense embedding of the target perturbation is collected and concatenated with a random vector. A lower-dimensional projection of the concatenation constitutes the style space, which conditions every layer of the decoder via the AdaIN method. h_ij denotes the output of the jth decoder layer on image i. The transformed output leads a discriminator network to predict that the decoded image is a real example of the target perturbation. Moreover, a style encoder is trained to replicate the style vector from the transformed image. The scale bar is 20 μm.
b Examples of use cases of the IMPA model: predicting the morphological effects of applying a perturbation to cell images, correcting for technical variation by transporting images to a single experimental batch, and learning a perturbation style space in which proximal perturbations trigger similar effects. The scale bar is 20 μm.
c 2D UMAP plots of 356 CellProfiler features before and after transformation with IMPA for Vincristine and Cytochalasin B. Data points represent individual control, transformed control and real perturbation images in the test set of a five-drug subset of BBBC021 (N = 20,313).
d Violin plots showing the distribution of discriminative CellProfiler features between controls (N = 520), IMPA’s predictions (N = 520), and original perturbation images for Vincristine and Cytochalasin B (N = 173 for Cytochalasin B and N = 354 for Vincristine). The boxes within the violin plots show the median, top and bottom quartiles of the feature distributions, while the whiskers mark the 95% quantiles. Real perturbed and control images are drawn from the test set. Source data are provided as Source data files.
e Visual comparison of IMPA with existing models on the style transfer task. The scale bar is 20 μm.
f Evaluation metrics comparing generated images with real perturbed images, averaged across different drugs on the BBBC021 dataset. Data are presented as mean values ± 95% confidence intervals. Source data are provided as Source data files.
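The AdaIN conditioning named in panel a can be illustrated with a short sketch. This is not IMPA's implementation: the function name `adain`, the flat-list feature layout, and the choice to pass per-channel `scale` and `shift` values directly (a model would typically derive them from the style vector through a learned layer) are assumptions made to keep the example small.

```python
import math

def adain(features, scale, shift, eps=1e-5):
    """Adaptive instance normalization on one image's feature map.

    features: list of channels, each a flat list of activations (hypothetical layout).
    scale, shift: one value per channel, assumed to be given directly here.
    """
    out = []
    for channel, gamma, beta in zip(features, scale, shift):
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        std = math.sqrt(var + eps)
        # Normalize the channel, then re-scale and re-shift it with the style values.
        out.append([gamma * (v - mean) / std + beta for v in channel])
    return out

# Toy feature map with two channels of four activations each.
styled = adain([[1.0, 2.0, 3.0, 4.0], [10.0, 10.5, 11.0, 12.0]],
               scale=[2.0, 0.5], shift=[1.0, -1.0])
```

After the call, each output channel has a mean close to its `shift` value and a standard deviation close to its `scale` value, whatever the statistics of the input channel were. That is the sense in which the style controls the decoded features.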
Fig. 2. IMPA predicts population response to perturbations and unseen drug effects on the whole BBBC021 dataset (N = 118,799).
a Large field-of-view prediction of the perturbation response to Taxol, Simvastatin, Nocodazole and Cytochalasin B performed by IMPA. The compounds have distinguishable effects on both morphology and cell density. The scale bar is 30 μm.
b The distribution of the number of cells and total cell area before and after IMPA’s transformation, computed for 10 drugs and compared with real perturbed cells. The boxplots show the median, top and bottom quartiles of the considered features across settings. The whiskers mark the 95% quantiles. Source data are provided as Source data files.
c The Tanimoto similarity between each compound and its closest perturbation in BBBC021. Compounds highlighted in bold were held out from training via scaffold-based splitting. For brevity, only 35 of the 99 drugs in the dataset are shown in the plot. Source data are provided as Source data files.
d Model performance on held-out compounds in terms of FID as a function of their Tanimoto similarity to the closest training drug. Source data are provided as Source data files.
e Comparison between IMPA and Mol2Image on the unseen-drug effect prediction task. For all measurements, the higher the value, the better the generated output approximates the expected phenotype. Metrics are averaged across the 10 unseen drugs. MoA prediction accuracy is evaluated only on the 4 visually annotated drugs (AZ258, Colchicine, Taxol and Cytochalasin B) in the unseen group. Data are presented as mean values ± 95% confidence intervals. Source data are provided as Source data files.
f 2D PC plot of the perturbation space learned by the style encoder. Perturbations highlighted in bold are part of the set of held-out compounds. Highlighted groups contain drugs that trigger a similar phenotypic effect in the original dataset. Examples of predictions by IMPA on the unseen perturbations in the groups are provided together with images displaying the real phenotype. The scale bar is 30 μm.
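The Tanimoto similarity used in panels c and d can be sketched as below. The caption does not say which molecular fingerprint was used, so the example assumes fingerprints are given as sets of "on" bit indices. The function names and the toy compounds are hypothetical, and no real BBBC021 data appear here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union

def closest_training_similarity(query_fp, training_fps):
    """Highest similarity between a held-out compound and any training compound."""
    return max(tanimoto(query_fp, fp) for fp in training_fps.values())

# Hypothetical toy fingerprints (sets of bit indices), not real compounds.
training = {"drug_a": {1, 2, 3, 4}, "drug_b": {2, 3, 9}}
held_out = {2, 3, 4, 5}
nearest = closest_training_similarity(held_out, training)
```

Here the held-out compound shares 3 of 5 distinct bits with `drug_a` and 2 of 5 with `drug_b`, so its similarity to the closest training compound is 0.6.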
Fig. 3. IMPA corrects technical batch effects via style transfer on RxRx1 (N = 170,942).
a Given a dataset of images collected from multiple batches, a style embedding is learnt for each batch and used to transport all images into the same batch. The scale bar is 20 μm.
b A classifier is trained to distinguish cells from different batches. Before correction, cells should be assigned to their original batch. After correction, the classifier is deceived into labeling all the cells with a single batch. The dot plot represents the fraction of cells assigned to each batch by the classifier before and after correction by transforming all images to batch 0. Source data are provided as Source data files. The scale bar is 20 μm.
c Top: PCA plots before and after correction, colored by batch labels. The features are extracted with a pre-trained Cell Painting Vision Transformer (ViT). Bottom: mean batch impurity scores, measured as the entropy and Gini index computed for each cluster of images. A higher value of batch impurity suggests a better mixing of batch labels within a cluster. Clusters are derived using the Leiden algorithm. Source data are provided as Source data files.
d Highlighted cell images before and after correction by IMPA, colored by batch, for controls and for cells treated with siRNAs targeting the A4GALT and TTN genes.
e Metrics comparing batch correction results between IMPA and a competing model, evaluated on the ViT features extracted from corrected images. Source data are provided as Source data files.
f Visual examples of transformations from batch 1 to batch 0 across models. The scale bar is 20 μm.
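The batch impurity scores in panel c can be sketched as follows, assuming cluster assignments (for example from Leiden clustering) are already available as lists of batch labels. The function names, the use of the natural logarithm, and the toy clusters are assumptions for illustration, not details taken from the paper.

```python
import math
from collections import Counter

def impurity(batch_labels):
    """Return (entropy, Gini index) of the batch labels in one cluster."""
    counts = Counter(batch_labels)
    total = len(batch_labels)
    probs = [c / total for c in counts.values()]
    entropy = sum(-p * math.log(p) for p in probs)  # natural log is an assumption
    gini = 1.0 - sum(p * p for p in probs)
    return entropy, gini

def mean_impurity(clusters):
    """Average entropy and Gini index over clusters (lists of batch labels)."""
    scores = [impurity(labels) for labels in clusters]
    n = len(scores)
    return sum(e for e, _ in scores) / n, sum(g for _, g in scores) / n

# Hypothetical clusters: one evenly mixed across two batches, one single-batch.
mixed, pure = [0, 0, 1, 1], [0, 0, 0, 0]
mean_entropy, mean_gini = mean_impurity([mixed, pure])
```

A single-batch cluster scores zero on both measures, while an evenly mixed two-batch cluster scores ln 2 for entropy and 0.5 for the Gini index, so higher means indicate better mixing of batches within clusters.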
Fig. 4. IMPA predicts the effect of multiple perturbation types on cpg0000 (N = 435,160).
a An overview of the types and number of perturbations in the JUMP-cpg0000 dataset. We consider 435,160 images of treated U2OS cells.
b The first step before predicting perturbation responses is to use IMPA to remove the plate effect. Scale bar is 30 μm.
c Illustration of the importance of removing the plate effect with IMPA for perturbation prediction. Top: predictions of the effect of BVT-948 without plate correction. Bottom: the results after plate correction. Right: an example of the expected phenotype. Scale bar is 30 μm.
d Depiction of the different perturbation embeddings used as input to IMPA’s perturbation encoder to learn a shared perturbation space.
e PCA plot of the perturbation embedding computed by IMPA, colored by perturbation type. Highlighted are two pairs of drug and ORF perturbations that are proximal in the perturbation space and target the same genes. Scale bar is 30 μm.
f Morphological predictions of three unseen drugs and CRISPR perturbations computed by IMPA. The control image at the centre is used as input for the prediction. For each perturbation, four generated outputs are computed by drawing four random codes during inference. On the sides are two examples of real cells from the chosen perturbation. Scale bar is 30 μm.
