Morphology and gene expression profiling provide complementary information for mapping cell state

doi:10.1016/j.cels.2022.10.001

Cell Syst. 2022 Nov 16;13(11):911-923.e9.

doi: 10.1016/j.cels.2022.10.001. Epub 2022 Oct 28.

Gregory P Way¹, Ted Natoli², Adeniyi Adeboye³, Lev Litichevskiy², Andrew Yang², Xiaodong Lu², Juan C Caicedo³, Beth A Cimini³, Kyle Karhohs³, David J Logan³, Mohammad H Rohban³, Maria Kost-Alimova⁴, Kate Hartland⁴, Michael Bornholdt³, Srinivas Niranj Chandrasekaran³, Marzieh Haghighi³, Erin Weisbart³, Shantanu Singh³, Aravind Subramanian⁵, Anne E Carpenter⁶

Affiliations

¹ Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, USA.
² Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
³ Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
⁴ Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
⁵ Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. Electronic address: asubramanian@dewpointx.com.
⁶ Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. Electronic address: anne@broadinstitute.org.

PMID: 36395727
PMCID: PMC10246468
DOI: 10.1016/j.cels.2022.10.001

Gregory P Way et al. Cell Syst. 2022.

Abstract

Morphological and gene expression profiling can cost-effectively capture thousands of features in thousands of samples across perturbations by disease, mutation, or drug treatments, but it is unclear to what extent the two modalities capture overlapping versus complementary information. Here, using both the L1000 and Cell Painting assays to profile gene expression and cell morphology, respectively, we perturb human A549 lung cancer cells with 1,327 small molecules from the Drug Repurposing Hub across six doses, providing a data resource including dose-response data from both assays. The two assays capture both shared and complementary information for mapping cell state. Cell Painting profiles from compound perturbations are more reproducible and show more diversity but measure fewer distinct groups of features. Applying unsupervised and supervised methods to predict compound mechanisms of action (MOAs) and gene targets, we find that the two assays provide a partially shared but also complementary view of drug mechanisms. Given the numerous applications of profiling in biology, our analyses provide guidance for planning experiments that profile cells for detecting distinct cell types, disease phenotypes, and responses to chemical or genetic perturbations.

Keywords: Cell Painting; L1000; benchmark; drug discovery; high-dimensional profiling; images; systems biology.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

**Figure 1. Cell Painting and L1000 data provide complementary measurements of compound perturbations across doses.**
**(a)** An example Cell Painting image of a single A549 lung cancer cell measured across five channels. We show a merged representation as well. ER = endoplasmic reticulum; Mito = mitochondria; AGP = actin, Golgi, plasma membrane. Scale bar is 20 μm. **(b)** The percent replicating metric measures the percentage of profiles that correlate with each other to a level higher than a carefully matched, randomly sampled null distribution. See Methods for full details about sampling and data processing. The dotted blue line indicates the 95th percentile of the matched non-replicate distribution. **(c)** Median pairwise replicate Spearman correlations between profiles measured by the L1000 assay (y axis) and Cell Painting assay (x axis). The dotted black line is the line y = x, so anything above is measured with a higher replicate correlation in L1000 and vice versa. **(d)** The L1000 and Cell Painting assays reproducibly measure a complementary set of compound perturbations. The three numbers represent (from top to bottom) the number of compounds unique to L1000, the number of compounds captured in both assays, and the number of compounds unique to Cell Painting that have median pairwise replicate correlations above the randomized non-replicate correlation threshold.
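
The percent replicating metric in panel (b) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the synthetic data, and the way the null is built (one profile drawn from each of several different perturbations) are assumptions made for illustration, and the matched sampling described in the paper's Methods may differ. The sketch assumes every perturbation has at least two replicates and that feature values contain no ties.

```python
import numpy as np

def rank_rows(x):
    """Rank the values within each row (0 = smallest); assumes no ties."""
    return np.argsort(np.argsort(x, axis=1), axis=1).astype(float)

def median_pairwise_spearman(profiles):
    """Median Spearman correlation over all distinct pairs of rows."""
    corr = np.corrcoef(rank_rows(profiles))  # Pearson on ranks = Spearman
    upper = np.triu_indices_from(corr, k=1)
    return float(np.median(corr[upper]))

def percent_replicating(profiles, labels, n_null=1000, percentile=95, seed=0):
    """Percentage of perturbations whose replicate correlation beats the null."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    groups = [np.flatnonzero(labels == lab) for lab in np.unique(labels)]
    replicate_scores = np.array(
        [median_pairwise_spearman(profiles[idx]) for idx in groups]
    )
    # Null (assumed scheme): groups of typical replicate size whose members
    # each come from a different perturbation.
    group_size = int(np.median([len(idx) for idx in groups]))
    null_scores = np.empty(n_null)
    for i in range(n_null):
        chosen = rng.choice(len(groups), size=group_size, replace=False)
        rows = [rng.choice(groups[g]) for g in chosen]
        null_scores[i] = median_pairwise_spearman(profiles[rows])
    threshold = float(np.percentile(null_scores, percentile))
    return 100.0 * float(np.mean(replicate_scores > threshold)), threshold

# Synthetic demo: 20 hypothetical compounds, 5 replicates, 50 features.
rng = np.random.default_rng(1)
n_compounds, n_replicates, n_features = 20, 5, 50
signal = rng.normal(size=(n_compounds, n_features))
noise = rng.normal(size=(n_compounds * n_replicates, n_features))
profiles = np.repeat(signal, n_replicates, axis=0) + 0.3 * noise
labels = np.repeat(np.arange(n_compounds), n_replicates)

pct, threshold = percent_replicating(profiles, labels, n_null=500)
print(f"{pct:.1f}% of compounds exceed the null threshold of {threshold:.2f}")
```

Using the 95th percentile of the null as the cutoff mirrors the dotted line described in the caption; a stricter percentile would lower the reported percentage.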

**Figure 2. Cell Painting captures a more diverse sample space than L1000.**
**(a)** Uniform manifold approximation and projection (UMAP) coordinates of all perturbations (level 4 replicates) across all doses by Cell Painting (left) and L1000 profiles (right). We highlight select MOAs that are consistently different from DMSO controls in either modality. Note that Cell Painting data is spherized and L1000 data is not, as explained in the main text; here this manifests in quite different patterns for the negative control DMSO samples. In particular, many of the otherwise-distinct islands of compounds for L1000 are populated by negative control DMSO. **(b)** Heatmaps of pairwise Pearson correlations of all measured compounds’ consensus signatures (see Methods) in either assay at the highest dose (10 μM) plus positive-control proteasome inhibitors at 20 μM and DMSO negative controls.

**Figure 3. Cell Painting morphology features are more redundant than L1000 gene expression features.**
**(a)** Heatmaps of pairwise Pearson correlations of 1,020 Cell Painting features and 974 L1000 features, in each case derived from feature-selected consensus signatures of the same compound treatments at 10 μM. **(b)** The same data plotted as a density plot shows the distribution of correlations between pairs of L1000 or Cell Painting features. **(c)** The percentage of variance explained for the top 30 principal components derived from a principal component analysis (PCA) in Cell Painting or L1000 readouts. **(d)** Comparing activity scores for highly reproducible compound perturbations (as defined by having 3 or more doses passing the percent strong threshold) reveals that most compounds induce a higher number of morphological changes than gene expression changes. **(e)** The mean morphological activity score (MAS) and transcriptional activity score (TAS) for compounds that are reproducible in at least three doses, with labels for compounds with the largest difference between MAS and TAS. **(f)** Overrepresentation analysis (ORA) for gene ontology (GO) terms using the genes most impacted by each individual compound treatment. We selected these compounds to include those that are reproducible in both L1000 and Cell Painting and that induce a high activity score in one assay and a low activity score in the other. Each point is a GO term, comprising L1000 landmark genes that were consistently modulated by that compound.
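
The redundancy comparison in panels (a) to (c) rests on two standard quantities: the distribution of pairwise Pearson correlations between features, and the percentage of variance explained by the top principal components. The sketch below shows one way to compute both with NumPy. The function names and the synthetic matrices are hypothetical, the input is assumed to be a samples-by-features matrix with no constant columns, and nothing here reproduces the paper's feature selection or consensus signatures.

```python
import numpy as np

def pairwise_feature_correlations(signatures):
    """Pearson correlation for every distinct pair of feature columns."""
    corr = np.corrcoef(signatures, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)
    return corr[upper]

def variance_explained(signatures, n_components=30):
    """Percent of total variance captured by each of the top principal components."""
    centered = signatures - signatures.mean(axis=0)
    # Squared singular values of the centered matrix are proportional
    # to the variances along the principal components.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return 100.0 * variance[:n_components] / variance.sum()

# Synthetic demo: a redundant matrix driven by 3 latent factors versus
# a matrix of independent features (both 200 samples x 40 features).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
redundant = latent @ rng.normal(size=(3, 40)) + 0.1 * rng.normal(size=(200, 40))
independent = rng.normal(size=(200, 40))

for name, data in [("redundant", redundant), ("independent", independent)]:
    mean_abs_corr = np.abs(pairwise_feature_correlations(data)).mean()
    top5 = variance_explained(data, n_components=5).sum()
    print(f"{name}: mean |r| = {mean_abs_corr:.2f}, top 5 PCs = {top5:.1f}% of variance")
```

A more redundant feature set shows a wider spread of pairwise correlations and concentrates more variance in its first few components, which is the pattern the caption attributes to Cell Painting features.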

**Figure 4. Cell Painting and L1000 differentially measure compound perturbations by mechanism of action (MOA).**
**(a)** Percent matching metrics for median pairwise replicate correlations of groups of compounds with a given MOA annotation, measured in both assays and across doses. The color of each point represents how many compounds were annotated to a given MOA class. **(b)** Median correlation between compounds annotated with the same MOA. We derived the null threshold through a nonparametric permutation test of randomly sampled compounds (see Methods). The size of each point represents how many compounds belong to the MOA class. **(c)** The L1000 and Cell Painting assays reproducibly measure a complementary set of MOAs. The three numbers represent (from top to bottom) the number of MOAs unique to L1000, the number of MOAs captured in both assays, and the number of MOAs unique to Cell Painting that have higher signal than a randomly permuted null distribution control. The All* bar represents matched MOAs for the 127 MOA set and the All bar represents matched MOAs for the 210 MOA set. Average precision of Cell Painting and L1000 compounds with different **(d)** MOA and **(e)** gene target annotations. We highlight certain high-performing MOAs and targets.
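
To make the average precision in panels (d) and (e) concrete, the sketch below treats each compound as a query, ranks all other compounds by Pearson correlation of their profiles, and scores how early the compounds sharing the query's annotation appear. This is one common retrieval formulation and only an assumption about the setup; the paper's exact procedure is described in its Methods. Function names, MOA labels, and data are hypothetical.

```python
import numpy as np

def average_precision(relevant_sorted):
    """Average precision for a ranked list of booleans (best match first)."""
    relevant_sorted = np.asarray(relevant_sorted, dtype=bool)
    if not relevant_sorted.any():
        return float("nan")
    hits = np.cumsum(relevant_sorted)
    ranks = np.arange(1, len(relevant_sorted) + 1)
    # Mean of precision@k taken at each rank k that holds a relevant item.
    return float(np.mean(hits[relevant_sorted] / ranks[relevant_sorted]))

def moa_average_precision(profiles, moas):
    """Mean average precision per MOA, using each member compound as a query."""
    moas = np.asarray(moas)
    similarity = np.corrcoef(profiles)  # compound-by-compound Pearson correlation
    scores = {}
    for moa in np.unique(moas):
        members = np.flatnonzero(moas == moa)
        if len(members) < 2:
            continue  # a lone compound has no same-MOA match to retrieve
        per_query = []
        for q in members:
            others = np.delete(np.arange(len(moas)), q)
            order = others[np.argsort(-similarity[q, others])]
            per_query.append(average_precision(moas[order] == moa))
        scores[str(moa)] = float(np.mean(per_query))
    return scores

# Synthetic demo: 3 hypothetical MOA classes with 4 compounds each.
rng = np.random.default_rng(0)
moa_signal = rng.normal(size=(3, 30))
moas = np.repeat(["MOA_A", "MOA_B", "MOA_C"], 4)
profiles = np.repeat(moa_signal, 4, axis=0) + 0.2 * rng.normal(size=(12, 30))

scores = moa_average_precision(profiles, moas)
for moa, score in scores.items():
    print(f"{moa}: average precision = {score:.2f}")
```

Running the same function on profiles from each assay, with either MOA or gene target labels, would give the kind of per-annotation comparison the caption describes.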

**Figure 5. Predicting compound mechanisms of action (MOA) in Cell Painting and L1000 reveals overlapping and complementary performance for different mechanisms.**
**(a)** Deep learning workflow. We collected Cell Painting and L1000 data from compound perturbations and trained five different deep learning models to predict compound MOA and Gene Ontology terms. **(b)** Held-out test set precision-recall curves for three well-performing MOAs in both assays. **(c)** Individual MOA performance by held-out test set area under the precision-recall curve (AUPR) in the top-performing model using Cell Painting and L1000 data. **(d)** Overall held-out test set model performance measured by AUPR for MOA prediction for our multi-label, multi-class prediction framework. We trained models from a recent Kaggle competition plus a k-nearest neighbors baseline model. The dotted bar chart represents a negative control in which we trained models with shuffled labels. The solid lines indicate ensemble model performance by blending model predictions (see Methods). We trained all models using level 4 replicate profiles.

Cited by

High-dimensional gene expression and morphology profiles of cells across 28,000 genetic and chemical perturbations.
Haghighi M, Caicedo JC, Cimini BA, Carpenter AE, Singh S. Nat Methods. 2022 Dec;19(12):1550-1557. doi: 10.1038/s41592-022-01667-0. Epub 2022 Nov 7. PMID: 36344834.
Application of Cell Painting for chemical hazard evaluation in support of screening-level chemical assessments.
Nyffeler J, Willis C, Harris FR, Foster MJ, Chambers B, Culbreth M, Brockway RE, Davidson-Fritz S, Dawson D, Shah I, Friedman KP, Chang D, Everett LJ, Wambaugh JF, Patlewicz G, Harrill JA. Toxicol Appl Pharmacol. 2023 Jun 1;468:116513. doi: 10.1016/j.taap.2023.116513. Epub 2023 Apr 11. PMID: 37044265.
Linking autism risk genes to morphological and pharmaceutical screening by high-content imaging: Future directions and opinion.
Arta RK, Watanabe Y, Egawa J, Lemmon VP, Someya T. Psychiatry Clin Neurosci. 2025 Aug;79(8):435-446. doi: 10.1111/pcn.13847. Epub 2025 Jun 10. PMID: 40492449. Review.
Cell Painting: a decade of discovery and innovation in cellular imaging.
Seal S, Trapotsi MA, Spjuth O, Singh S, Carreras-Puigvert J, Greene N, Bender A, Carpenter AE. Nat Methods. 2025 Feb;22(2):254-268. doi: 10.1038/s41592-024-02528-8. Epub 2024 Dec 5. PMID: 39639168. Review.
Evaluating batch correction methods for image-based cell profiling.
Arevalo J, Su E, Ewald JD, van Dijk R, Carpenter AE, Singh S. Nat Commun. 2024 Aug 2;15(1):6516. doi: 10.1038/s41467-024-50613-5. PMID: 39095341.