Cell Syst. 2022 Nov 16;13(11):911-923.e9.
doi: 10.1016/j.cels.2022.10.001. Epub 2022 Oct 28.

Morphology and gene expression profiling provide complementary information for mapping cell state

Gregory P Way et al. Cell Syst.

Abstract

Morphological and gene expression profiling can cost-effectively capture thousands of features in thousands of samples across perturbations by disease, mutation, or drug treatments, but it is unclear to what extent the two modalities capture overlapping versus complementary information. Here, using both the L1000 and Cell Painting assays to profile gene expression and cell morphology, respectively, we perturb human A549 lung cancer cells with 1,327 small molecules from the Drug Repurposing Hub across six doses, providing a data resource including dose-response data from both assays. The two assays capture both shared and complementary information for mapping cell state. Cell Painting profiles from compound perturbations are more reproducible and show more diversity but measure fewer distinct groups of features. Applying unsupervised and supervised methods to predict compound mechanisms of action (MOAs) and gene targets, we find that the two assays provide a partially shared but also complementary view of drug mechanisms. Given the numerous applications of profiling in biology, our analyses provide guidance for planning experiments that profile cells for detecting distinct cell types, disease phenotypes, and response to chemical or genetic perturbations.

Keywords: Cell Painting; L1000; benchmark; drug discovery; high-dimensional profiling; images; systems biology.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1. Cell Painting and L1000 data provide complementary measurements of compound perturbations across doses.
(a) An example Cell Painting image of a single A549 lung cancer cell measured across five channels, along with a merged representation. ER = endoplasmic reticulum; Mito = mitochondria; AGP = actin, Golgi, plasma membrane. Scale bar is 20 μm. (b) The percent replicating metric measures the percentage of profiles that correlate with each other to a level higher than a carefully matched, randomly sampled null distribution. See Methods for full details about sampling and data processing. The dotted blue line indicates the 95th percentile of the matched non-replicate distribution. (c) Median pairwise replicate Spearman correlations between profiles measured by the L1000 assay (y axis) and Cell Painting assay (x axis). The dotted black line is the line y = x, so points above it have a higher replicate correlation in L1000 and points below it have a higher replicate correlation in Cell Painting. (d) The L1000 and Cell Painting assays reproducibly measure a complementary set of compound perturbations. The three numbers represent (from top to bottom) the number of compounds unique to L1000, the number of compounds captured in both assays, and the number of compounds unique to Cell Painting that have median pairwise replicate correlations above the randomized non-replicate correlation threshold.
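The percent replicating metric in panel (b) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the synthetic data, the use of Pearson rather than Spearman correlation, and the way the null groups are sampled are all assumptions made for illustration. Only the overall idea (compare each perturbation's median pairwise replicate correlation with the 95th percentile of a non-replicate null) comes from the caption.

```python
import numpy as np

def median_pairwise_corr(rows):
    """Median of the off-diagonal Pearson correlations between profile rows."""
    corr = np.corrcoef(rows)
    return float(np.median(corr[np.triu_indices_from(corr, k=1)]))

def percent_replicating(profiles, labels, n_null=1000, percentile=95, seed=0):
    """Return (percent replicating, null threshold) for replicate profiles."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    # Row indices of the replicates of each perturbation (singletons are skipped)
    groups = [np.flatnonzero(labels == lab) for lab in np.unique(labels)]
    groups = [g for g in groups if len(g) > 1]
    replicate_scores = np.array([median_pairwise_corr(profiles[g]) for g in groups])

    # Null (assumed scheme): same-sized groups made of one profile from each of
    # several different perturbations, so no two members are true replicates.
    size = min(len(g) for g in groups)
    null_scores = np.empty(n_null)
    for i in range(n_null):
        picked = rng.choice(len(groups), size=size, replace=False)
        rows = [rng.choice(groups[j]) for j in picked]
        null_scores[i] = median_pairwise_corr(profiles[rows])

    threshold = float(np.percentile(null_scores, percentile))
    return 100.0 * float(np.mean(replicate_scores > threshold)), threshold

# Synthetic example: 20 hypothetical compounds, 3 replicates each, 50 features
rng = np.random.default_rng(1)
signal = rng.normal(size=(20, 50))
profiles = np.repeat(signal, 3, axis=0) + 0.1 * rng.normal(size=(60, 50))
labels = np.repeat(np.arange(20), 3)
pct, threshold = percent_replicating(profiles, labels)
print(f"percent replicating: {pct:.1f}, null threshold: {threshold:.2f}")
```

In this toy data the replicates differ only by small noise, so nearly every compound should exceed the null threshold; real profiles would give a lower value.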
Figure 2. Cell Painting captures a more diverse sample space than L1000.
(a) Uniform manifold approximation and projection (UMAP) coordinates of all perturbations (level 4 replicates) across all doses by Cell Painting (left) and L1000 profiles (right). We highlight select MOAs that are consistently different from DMSO controls in either modality. Note that Cell Painting data is spherized and L1000 data is not, as explained in the main text; here this manifests in quite different patterns for the negative control DMSO samples. In particular, many of the otherwise-distinct islands of compounds for L1000 are populated by negative control DMSO. (b) Heatmaps of pairwise Pearson correlations of all measured compounds’ consensus signatures (see Methods) in either assay at the highest dose (10 μM) plus positive-control proteasome inhibitors at 20 μM and DMSO negative controls.
Figure 3. Cell Painting morphology features are more redundant than L1000 gene expression features.
(a) Heatmaps of pairwise Pearson correlations of 1,020 Cell Painting features and 974 L1000 features, in each case derived from feature-selected consensus signatures of the same compound treatments at 10 μM. (b) The same data plotted as a density plot shows the distribution of correlations between pairs of L1000 or Cell Painting features. (c) The percentage of variance explained for the top 30 principal components derived from a principal component analysis (PCA) in Cell Painting or L1000 readouts. (d) Comparing activity scores for highly reproducible compound perturbations (as defined by having 3 or more doses passing the percent strong threshold) reveals that most compounds induce a higher number of morphological changes than gene expression changes. (e) The mean morphological activity score (MAS) and transcriptional activity score (TAS) for compounds that are reproducible in at least three doses, with labels for compounds with the largest difference between MAS and TAS. (f) Overrepresentation analysis (ORA) for gene ontology (GO) terms using the genes most impacted by each individual compound treatment. We selected these compounds to include those that are reproducible in both L1000 and Cell Painting and that induce a high activity score in one assay and a low activity score in the other. Each point is a GO term, comprising L1000 landmark genes that were consistently modulated by that compound.
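The redundancy comparison in panels (a) to (c) rests on two standard computations: pairwise feature correlations and the fraction of variance explained by each principal component. A minimal sketch follows, assuming synthetic data and hypothetical function names; it shows the general technique, not the authors' pipeline.

```python
import numpy as np

def feature_correlations(x):
    """Pearson correlations between all distinct pairs of feature columns."""
    corr = np.corrcoef(x, rowvar=False)
    return corr[np.triu_indices_from(corr, k=1)]

def explained_variance_ratio(x, n_components=30):
    """Fraction of total variance explained by each of the top principal components."""
    centered = x - x.mean(axis=0)
    # Squared singular values of the centered matrix are proportional to PC variances
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return (variance / variance.sum())[:n_components]

# Synthetic example: 40 features driven by 3 latent factors vs. 40 independent features
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
redundant = latent @ rng.normal(size=(3, 40)) + 0.1 * rng.normal(size=(200, 40))
independent = rng.normal(size=(200, 40))

red_top3 = float(explained_variance_ratio(redundant, 3).sum())
ind_top3 = float(explained_variance_ratio(independent, 3).sum())
print(f"variance in top 3 PCs: redundant={red_top3:.2f}, independent={ind_top3:.2f}")
```

A feature set whose variance concentrates in a few components, and whose feature pairs are strongly correlated, is more redundant in the sense used by the figure title.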
Figure 4. Cell Painting and L1000 differentially measure compound perturbations by mechanism of action (MOA).
(a) Percent matching metrics for median pairwise replicate correlations of groups of compounds with a given MOA annotation, measured in both assays and across doses. The color of the point represents how many compounds were annotated to a given MOA class. (b) Median correlation between compounds annotated with the same MOA. We derived the null threshold through a nonparametric permutation test of randomly sampled compounds (see Methods). The size of the points represents how many compounds belong to the MOA class. (c) The L1000 and Cell Painting assays reproducibly measure a complementary set of MOAs. The three numbers represent (from top to bottom) the number of MOAs unique to L1000, the number of MOAs captured in both assays, and the number of MOAs unique to Cell Painting that have higher signal than a randomly permuted null distribution control. The All* bar represents matched MOAs for the 127 MOA set and the All bar represents matched MOAs for the 210 MOA set. (d, e) Average precision of Cell Painting and L1000 compounds with different (d) MOA and (e) gene target annotations. We highlight certain high-performing MOAs and targets.
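Average precision, as reported in panels (d) and (e), can be sketched as a retrieval problem: for each query compound, rank all other compounds by profile similarity and ask how early the compounds sharing its annotation appear. The code below is an illustrative assumption (hypothetical function names, Pearson correlation as the similarity, toy profiles), not the authors' exact procedure.

```python
import numpy as np

def average_precision(scores, is_match):
    """Mean of precision-at-rank over the ranks where a true match occurs."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.asarray(is_match, dtype=bool)[order]
    if not hits.any():
        return float("nan")  # undefined when the query has no matching compound
    precision_at_rank = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision_at_rank[hits].mean())

def per_compound_average_precision(profiles, annotations):
    """Average precision for each compound, ranking the others by correlation."""
    similarity = np.corrcoef(profiles)
    annotations = np.asarray(annotations)
    results = []
    for i in range(len(annotations)):
        others = np.arange(len(annotations)) != i  # exclude the query itself
        matches = annotations[others] == annotations[i]
        results.append(average_precision(similarity[i, others], matches))
    return np.array(results)

# Toy example: two hypothetical MOA classes with clearly different profiles
toy_profiles = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.0, 3.1, 4.0],
    [4.0, 3.0, 2.0, 1.0],
    [4.0, 3.1, 2.0, 1.2],
])
toy_moas = ["moa_a", "moa_a", "moa_b", "moa_b"]
print(per_compound_average_precision(toy_profiles, toy_moas))
```

A compound whose nearest neighbors all share its annotation scores 1.0; matches that appear lower in the ranking pull the score toward zero.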
Figure 5. Predicting compound mechanisms of action (MOA) in Cell Painting and L1000 reveals overlapping and complementary performance for different mechanisms.
(a) Deep learning workflow. We collected Cell Painting and L1000 data from compound perturbations and trained five different deep learning models to predict compound MOA and Gene Ontology terms. (b) Held-out test set precision-recall curves for three well-performing MOAs in both assays. (c) Individual MOA performance by held-out test set area under the precision-recall curve (AUPR) in the top-performing model using Cell Painting and L1000 data. (d) Overall held-out test set model performance measured by AUPR for MOA prediction for our multi-label, multi-class prediction framework. We trained models from a recent Kaggle competition plus a K-nearest-neighbors baseline model. The dotted bar chart represents a negative control in which we trained models with shuffled labels. The solid lines indicate ensemble model performance by blending model predictions (see Methods). We trained all models using level 4 replicate profiles.
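The shuffled-label negative control in panel (d) can be illustrated with a small K-nearest-neighbors example. This sketch makes several simplifying assumptions that differ from the caption: it uses synthetic, well-separated clusters, a single label per sample instead of the multi-label setup, and plain accuracy instead of AUPR. The function name and all parameters are hypothetical.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=5):
    """Predict one label per test profile by majority vote of its k nearest training profiles."""
    # Euclidean distance between every test profile and every training profile
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    n_classes = int(train_y.max()) + 1
    return np.array([np.bincount(train_y[row], minlength=n_classes).argmax()
                     for row in nearest])

# Synthetic data: 3 hypothetical MOA classes as separated clusters in 10 dimensions
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 10))
train_y = np.repeat(np.arange(3), 50)
test_y = np.repeat(np.arange(3), 30)
train_x = centers[train_y] + rng.normal(size=(150, 10))
test_x = centers[test_y] + rng.normal(size=(90, 10))

# Model trained on the real labels
accuracy = float(np.mean(knn_predict(train_x, train_y, test_x) == test_y))

# Negative control: same features, training labels randomly permuted
shuffled_y = rng.permutation(train_y)
control_accuracy = float(np.mean(knn_predict(train_x, shuffled_y, test_x) == test_y))

print(f"true labels: {accuracy:.2f}, shuffled labels: {control_accuracy:.2f}")
```

The control breaks the link between profiles and labels, so any performance it retains reflects chance or label imbalance rather than learned signal; a real model should clearly exceed it.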
