Nat Methods. 2024 Jun;21(6):1114-1121.
doi: 10.1038/s41592-024-02241-6. Epub 2024 Apr 9.

Three million images and morphological profiles of cells treated with matched chemical and genetic perturbations


Srinivas Niranj Chandrasekaran et al. Nat Methods. 2024 Jun.

Abstract

The identification of genetic and chemical perturbations with similar impacts on cell morphology can elucidate compounds' mechanisms of action or novel regulators of genetic pathways. Research on methods for identifying such similarities has lagged due to a lack of carefully designed and well-annotated image sets of cells treated with chemical and genetic perturbations. Here we create such a Resource dataset, CPJUMP1, in which each perturbed gene's product is a known target of at least two chemical compounds in the dataset. We systematically explore the directionality of correlations among perturbations that target the same protein encoded by a given gene, and we find that identifying matches between chemical and genetic perturbations is a challenging task. Our dataset and baseline analyses provide a benchmark for evaluating methods that measure perturbation similarities and impact, and more generally, learn effective representations of cellular state from microscopy images. Such advancements would accelerate the applications of image-based profiling of cellular states, such as uncovering drug mode of action or probing functional genomics.
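The design constraint above (each genetically perturbed gene's product is a known target of at least two compounds) can be checked mechanically against an annotation table. The following is a minimal sketch under stated assumptions. The function name `genes_violating_design`, the (compound, gene) pair format and the toy gene and compound names are all hypothetical. None of them come from the CPJUMP1 metadata.

```python
from collections import defaultdict


def genes_violating_design(perturbed_genes, compound_targets, minimum=2):
    """Return perturbed genes whose product is targeted by fewer than `minimum` compounds.

    perturbed_genes: iterable of gene symbols hit by CRISPR guides or ORFs.
    compound_targets: iterable of (compound, gene) annotation pairs; a compound
    with several annotated targets appears once per target.
    """
    compounds_per_gene = defaultdict(set)
    for compound, gene in compound_targets:
        compounds_per_gene[gene].add(compound)
    # Genes with no annotated compound get an empty set from the defaultdict.
    return sorted(g for g in set(perturbed_genes) if len(compounds_per_gene[g]) < minimum)


# Hypothetical toy annotations (placeholders, not the dataset's annotations).
targets = [("cpd_A", "GENE1"), ("cpd_B", "GENE1"), ("cpd_B", "GENE2"), ("cpd_C", "GENE3")]
violations = genes_violating_design(["GENE1", "GENE2", "GENE3", "GENE4"], targets)
print(violations)  # ['GENE2', 'GENE3', 'GENE4']
```

An empty result would mean the toy annotation satisfies the "at least two compounds per gene" rule.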


Conflict of interest statement

S.S. and A.E.C. serve as scientific advisors for companies that use image-based profiling and Cell Painting (A.E.C.: Recursion, SyzOnc, Quiver Bioscience; S.S.: Waypoint Bio, Dewpoint Therapeutics, Deepcell) and receive honoraria for occasional talks at pharmaceutical and biotechnology companies. D.K. and S.G. are employees of Merck Healthcare KGaA. All other authors have no competing interests.

Figures

Fig. 1. Sample images from the dataset.
A five-channel image of human U2OS cells treated with the compound PFI-1 (a BRD4-specific inhibitor). This is a representative image from one of four wells of cells treated with PFI-1. The channel names indicate the cellular structures identified in each image (see Methods section for details; AGP, actin, Golgi, plasma membrane; DNA, nucleus; ER, endoplasmic reticulum; mito, mitochondria; RNA, nucleoli and cytoplasmic RNA). Other example images (including brightfield channels not shown here) are available at https://github.com/jump-cellpainting/2024_Chandrasekaran_NatureMethods/tree/6ba3fcd1495d9e844e4607373a568641981ffcd8/example_images. Scale bar, 100 µm.
Fig. 2. UMAP representation of image-based profiles from a subset of the primary group of tested samples.
Profiles from human A549 cells perturbed by all compounds and CRISPR guides at the long time points (Supplementary Table 1) are shown here. The top four most-similar compound–genetic perturbation pairs, associated with the same gene, are highlighted.
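The caption does not state how "most-similar" pairs were scored. As a hedged illustration, the sketch below assumes cosine similarity between profiles (the measure used in the Fig. 4 and Fig. 5 captions), one profile per gene, and a hypothetical `compound_to_gene` mapping. It ranks compound–gene pairs that share an annotated gene and keeps the top k. All names and toy vectors are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length profile vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_matched_pairs(compound_profiles, gene_profiles, compound_to_gene, k=4):
    """Rank (compound, gene) pairs sharing an annotated gene by cosine similarity."""
    scored = [
        (cosine(compound_profiles[c], gene_profiles[g]), c, g)
        for c, g in compound_to_gene.items()
        if c in compound_profiles and g in gene_profiles
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(c, g, round(s, 3)) for s, c, g in scored[:k]]


# Hypothetical toy profiles (3 features each).
compound_profiles = {"cpd_A": [1.0, 0.0, 0.0], "cpd_B": [0.0, 1.0, 0.0], "cpd_C": [1.0, 1.0, 0.0]}
gene_profiles = {"GENE1": [1.0, 0.0, 0.0], "GENE2": [0.0, 0.0, 1.0]}
compound_to_gene = {"cpd_A": "GENE1", "cpd_B": "GENE2", "cpd_C": "GENE1"}
print(top_matched_pairs(compound_profiles, gene_profiles, compound_to_gene, k=2))
# [('cpd_A', 'GENE1', 1.0), ('cpd_C', 'GENE1', 0.707)]
```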
Fig. 3. Schema of the data generated in the CPJUMP1 experiment.
Each rectangular block of replicates (REPLICATE X or REP X) is a 384-well plate of cells perturbed by compounds, CRISPR guides or ORFs and subjected to different experimental conditions. Short and long time points are described in Supplementary Table 1. Plates in the vertical blue and green boxes comprise the primary CPJUMP1 experiment. The other (that is, secondary) experimental conditions are described in the Methods section.
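A 384-well plate has 16 rows (A–P) and 24 columns, and plate maps and well-position analyses are usually keyed by well IDs built from that grid. The sketch below enumerates them in row-major order. The zero-padded "A01" naming is one common convention and is an assumption, not necessarily the format used in CPJUMP1 metadata.

```python
import string


def well_ids(n_rows=16, n_cols=24):
    """List well IDs in row-major order, e.g. A01 ... P24 for a 384-well plate."""
    rows = string.ascii_uppercase[:n_rows]
    return [f"{r}{c:02d}" for r in rows for c in range(1, n_cols + 1)]


wells = well_ids()
print(len(wells), wells[0], wells[-1])  # 384 A01 P24
```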
Fig. 4. Benchmark results for progressively more difficult retrieval tasks.
a, Perturbation detection: retrieving replicates of the same sample. Mean average precision (mAP) for perturbation detection is shown across experimental conditions: cell type (columns) and time points (rows; short and long time points are defined in Supplementary Table 1). The numerical values shown above each box plot are the fraction of perturbations that can be successfully retrieved for each retrieval task. Box plot boundaries are 75th (Q3) and 25th (Q1) percentiles, with whiskers at ±1.5-fold the interquartile range (Q3–Q1) or the highest or lowest data point. The color of the bars denotes whether the query perturbation and the retrieved perturbation are in mostly the same well position (blue) or mostly different well positions (red); the latter is a more challenging task due to technical well-position artifacts. b, Perturbation matching, within a perturbation type: the plot shows mAP for sister perturbation retrieval (that is, pairs of compounds or pairs of CRISPR guides annotated with the same gene target). ORFs are not shown because there is only a single ORF reagent per gene. Absolute cosine similarity is used for calculating mAP values for compounds because pairs of compounds annotated to target the same protein can be positively or negatively correlated. c, Perturbation matching, across perturbation types: the plot shows mAP values for retrieving compound–gene pairs (that is, the same target and different perturbation type). Absolute cosine similarity is used for calculating mAP values for both compound–CRISPR and compound–ORF matching. The number of independent biological samples is available in Supplementary Table 2.
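To make the retrieval metric concrete, here is a minimal sketch of average precision (AP) for one query profile. It ranks a pool of positives (for example, replicates or sister perturbations) and negatives (for example, negative-control profiles) by cosine similarity to the query. The `absolute=True` option mirrors the caption's use of absolute cosine similarity. The function names and toy vectors are hypothetical. This is a textbook AP over a single ranked list and is not the paper's exact pipeline, which also determines which perturbations count as successfully retrieved.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_precision(query, positives, negatives, absolute=False):
    """AP for retrieving `positives` from positives + negatives, ranked by similarity to `query`.

    With absolute=True the ranking uses |cosine|, so strongly anticorrelated
    positives also count as close matches.
    """
    def score(x):
        s = cosine(query, x)
        return abs(s) if absolute else s

    pool = [(score(p), True) for p in positives] + [(score(n), False) for n in negatives]
    # Decreasing similarity; the sort is stable, so ties keep positives first.
    pool.sort(key=lambda item: -item[0])
    hits, precisions = 0, []
    for rank, (_, is_positive) in enumerate(pool, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)  # precision at the rank of each positive
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(ap_values):
    """mAP: the mean of AP values, e.g. over each replicate of one perturbation used as a query."""
    return sum(ap_values) / len(ap_values)


query = [1.0, 0.0]
positives = [[0.9, 0.1], [-1.0, 0.0]]  # the second positive is anticorrelated with the query
negatives = [[0.0, 1.0], [0.5, 0.5]]   # e.g. negative-control profiles
print(average_precision(query, positives, negatives))                 # 0.75
print(average_precision(query, positives, negatives, absolute=True))  # 1.0
```

The toy case shows why absolute similarity matters for sister compounds: the anticorrelated positive falls to the bottom of the signed ranking but to the top of the absolute one.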
Fig. 5. Directionality of matching cross-modality.
a, Cosine similarity between ORF and CRISPR reagents targeting the same gene is shown for the two cell types, A549 (pink) and U2OS (blue). The 5th and 95th percentile thresholds of their respective nulls (ORFs and CRISPR reagents targeting different genes), along with the percentage of true pairs below the thresholds, are shown. We performed Fisher’s exact test to determine whether the true pairs beyond the threshold are more likely to be positively correlated or negatively correlated. We find them to be significantly more positively correlated (P values are available in Supplementary Table 3). b, Cosine similarity between compounds and the two genetic perturbation modalities, CRISPR (orange) and ORF (green), targeting the same gene or gene product. All analyses here were also statistically significantly more positively correlated; P values are available in Supplementary Table 4. Cosine similarity of zero is shown as a gray dashed line in both subplots. In a, there were n = 3,728 biologically independent ORF and CRISPR reagents, and in b, there were n = 1,864 independent pairs of compounds and genetic perturbations targeting the same gene or gene product. Box plot boundaries are 75th (Q3) and 25th (Q1) percentiles, with whiskers at ±1.5-fold the interquartile range (Q3–Q1) or the highest or lowest data point.
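The caption does not give the exact contingency table behind the Fisher's exact test, so the following is only a hedged sketch of one plausible layout. It assumes rows of (true pairs, null pairs) and columns of (above the null's 95th percentile, below its 5th percentile), tested one-sided for a positive skew of true pairs. Percentiles use linear interpolation (NumPy's default rule). The test is written out with `math.comb` to stay dependency-free. All function names and toy similarity values are hypothetical.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (NumPy's default rule)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)
    tail = sum(
        math.comb(row1, x) * math.comb(row2, col1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return tail / total


def directionality_test(true_sims, null_sims):
    """Count true and null pairs beyond the null's 5th/95th percentiles and test the skew.

    Assumed table: rows = (true pairs, null pairs),
    columns = (above 95th percentile, below 5th percentile).
    """
    low, high = percentile(null_sims, 5), percentile(null_sims, 95)
    true_above = sum(s > high for s in true_sims)
    true_below = sum(s < low for s in true_sims)
    null_above = sum(s > high for s in null_sims)
    null_below = sum(s < low for s in null_sims)
    p_value = fisher_one_sided(true_above, true_below, null_above, null_below)
    return {"low": low, "high": high, "true_above": true_above,
            "true_below": true_below, "p_value": p_value}


# Hypothetical toy similarities.
null_sims = [i / 10 for i in range(-10, 11)]  # -1.0, -0.9, ..., 1.0
true_sims = [0.95, 0.97, 0.99, -0.95, 0.1]
result = directionality_test(true_sims, null_sims)
```

With realistic numbers of pairs, `scipy.stats.fisher_exact` could replace the hand-written test.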
Extended Data Fig. 1. Well position effect.
mAP for perturbation detection for ORFs in the same well position (blue), same or different well positions (red) and different well positions (pink); the "same or different well positions" condition is the one shown in Fig. 4a in the main text. ORFs in different well positions are affected by plate layout effects, which lower mAP and fraction retrieved (FR) scores for retrieving replicates against a background of negative control wells. The numerical values shown above each box plot are the FR values, that is, the fraction of perturbations that can be successfully retrieved for each retrieval task. Box plot boundaries are 75th (Q3) and 25th (Q1) percentiles, with whiskers at ±1.5-fold the interquartile range (Q3–Q1) or the highest or lowest data point. n = 320 biologically independent ORF reagents in the blue boxes and n = 160 biologically independent ORF reagents in the pink and red boxes.
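The box-plot convention in this caption (box from Q1 to Q3, whiskers reaching at most 1.5 × IQR beyond the box, otherwise stopping at the most extreme data point) matches matplotlib's default `whis=1.5` behavior. The minimal sketch below computes those quantities for one box. It assumes linear-interpolation quartiles, and the function names and toy values are hypothetical.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def box_plot_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers for one box (Tukey-style whiskers)."""
    q1, median, q3 = (percentile(values, q) for q in (25, 50, 75))
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers stop at the most extreme data point still inside the fences.
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lo_fence or v > hi_fence),
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
# {'q1': 3.25, 'median': 5.5, 'q3': 7.75, 'whisker_low': 1, 'whisker_high': 9, 'outliers': [100]}
```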
Extended Data Fig. 2. Similarity of perturbation impact across modalities for genes and compounds related to the compound BI-2536.
Treatment of U2OS cells with the BRD4 inhibitors BI-2536 (multi-target, including PLK1 and BRD4) and PFI-1 (BRD4-specific) is shown (top row; all images are composites of max intensity across all five imaging channels). BI-2536 causes cell death, and this phenotype is mimicked by PLK1 knockout (middle row). In contrast, BRD4 knockout fails to produce a distinct phenotype, death-related or otherwise, as is the case for the BRD4-specific inhibitor PFI-1 (middle column, top and middle rows); both profiles are quantitatively similar to negative controls. This implies that BRD4 inhibition has a limited phenotypic impact in this assay under these experimental conditions, allowing the PLK1-inhibition phenotype of BI-2536 to dominate the profile. BRD4 overexpression, on the other hand, also induces cell death (bottom row) and a profile strongly similar to that of BI-2536, which could indicate that BRD4 overexpression yields a dominant negative phenotype. Overexpression of PLK1 produces a phenotype that is not distinguishable from negative controls. Negative controls for compounds, CRISPR and ORF reagents are included. These are representative images from one of the four replicate wells of each treatment in the dataset. Wells were sampled from the longer time point for each perturbation modality (Supplementary Table 1).
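A max-intensity composite, as used for these panels, keeps the brightest value across channels at every pixel. Below is a minimal, dependency-free sketch on equally sized 2-D channel arrays. The function name and the toy 2 × 2 channels are hypothetical. The sketch also skips any per-channel intensity rescaling that a display pipeline might apply before combining channels.

```python
def max_intensity_composite(channels):
    """Collapse equally sized 2-D channel images into one image via a per-pixel max."""
    height, width = len(channels[0]), len(channels[0][0])
    if any(len(ch) != height or any(len(row) != width for row in ch) for ch in channels):
        raise ValueError("all channels must have the same shape")
    return [[max(ch[y][x] for ch in channels) for x in range(width)] for y in range(height)]


# Five hypothetical 2x2 channels (e.g. DNA, ER, RNA, AGP, Mito).
channels = [
    [[1, 0], [0, 0]],
    [[0, 2], [0, 0]],
    [[0, 0], [3, 0]],
    [[0, 0], [0, 4]],
    [[5, 0], [0, 0]],
]
composite = max_intensity_composite(channels)
print(composite)  # [[5, 2], [3, 4]]
```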
Extended Data Fig. 3. CellProfiler features to cell images.
Example single-cell images (first row) and their synthetically generated versions (second row) are shown in each subfigure. Each synthetic image is generated from the corresponding single cell's CellProfiler measurements. To learn a transformation function from each single cell's CellProfiler-extracted features to its image, 7,077 single cells were randomly selected from a set of eight diverse compounds (aloxistatin, AMG900, dexamethasone, TC-S-7004, FK-866, LY2109761, NVS-PAK1-1 and quinidine) to train a convolutional neural network (CNN). The set of cell-level Cell Painting measurements was reduced to a non-redundant set of features for the five channels DNA, RNA, ER, AGP and Mito. Location-related features and low-variance features were excluded. The single-cell image corresponding to each cell's CellProfiler measurements was extracted by cropping a fixed-size (160-pixel) bounding box around the cell's Cells_Location_Center coordinates. The CNN model learns the transformation from (3019, 1)-sized CellProfiler feature vectors to (128, 128, 5)-sized images.
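The single-cell crops described above can be sketched as a fixed-size box centered on the cell's Cells_Location_Center coordinates. How cells near the image border were handled is not stated, so this sketch assumes the box is shifted back inside the image (discarding or padding border cells would be equally valid). The function name and the 1080 × 1080 image size are hypothetical. Resizing the 160-pixel crop to the 128 × 128 network output size is a separate step that is not shown.

```python
def crop_bounds(center_x, center_y, image_width, image_height, size=160):
    """Return (x0, y0, x1, y1) of a size x size box centered on a cell.

    Assumption: boxes that would leave the image are shifted back inside it.
    """
    half = size // 2
    x0 = int(round(center_x)) - half
    y0 = int(round(center_y)) - half
    x0 = max(0, min(x0, image_width - size))
    y0 = max(0, min(y0, image_height - size))
    return x0, y0, x0 + size, y0 + size


# Hypothetical cell centers in a hypothetical 1080 x 1080 field of view.
print(crop_bounds(500.4, 40.0, 1080, 1080))   # (420, 0, 580, 160)
print(crop_bounds(1075, 1075, 1080, 1080))    # (920, 920, 1080, 1080)
```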
Extended Data Fig. 4. Plate maps and overview of compounds’ clinical phase status.
a–c, Plate maps for the compound plate (a), CRISPR plate (b) and ORF plate (c). The control wells and the treatment (trt) wells are shown in different colors; poscon denotes positive controls (additional details in the Methods section) and negcon denotes the negative control. d, Over a third of the compounds in the dataset have been launched for sale, whereas others have progressed to various stages of human clinical trials.
Extended Data Fig. 5. Cell type and time, for all tested perturbations and conditions.
The primary group of tested samples in the CPJUMP1 experiment consists of three perturbation modalities (compounds, CRISPR guides and ORFs), two cell types (U2OS and A549) and two time points per perturbation modality (Supplementary Table 1). This UMAP plot includes the CPJUMP1 primary experiment (4 Compound, 4 CRISPR and 2 ORF plates per cell type and time point) plus all other data points from the CPJUMP1 experiments, as outlined in Fig. 3.
Extended Data Fig. 6. Cas9 status.
The parental line is the original cell line with no modifications. The Cas9 cell line is a polyclonal cell line expressing Cas9 (used for all CRISPR experiments and one compound experiment). This UMAP plot includes the CPJUMP1 primary experiment (4 Compound, 4 CRISPR and 2 ORF plates per cell type and time point) plus all other data points from the CPJUMP1 experiments, as outlined in Fig. 3.
Extended Data Fig. 7. Different cell seeding densities.
Experiments were performed with the baseline seeding density (100%; 1000 cells/well), increased seeding density (120%), and decreased seeding density (80%). This UMAP plot includes the CPJUMP1 primary experiment (4 Compound, 4 CRISPR and 2 ORF plates per cell type and time point) plus all other data points from the CPJUMP1 experiments, as outlined in Fig. 3.
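The relative seeding densities translate into absolute cell numbers by scaling the stated 100% baseline of 1000 cells/well. The sketch below only performs that arithmetic. The constant and function names are hypothetical.

```python
BASELINE_CELLS_PER_WELL = 1000  # 100% seeding density, as stated in the caption


def cells_per_well(percent):
    """Cells seeded per well at a given percentage of the baseline density."""
    return round(BASELINE_CELLS_PER_WELL * percent / 100)


densities = {p: cells_per_well(p) for p in (80, 100, 120)}
print(densities)  # {80: 800, 100: 1000, 120: 1200}
```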
Extended Data Fig. 8. Impact of repeat imaging.
Some plates were imaged more than once. This UMAP plot includes the CPJUMP1 primary experiment (4 Compound, 4 CRISPR and 2 ORF plates per cell type and time point) plus all other data points from the CPJUMP1 experiments, as outlined in Fig. 3.
Extended Data Fig. 9. Imaging after a time delay.
A subset of plates were imaged after a certain number of days. This UMAP plot includes the CPJUMP1 primary experiment (4 Compound, 4 CRISPR and 2 ORF plates per cell type and time point) plus all other data points from the CPJUMP1 experiments, as outlined in Fig. 3.
Extended Data Fig. 10. Antibiotic selection.
In some CRISPR and ORF plates, cells were selected using antibiotics. This UMAP plot includes the CPJUMP1 primary experiment (4 Compound, 4 CRISPR and 2 ORF plates per cell type and time point) plus all other data points from the CPJUMP1 experiments, as outlined in Fig. 3.
