This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Aug 7:2023.08.06.552164.

doi: 10.1101/2023.08.06.552164.

A genome-wide atlas of human cell morphology

Meraj Ramezani^{1,2}, Julia Bauman^{1,3}, Avtar Singh^{1,4}, Erin Weisbart¹, John Yong⁵, Maria Lozada^{1,2}, Gregory P Way^{1,6}, Sanam L Kavari^{1,7}, Celeste Diaz^{1,3}, Marzieh Haghighi¹, Thiago M Batista^{2,8}, Joaquín Pérez-Schindler^{2,8}, Melina Claussnitzer^{2,8,9}, Shantanu Singh¹, Beth A Cimini¹, Paul C Blainey^{1,10,11}, Anne E Carpenter¹, Calvin H Jan⁵, James T Neal^{1,2,8}

Affiliations

¹ Broad Institute of MIT & Harvard, Cambridge, MA, USA.
² Type 2 Diabetes Systems Genomics Initiative of the Broad Institute of MIT and Harvard, Cambridge, MA, USA.
³ Current address: Stanford University, Stanford, CA, USA.
⁴ Current address: Genentech Department of Cellular and Tissue Genomics, South San Francisco, CA, USA.
⁵ Calico Life Sciences LLC, South San Francisco, CA, USA.
⁶ Current address: Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, Colorado, USA.
⁷ Current address: University of Pennsylvania, Philadelphia, PA, USA.
⁸ The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease at Broad Institute, Cambridge, MA, USA.
⁹ Diabetes Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
¹⁰ MIT Department of Biological Engineering, Cambridge, MA, USA.
¹¹ Koch Institute for Integrative Research at MIT, Cambridge, MA, USA.

PMID: 37609130
PMCID: PMC10441312
DOI: 10.1101/2023.08.06.552164

Meraj Ramezani et al. bioRxiv. 2023.

Update in

A genome-wide atlas of human cell morphology.
Ramezani M, Weisbart E, Bauman J, Singh A, Yong J, Lozada M, Way GP, Kavari SL, Diaz C, Leardini E, Jetley G, Pagnotta J, Haghighi M, Batista TM, Pérez-Schindler J, Claussnitzer M, Singh S, Cimini BA, Blainey PC, Carpenter AE, Jan CH, Neal JT. Nat Methods. 2025 Mar;22(3):621-633. doi: 10.1038/s41592-024-02537-7. Epub 2025 Jan 27. PMID: 39870862. Free PMC article.

Abstract

A key challenge of the modern genomics era is developing data-driven representations of gene function. Here, we present the first unbiased morphology-based genome-wide perturbation atlas in human cells, containing three genome-scale genotype-phenotype maps comprising >20,000 single-gene CRISPR-Cas9-based knockout experiments in >30 million cells. Our optical pooled cell profiling approach (PERISCOPE) combines a de-stainable high-dimensional phenotyping panel (based on Cell Painting^1,2) with optical sequencing of molecular barcodes and a scalable open-source analysis pipeline to facilitate massively parallel screening of pooled perturbation libraries. This approach provides high-dimensional phenotypic profiles of individual cells, while simultaneously enabling interrogation of subcellular processes. Our atlas reconstructs known pathways and protein-protein interaction networks, identifies culture media-specific responses to gene knockout, and clusters thousands of human genes by phenotypic similarity. Using this atlas, we identify the poorly-characterized disease-associated transmembrane protein TMEM251/LYSET as a Golgi-resident protein essential for mannose-6-phosphate-dependent trafficking of lysosomal enzymes, showing the power of these representations. In sum, our atlas and screening technology represent a rich and accessible resource for connecting genes to cellular functions at scale.

Conflict of interest statement

C.H.J. and J.Y. are employees of Calico Life Sciences LLC. S.S. and A.E.C. serve as scientific advisors for companies that use image-based profiling and Cell Painting (A.E.C.: Recursion, SyzOnc; S.S.: Waypoint Bio, Dewpoint Therapeutics) and receive honoraria for occasional talks at pharmaceutical and biotechnology companies. P.C.B. is a consultant to or holds equity in 10X Genomics, General Automation Lab Technologies/Isolation Bio, Celsius Therapeutics, Next Gen Diagnostics, Cache DNA, Concerto Biosciences, Stately, Ramona Optics, Bifrost Biosystems, and Amber Bio. P.C.B.’s laboratory receives research funding from Merck and Genentech for work related to genetic screening. The Broad Institute and MIT may seek to commercialize aspects of this work, and related applications for intellectual property have been filed, including WO2019222284A1, "In situ cell screening methods and systems". All other authors declare no competing interests.

Figures

**Extended Data Figure 1. Example barcode calling based on twelve in-situ cycles**
An example of a group of cells tracked over the twelve cycles of in-situ sequencing to call barcodes. Cells 1 and 2 highlight how the signal from fluorescent nucleotides is translated into a barcode read over twelve cycles.
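
The translation from per-cycle fluorescence to a barcode read can be illustrated with a minimal sketch. It assumes each barcode spot has one intensity per base channel in every cycle and that the brightest channel determines the called base. The channel-to-base order in `BASES`, the function name `call_barcode`, and the toy intensities are hypothetical and are not taken from the PERISCOPE pipeline, which may apply additional corrections before calling bases.

```python
# Hypothetical channel order; the real mapping depends on the sequencing chemistry.
BASES = ("G", "T", "A", "C")


def call_barcode(intensities):
    """Call one base per cycle by picking the brightest channel.

    intensities: list of cycles, each a sequence of four channel
    intensities for a single barcode spot (one value per entry in BASES).
    """
    read = []
    for cycle in intensities:
        if len(cycle) != len(BASES):
            raise ValueError("expected one intensity per base channel")
        # Index of the channel with the highest intensity in this cycle.
        brightest = max(range(len(BASES)), key=lambda i: cycle[i])
        read.append(BASES[brightest])
    return "".join(read)


# Toy spot imaged over twelve cycles (made-up intensities).
toy_spot = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.8, 0.1, 0.0),
    (0.0, 0.1, 0.7, 0.2),
    (0.1, 0.0, 0.2, 0.9),
    (0.8, 0.1, 0.1, 0.0),
    (0.7, 0.2, 0.0, 0.1),
    (0.1, 0.9, 0.0, 0.0),
    (0.2, 0.1, 0.6, 0.1),
    (0.0, 0.1, 0.1, 0.8),
    (0.1, 0.1, 0.0, 0.8),
    (0.0, 0.2, 0.7, 0.1),
    (0.1, 0.6, 0.2, 0.1),
]

print(call_barcode(toy_spot))  # prints "GTACGGTACCAT"
```

In practice the called read would then be matched against the known guide barcode library to assign a perturbation to each cell.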

**Extended Data Figure 2. Technical summary of the A549 whole genome screen**
(a) The distribution for the number of cells per gene or per guide present in the A549 dataset. (b) Comparison of the relative abundance of barcodes as quantified by NGS or in situ sequencing (R² = 0.84). (c-e) Comparison of the relative abundance of barcodes as quantified by in situ sequencing among 3 different bioreplicates representing individual viral transductions (R₁₂² = 0.85, R₁₃² = 0.85, R₂₃² = 0.94). (f) The distribution of normalized mean mitochondria channel intensity per cell for guides targeting the TOMM20 gene and the non-targeting control guides.

**Extended Data Figure 3. Hit genes can be called in multiple channel combinations**
Genes called as hits in the HeLa DMEM (a-b), HeLa HPLM (c-d), and A549 (e-f) screens can be called as hits because of significant perturbation to their whole profile, any individual screen channel, or any combination thereof. Specific combinations without any hit genes are omitted from the bar plots (a,c,e) and whole profile hit information is omitted from the Venn diagrams (b,d,f) for clarity.

**Extended Data Figure 4. Morphological signal score is not well correlated with gene dependency or baseline gene expression**
Comparison of the distribution of morphological signal scores and gene dependency scores (DepMap gene effect for the A549 cells and genetic dependencies estimated using the DEMETER2 for HeLa cells, the dashed red line at −0.5 threshold to highlight likely essential genes) for the A549 (a), the HeLa DMEM (b) or HeLa HPLM dataset (c). Comparison of the distribution of morphological signal scores and gene expression TPM values (DepMap dataset, values are inferred from RNA-seq data using the RSEM tool) for the A549 (d), the HeLa DMEM (e) or HeLa HPLM dataset (f).

**Extended Data Figure 5. Examples of single cell images with morphological phenotypes labeled**
Single cell images of specific perturbations retrieved from the A549 dataset representing five separate compartments, with the segmentation shown as the cell mask. (a) Single cell expressing non-targeting control guide RNA. Panels (b) to (f) represent images of single cells carrying guide RNA targeting genes with significant signal in the specific cell compartment highlighted by the red box (genes were selected based on the number of significant features targeting the specified compartment from the gene sets highlighted in Figure 2c).

**Extended Data Figure 6. Technical summary of the DMEM HeLa whole genome screen**
(a) The distribution for the number of cells per gene or per guide present in the DMEM HeLa dataset. (b) Comparison of the relative abundance of barcodes as quantified by NGS or in situ sequencing (R² = 0.89). (c-e) Comparison of the relative abundance of barcodes as quantified by in situ sequencing among 3 different bioreplicates representing individual viral transductions (R₁₂² = 0.97, R₁₃² = 0.95, R₂₃² = 0.96).

**Extended Data Figure 7. Technical summary of the HPLM HeLa whole genome screen**
(a) The distribution for the number of cells per gene or per guide present in the HPLM HeLa dataset. (b) Comparison of the relative abundance of barcodes as quantified by NGS or in situ sequencing (R² = 0.92). (c-e) Comparison of the relative abundance of barcodes as quantified by in situ sequencing among 3 different bioreplicates representing individual viral transductions (R₁₂² = 0.96, R₁₃² = 0.95, R₂₃² = 0.96).

**Extended Data Figure 8. PERISCOPE identifies media-specific perturbation signatures**
(a) The enrichment map for biological processes based on the GSEA analysis of the profile signal strength from the DMEM and HPLM HeLa screens. The preranked GSEA analysis was performed using a list of all genes ordered based on the calculated signal strength as described in methods. The Gene Ontology Biological Processes (GOBP) gene set was employed for the enrichment analysis. Some of the labels and single/double nodes are not shown here for clarity (full map available in the supplemental figures). (b, c) Heatmaps representing Pearson’s correlation between gene profiles after hierarchical clustering using Ward’s method. Gene complexes/processes were enriched in both HeLa DMEM and HPLM datasets (b) or one dataset (c, the enriched screen is identified by Clustered) based on the preranked GSEA analysis. The heatmaps are a combination of Pearson’s correlation from both screens and clustered based on the data from a single screen as described in the cartoon in Fig. 3g.

**Extended Data Figure 9. Identifying biological pathways using individual subcellular image features in A549 dataset**
The A549 dataset shows minimal GO enrichment in individual features making distribution of enrichment across channels (a) and feature categories (b) difficult to interpret. Outer ring is the total number of features in our feature-selected dataset and inner ring is the number of features that showed GO enrichment for a and b. (c) Vacuolar ATPase protein products are expected to function specifically in the WGA channel and vATPase genes are specifically enriched in hit lists for features in those compartments. N.S. indicates no enrichment in that gene list. Outer ring indicates the channel in which enrichment is expected. Inner ring is the breakdown of actual channels that show enrichment for the gene group. (d) Disruption of the Vacuolar ATPase (either V₀ or V₁ subunit) but not genes involved in its assembly causes a decrease in Golgi/Membrane signal in small structures as seen specifically with screen feature WGA_Granularity_1 but not larger granularities. Each trace is a single gene; those genes that are not hits in the screen are dashed. Bold lines are the mean of all genes in the group. (e) Specific signal in granularity features is not observed for a loss of function in genes involved in N-Glycan synthesis in the Endoplasmic Reticulum. Visualization of the signal measured at each granularity is shown for Golgi/Membrane (f) and Endoplasmic Reticulum (g).

**Extended Data Figure 10. Phenotypic consequences of lysosomal trafficking perturbations (sample images contributing to Figure 6)**
(a) Confocal images of cells co-stained with WGA and LAMP1 antibody, as in Figure 6d, with KD of the remaining genes highlighted in Figure 6b as indicated. Quantified data are shown in Figure 6e. (b) Confocal images of cells co-stained with WGA and LAMP1 antibody, with single or dual gene KD as indicated. Quantified data are shown in Figure 6f. (c) Color overlays of mScarlet-Lamp1 cells with KD of the genes indicated. Image intensity represents photon count per pixel, whereas hue encodes median lifetime per pixel. Quantified data are shown in Figure 6g. (d-e) Confocal images of live cells with KD of the genes indicated, incubated with fluorogenic substrates of glucosylceramidase (d) and beta galactosidase (e), respectively. Quantified data are shown in Figure 6h–i.

**Figure 1. Pooled optical screens with PERISCOPE**
(a) Experimental workflow for PERISCOPE screens. (b) Example images of five phenotypic stains and fluorescent in situ sequencing. (c) Schematic of destaining strategy to enable in situ sequencing after fluorescence imaging of phenotypic stains. (d) Overview of the PERISCOPE analysis pipeline including extraction of phenotypic features, deconvolution of barcodes and genotype-phenotype correlation. Schematics created with Biorender.

**Figure 2. A genome-wide perturbation map in A549 cells**
Summary of whole genome PERISCOPE screen performed in A549 cells. (a) Hit genes identified in the screen include some that affect a single compartment and some that impact multiple compartments and features across the cell. Green represents hit genes called based on a subset of cell compartments (ER, mitochondria, actin, DNA and Golgi/membrane) and blue represents hit genes called based on overall gene profile. Detailed description in the methods section. (b) Hit genes called based on a single compartment are distributed across all five measured compartments. It is possible for a gene to be a hit in multiple compartments without being a whole cell hit; see Extended Data Fig. 3a–b for more details. (c) Pie charts showing the average normalized fraction of features significantly different from the control, categorized by target compartment, for genes in the indicated set. (d) Distributions of optical profile correlations among all possible gene pairs versus correlations among gene pairs representing CORUM4.0 protein complexes that have at least two thirds of complex subunits within hit genes. (e) Boxen (letter-value) plot representing STRING scores divided into bins based on PERISCOPE profile correlation between gene pairs. (f) UMAP embedding of the hit gene profiles from the A549 dataset. Each dot represents a genetic perturbation and distance implies the correlation of profile in a two-dimensional embedding. Manual annotation of cluster functions is presented for highlighted clusters based on gene ontology (GO) data sets. Example insets show coherent clustering of related genes. (g) The distribution of morphological signal scores for essential and nonessential genes (DepMap gene effect at −0.5 threshold) for all perturbations in the A549 dataset. (h, i) Heatmaps representing Pearson correlation between gene profiles after hierarchical clustering using Ward’s method. Gene complexes/processes were enriched in the A549 dataset based on the preranked GSEA analysis. (h) displays hit genes belonging to the GO:CC chaperone complex gene set (GO:0101031). (i) displays hit genes belonging to the GO:CC proteasome complex gene set (GO:0000502).

**Figure 3. Genome-wide gene-by-environment maps**
Summary of the results from two whole genome PERISCOPE screens performed in HeLa cells (DMEM and HPLM). (a) Bar graph representing the number of hit genes identified in the screen. Green represents hit genes called based on single compartments (ER, Mitochondria, Actin, DNA and Golgi/Membrane) and blue represents hit genes called based on overall gene profile. Detailed description in the methods section. (b) The distribution of hit genes called based on a single compartment (see a). It is possible for a gene to be a hit in multiple compartments without being a whole cell hit; see Extended Data Fig. 3c–f for more details. (c) Distributions of optical profile correlations between random hit gene pairs versus correlations between gene pairs representing CORUM4.0 protein complexes. For the DMEM and HPLM HeLa screens, respectively, we identified 1,663 and 1,604 unique genes forming 871 and 799 clusters with at least 2/3 of components present on the profile hit list. The correlations between all possible gene pairs within a cluster (median r_DMEM = 0.39, r_HPLM = 0.27) were stronger than the background correlations between random gene pairs (median r = 0.0). (d) Boxen (letter-value) plot representing STRING scores divided into bins based on PERISCOPE profile correlation between gene pairs. (e) UMAP embedding of the hit gene profiles from the DMEM HeLa dataset. Each dot represents a genetic perturbation and distance implies the correlation of profile in a two-dimensional embedding. Manual annotation of cluster functions is presented for highlighted clusters based on gene ontology (GO) data sets. Example insets show coherent clustering of related genes. (f) UMAP embedding of the hit gene profiles from the DMEM HeLa dataset. Each dot represents a genetic perturbation and distance implies the correlation of profile in a two-dimensional embedding. Manual annotation of cluster functions is presented for highlighted clusters based on gene ontology (GO) data sets. (g) Schematic for generation of comparative diagonally-merged heatmaps. Heatmaps display Pearson’s correlation between gene profiles from both HeLa screens and are clustered based on the data from a single screen as described in the schematic. (h, i) Heatmaps representing Pearson’s correlation between gene profiles after hierarchical clustering using Ward’s method. Gene complexes/processes were commonly enriched in both screens (h) or selectively enriched in one screen (i, the enriched screen is labeled **Clustered**) of HeLa DMEM/HPLM datasets based on the preranked GSEA analysis.
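
The diagonally-merged heatmap described in panel (g) can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: it assumes each screen provides one feature vector per gene, that the gene order has already been obtained by hierarchical clustering (for example Ward's method) on one screen, and that one screen fills the upper triangle while the other fills the lower triangle. Which screen occupies which triangle, and the names `pearson`, `correlation_matrix`, and `merge_diagonal`, are assumptions, as are the toy profiles.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(profiles, genes):
    """Gene-by-gene Pearson correlation matrix in the given gene order."""
    return [[pearson(profiles[g], profiles[h]) for h in genes] for g in genes]


def merge_diagonal(profiles_a, profiles_b, order):
    """Upper triangle from screen A, lower triangle from screen B.

    `order` is assumed to come from clustering one of the two screens;
    the clustering step itself is not shown here.
    """
    corr_a = correlation_matrix(profiles_a, order)
    corr_b = correlation_matrix(profiles_b, order)
    n = len(order)
    merged = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j > i:
                merged[i][j] = corr_a[i][j]
            elif j < i:
                merged[i][j] = corr_b[i][j]
    return merged


# Toy profiles for three hypothetical genes in two screens.
screen_a = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [1, 3, 2, 4]}
screen_b = {"G1": [1, 2, 3, 4], "G2": [4, 3, 2, 1], "G3": [1, 3, 2, 4]}
merged = merge_diagonal(screen_a, screen_b, order=["G1", "G2", "G3"])
for row in merged:
    print([round(value, 2) for value in row])
```

Reading across the diagonal then lets a single panel show whether a pair of genes is correlated in one medium but not in the other, as for G1 and G2 in the toy data.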

**Figure 4. Clustering by optical profiles captures physical interactions and signaling pathway relationships**
(a) Heatmaps representing Pearson’s correlation between gene profiles after hierarchical clustering using Ward’s method. The ribosomal gene complex was enriched in the DMEM HeLa dataset based on the preranked GSEA analysis. Different subsets of ribosomal genes enriched in certain clusters are highlighted in the heatmap. Ribosome image created with Biorender. (b) Heatmaps representing Pearson’s correlation between gene profiles after hierarchical clustering using Ward’s method. The genes associated with the PI3K/AKT signaling pathway were enriched in the DMEM HeLa dataset based on the preranked GSEA analysis. The table on the right highlights the genes with activatory/inhibitory effects in accordance with the correlation/anti-correlation between profiles from the heatmap.

**Figure 5. Identifying biological pathways using individual subcellular image features in HeLa datasets**
(a) GO enrichment is found in many individual features in a manner that is fairly evenly distributed across the cellular structures (i.e. channels) imaged in PERISCOPE. Outer ring is the total number of features in our feature-selected dataset. Inner ring is the number of features that show GO enrichment. (b) GO enrichment in individual features is not distributed evenly across classes of features. Outer ring is the total number of features in our feature-selected dataset. Inner ring is the number of features that show GO enrichment. (c) Gene groups whose protein products are expected to function specifically in a cellular structure imaged in PERISCOPE are specifically enriched in hit lists for features in those compartments. Outer ring indicates the channel in which enrichment is expected. Inner ring is the breakdown of actual channels that show enrichment for the gene group. (d) Disruption of the Vacuolar ATPase (either V₀ or V₁ subunit) but not genes involved in its assembly causes a decrease in WGA signal in small structures as seen specifically with screen feature WGA_Granularity_1 but not larger granularities. Each trace is a single gene; those genes that are not hits in the screen are dashed. Bold lines are the mean of all genes in the group. (e) Loss of function in genes involved in N-Glycan synthesis in the Endoplasmic Reticulum but not in other organelles nor the GPI synthesis pathway causes an increase of ConA signal in small structures as seen specifically with screen feature ConA_Granularity_1 but not larger granularities.

**Figure 6. TMEM251 is essential for M6P-dependent trafficking of lysosomal enzymes**
(a) GSEA of genes preranked by cosine similarity to TMEM251 KO morphology. (b) Waterfall plot of the distribution of cosine similarities to TMEM251 morphology. Representative genes involved in glycosylation, trafficking and lysosomal acidification are highlighted. (c) TMEM251 localization was examined by confocal imaging of cells expressing a fluorescent reporter of either GALNT2 (Golgi) or TMEM192 (lysosome) and stained for TMEM251. (d) Confocal images of cells with knockdown of the genes indicated, co-stained with WGA and LAMP1 antibody. See supplementary figure for other perturbations. (e-f) Quantification of lysosomal WGA staining after CRISPRi knockdown of the indicated genes. Plotted are the upper quartiles of median per-cell lysosomal WGA intensity in biological replicates. (g) Boxplot of LAMP1-mScarlet fluorescence lifetimes, which correlate with lysosomal pH, for the indicated perturbations. Each point represents the median lifetime of lysosomal fluorescence in an image (n≥15 per condition). (h-i) log10 fold-changes of glucosylceramidase and beta galactosidase activity relative to non-targeting controls for the indicated CRISPRi knockdowns. Each point represents the median of per-cell total fluorescence intensity (MFI) in biological replicates, relative to non-targeting controls. (Statistical analysis: 2-tailed t-test vs non-targeting for e to i.)
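
The preranking step in panels (a) and (b) can be illustrated with a minimal sketch that orders genes by the cosine similarity of their morphological profile to a query profile. The gene names, feature values, and the function names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the sketch assumes profiles are plain numeric feature vectors of equal length; it does not reproduce the authors' normalization or feature selection.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(profiles, query):
    """Return (gene, similarity) pairs sorted from most to least similar."""
    query_profile = profiles[query]
    scores = {
        gene: cosine_similarity(query_profile, profile)
        for gene, profile in profiles.items()
        if gene != query
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy profiles: QUERY stands in for the knockout of interest.
toy_profiles = {
    "QUERY": [1.0, 0.0, 2.0],
    "GENE_A": [2.0, 0.0, 4.0],    # same direction as QUERY
    "GENE_B": [-1.0, 0.0, -2.0],  # opposite direction
    "GENE_C": [0.0, 1.0, 0.0],    # orthogonal
}

ranked = rank_by_similarity(toy_profiles, "QUERY")
for gene, score in ranked:
    print(gene, round(score, 2))
```

A ranked list of this kind is the sort of input a preranked enrichment analysis expects, and sorting the same scores produces the waterfall layout described in panel (b).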

References

1. Rohban M. H. et al. Systematic morphological profiling of human gene and allele function via Cell Painting. Elife 6, (2017).
2. Bray M.-A. et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat. Protoc. 11, 1757–1774 (2016).
3. Doench J. G. Am I ready for CRISPR? A user’s guide to genetic screens. Nat. Rev. Genet. 19, 67–80 (2018).
4. Bock C. et al. High-content CRISPR screening. Nature Reviews Methods Primers 2, 1–23 (2022).
5. Adamson B. et al. A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response. Cell 167, 1867–1882.e21 (2016).

Grants and funding

DP2 GM146252/GM/NIGMS NIH HHS/United States
