[Preprint]. 2024 Oct 28:2024.10.28.620517.
doi: 10.1101/2024.10.28.620517.

Combinatorial effector targeting (COMET) for transcriptional modulation and locus-specific biochemistry

Caroline M Wilson et al. bioRxiv.

Abstract

Understanding how human gene expression is coordinately regulated by functional units of proteins across the genome remains a major biological goal. Here, we present COMET, a high-throughput screening platform for combinatorial effector targeting for the identification of transcriptional modulators. We generate libraries of combinatorial dCas9-based fusion proteins, containing two to six effector domains, allowing us to systematically investigate more than 110,000 combinations of effector proteins at endogenous human loci for their influence on transcription. Importantly, we keep full proteins or domains intact, maintaining catalytic cores and surfaces for protein-protein interactions. We observe more than 5800 significant hits that modulate transcription, we demonstrate cell type specific transcriptional modulation, and we further investigate epistatic relationships between our effector combinations. We validate unexpected combinations as synergistic or buffering, emphasizing COMET as both a method for transcriptional effector discovery, and as a functional genomics tool for identifying novel domain interactions and directing locus-specific biochemistry.

Conflict of interest statement

C.M.W., G.C.P., J.S.W., and L.A.G. have filed patent applications on the platform/materials presented in this report. J.S.W. serves as an advisor to and/or has equity in 5 AM Ventures, Amgen, Chroma Medicine, KSQ Therapeutics, Maze Therapeutics, Tenaya Therapeutics and Tessera Therapeutics. L.A.G. has filed patents on CRISPR tools and CRISPR functional genomics, is a co-founder of Chroma Medicine, and a consultant for Chroma Medicine. The other authors declare no competing interests.

Figures

Figure 1. COMET generates combinatorial dCas9-based fusion proteins for endogenous gene targeting.
(A) Pie chart showing library components as characterized by PANTHER GO-Slim gene ontology molecular signatures, showing categories containing ≥ 25 proteins, in addition to 81 effectors from the literature. Library elements were synthesized in an arrayed fashion, recoded for N- and C-termini of dCas9 and cloned in a pair-wise fashion. COMET screening schema showing pooled lentiviral library generation followed by infection into a gRNA-expressing cell line. Later in the screen, cells were harvested, stained for the target gene of interest, and binned by target gene expression for FACS. The top & bottom 25% of CD55 expressing cells were collected for library preparation and PacBio long read sequencing. (B) Structure of dCas9 in complex with a gRNA (PDB ID: 4ZT9) with appended XTEN16-2XNLS linker (light purple) on the N-terminus of dCas9 and an XTEN80 linker (dark purple) on the C-terminus of dCas9, highlighting the long linker lengths with respect to dCas9’s 3D structure. (C) Histogram showing fusion protein lengths (bp) in the L1 (blue) and L2 (magenta) libraries. The length of dCas9 and the flanking linkers is 4464bp. (D) L1 COMET screen in K562 cells replicate correlation for the CD55 low condition showing the log10(fraction) of reads per library combination [Pearson correlation coefficient (R) CD55low = 0.99 (p<2.2e-16)]. Points are colored by fusion protein length (bp). See also Supplementary Figure 1B. (E) DESeq of L1 screen data showing Log2Fold Change comparing the CD55high to CD55low conditions, with −log10(p-value) on the y-axis. Points are labeled with N- and C-domain names and colored by full length fusion protein length (bp). See COMET interactive volcano plots for further data visualization: https://comet-ivp.gilbertlab.arcinstitute.org/. (F) Heatmap of fusion protein combinations showing DESeq Log2FC values (activating in blue; repressive in red) with C-termini domains on the x-axis and N-termini domains on the y-axis.
Figure 2. Large-scale combinatorial transcriptional effector discovery enables robust gene modulation.
(A) L2 COMET screening platform replicates are highly correlated. Plots show CD55low (top) and CD55high (bottom) replicate correlations using normalized counts (fraction) for domain combinations. Pearson correlation coefficient (R): CD55high = 0.89, p-value < 2.2e-16; CD55low = 0.92, p-value < 2.2e-16. Points are colored by fusion protein length (bp). See also Supplementary Figure 2A. (B) DESeq for L2 screen in K562 cells showing Log2FC (High:Low CD55 expression bins) on the x-axis and −log10(p-value) on the y-axis, with 5805 significant fusion protein combinations (p-value ≤ 0.05). See COMET interactive volcano plots at https://comet-ivp.gilbertlab.arcinstitute.org/. (C) Heatmap of Log2FC (High:Low CD55 expression bins) for individual N-C effector combinations. N domains are on the y-axis and C domains are on the x-axis, with positive log2FC in blue (activators) and negative log2FC values in red (repressors). See also Supplementary Figure 2B for enlarged format. (D) Histogram of 5805 significant L2 screen effector lengths (N-terminal + C-terminal, excluding constant dCas9 sequence of 4464bp), showing 5744 domain pairs longer than 480 bp total (pink dashed line). (E) Top 15 most frequently observed significant (p-value < 0.05) N or C domains for repressors (left, log2FC < −1) and activators (right, log2FC > 1) from the L2 K562 screen at CD55, showing enriched N effectors in orange and C effectors in pink. Frequency is shown on the x-axis. See also Supplementary Figure 3A. (F&G) Upset plots categorizing N and C domain function for the 100 strongest (F) repressors (log2FC < −1) and (G) activators (log2FC > 1) from the L2 K562 screen with an intersection size ≥5. See also Supplementary Figure 3B-C.
(H) Screen phenotype (log2FC) values from DESeq for individual effector combinations (purple circle) compared to arrayed validation (blue square, 2 replicates) gene expression fold change values as measured by flow cytometry relative to a dCas9 protein with inactive N- and C-terminal domains (pCW31) on day 3 post-transduction. Screen and arrayed validation log2FC values are highly correlated [Pearson R squared: 0.9034, p-value (two-tailed) <0.001]. (I) Arrayed validation of COMET nominated constructs in dual-gRNA expressing K562 cells targeting the CD55 locus or CXCR4 locus as measured by antibody staining and flow cytometry 3 days post-transduction. Bar plot values represent the median log2Fold Change of target gene expression (CD55 or CXCR4) per effector combination normalized to the median target gene expression in cells transduced with negative control dCas9 protein (pCW31) for two replicates per transduction. Cells are gated on stable gRNA expression (BFPpos), followed by mNeonGreen fluorescence as a proxy for transduction efficiency/protein expression. (J) Fold change expression of IL-2 as measured by qPCR in K562 cells expressing a dual-gRNA targeting IL-2 and transduced with activator constructs. IL2 expression is normalized to guide-expressing parental K562 cells and GAPDH is the endogenous control. Mean CT values (run in triplicate) and standard deviations are used in the ΔΔCT calculations.
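The ΔΔCT calculation mentioned in panel (J) can be illustrated with a minimal sketch. It assumes the standard 2^(−ΔΔCT) formulation, with GAPDH as the reference gene and guide-expressing parental K562 cells as the control sample. The function name and the CT values below are hypothetical and are not taken from the preprint.

```python
from statistics import mean


def ddct_fold_change(target_ct_sample, ref_ct_sample,
                     target_ct_control, ref_ct_control):
    """Fold change by the 2^(-ddCT) method from replicate CT values.

    Each argument is a list of technical-replicate CT values
    (the caption reports triplicates); their means are used.
    """
    # dCT: target gene (e.g. IL2) minus reference gene (e.g. GAPDH)
    dct_sample = mean(target_ct_sample) - mean(ref_ct_sample)
    dct_control = mean(target_ct_control) - mean(ref_ct_control)
    # ddCT: treated sample relative to the control sample
    ddct = dct_sample - dct_control
    return 2 ** (-ddct)


# Hypothetical CT values, for illustration only.
fold = ddct_fold_change(
    target_ct_sample=[25.0, 25.0, 25.0],   # IL2, activator-transduced cells
    ref_ct_sample=[18.0, 18.0, 18.0],      # GAPDH, activator-transduced cells
    target_ct_control=[30.0, 30.0, 30.0],  # IL2, parental control cells
    ref_ct_control=[18.0, 18.0, 18.0],     # GAPDH, parental control cells
)
print(fold)
```

A lower CT in the transduced cells gives a negative ΔΔCT and therefore a fold change above 1. This sketch does not propagate the standard deviations mentioned in the caption.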
Figure 3. Combinatorial effector discovery in iPSCs.
(A) Replicate correlations for COMET L1 screen in KOLF2.1J iPSCs targeting the CD55 locus, showing log10(fraction) of reads per effector combination for CD55low and CD55high. Pearson correlation coefficient (R): CD55low = 0.99 (p<2.2e-16), CD55high = 0.99 (p<2.2e-16). (B) DESeq for L1 screen in iPSCs showing Log2FC (High:Low CD55 expressing bins) on the x-axis and −log10(p-value) on the y-axis, with 41 significant fusion protein combinations (p-value ≤ 0.05). A count threshold of 5 was used for all conditions. See COMET interactive volcano plots at https://comet-ivp.gilbertlab.arcinstitute.org/.
Figure 4. COMET screening enables discovery of domain epistasis.
(A) Schematic of changes in gene expression depending on N-, C-, or dual fusion activity, wherein activity greater than the combination of individual activities is synergistic. N-C fusions with activity less than the activity of an individual effector are buffering or antagonistic. (B) Schematic of residual analysis comparing the expected phenotype to the observed screen phenotype, where the residual is the difference between our observed and expected phenotypes. (C) Scatter plot showing epistasis modeling predictions for effector combination activity on the x-axis for Replicate 1 and DESeq2 Log2FC values for effector combinations from the COMET L2 screen in K562 cells on the y-axis. Colored points represent zones of residual values, i.e. the distance between the expected vs observed gene modulation activity at the CD55 locus. 29.23% of combinations have residuals between 0.3 and 0.6 or −0.3 and −0.6 (light blue); 11.50% of combinations have residuals between 0.6 and 1 or −0.6 and −1 (green); and 3.52% of combinations have residuals >1 or ≤ −1 (dark blue). 55.75% of combinations have minimal residuals (less than 0.3) (gray). Combinations nominated with residuals >1 (dark blue zone) were subsequently cloned as double and single-dCas9 effectors to evaluate synergy. (D & E) Arrayed validation of nominated effectors by flow cytometry, with median log2FC (normalized to an inactive dCas9 fusion protein) on the y-axis. Dual and single terminus fusion proteins were cloned and transduced into single CD55 gRNA-expressing K562 cells. (D) ASH1L_HUMAN (N172)-KAT6B_HUMAN (C106); ANM1-VP64; IN80E-p65-HSF1. (E) MBD2-PRDM2 (PRDM2_HUMAN_B, C131); MBD2-CDYL; and SIR7-HST2. Wilcoxon t-tests per replicate per comparison were performed in R (*** denotes p-value <2e-16).
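The residual zones described in panels (B) and (C) can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code. The expected phenotype is taken as an input because the caption does not specify the epistasis model. The sign convention (observed minus expected), the handling of values exactly at a zone boundary, and the function name are assumptions.

```python
def residual_zone(observed_log2fc, expected_log2fc):
    """Return (residual, zone) for one effector combination.

    Assumptions: residual = observed - expected, and zones are
    half-open on the absolute residual. The caption does not state
    how exact boundary values are assigned.
    """
    residual = observed_log2fc - expected_log2fc
    size = abs(residual)
    if size < 0.3:
        zone = "minimal"   # gray in panel (C)
    elif size < 0.6:
        zone = "0.3-0.6"   # light blue
    elif size < 1.0:
        zone = "0.6-1"     # green
    else:
        zone = ">=1"       # dark blue, nominated for validation
    return residual, zone


# Hypothetical values, for illustration only.
print(residual_zone(2.5, 1.0))
print(residual_zone(1.0, 0.5))
```

Combinations falling in the outermost zone correspond to the dark blue points that the caption describes as nominated for arrayed validation.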
Figure 5. MBD2-CDYL exhibit repressive synergy and ablation of H3K27ac at CD55 locus.
(A) Schematic of MBD2 and CDYL domain architecture. The C-terminal region of MBD2 (200aa, shaded yellow box) is included in the specific effector combination we evaluated with full length CDYL (598aa). (B) Overlay of H3K27ac peaks in IGV for all 8 samples (parental K562s, MBD2-single, single-CDYL, and MBD2-CDYL, with 2 replicates) at an 11kb region around the transcription start site (TSS) of CD55. (C) H3K27ac CUT&Tag data for parental K562 cells (black), or cells transduced with MBD2-single (pink), single-CDYL (light blue), and MBD2-CDYL (green) effectors across two technical replicates as visualized in IGV across a 31kb region surrounding the TSS. Position of the guide RNA is denoted in pink.

