Mol Cell. 2024 Jul 11;84(13):2553-2572.e19.
doi: 10.1016/j.molcel.2024.05.024. Epub 2024 Jun 24.

Genome-scale exon perturbation screens uncover exons critical for cell fitness

Mei-Sheng Xiao et al. Mol Cell. 2024.

Abstract

CRISPR-Cas technology has transformed functional genomics, yet understanding of how individual exons differentially shape cellular phenotypes remains limited. Here, we optimized and conducted massively parallel exon deletion and splice-site mutation screens in human cell lines to identify exons that regulate cellular fitness. Fitness-promoting exons are prevalent in essential and highly expressed genes and commonly overlap with protein domains and interaction interfaces. Conversely, fitness-suppressing exons are enriched in nonessential genes, exhibiting lower inclusion levels, and overlap with intrinsically disordered regions and disease-associated mutations. In-depth mechanistic investigation of the screen-hit TAF5 alternative exon-8 revealed that its inclusion is required for assembly of the TFIID general transcription initiation complex, thereby regulating global gene expression output. Collectively, our orthogonal exon perturbation screens established a comprehensive repository of phenotypically important exons and uncovered regulatory mechanisms governing cellular fitness and gene expression.

Keywords: CRISPR screen; Cas12a; TAF5; TFIID; alternative splicing; base editor; cell fitness exons; exon deletion; exon perturbation; functional genomics.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1: Generation of an enhanced exon deletion screening platform
(A) Schematic of the pooled cell fitness genetic screens. The DNA oligo library was cloned into pLCHKOv3, a modified version of the lentiviral pLCHKO vector, to generate exon-deletion screening libraries containing either the direct repeat (DR) sequence compatible for Lb or AsCas12a nucleases from a single oligo library pool. High-titer lentiviral stocks were transduced into Cas9/Cas12a expressing HAP1 and RPE1 cells at a low multiplicity of infection (MOI). Uninfected cells were removed through puromycin selection, and the remaining population was cultured for ~ 20 cell doublings, after which genomic DNA was extracted and the PCR-retrieved hgRNA cassette abundance was quantitated with Illumina paired-end sequencing. (B) Schematic of Cas9 and Cas12a lentiviral constructs engineered in this study. Specific point mutations are indicated. NLS: Nuclear Localization Signal; eIFα: human elongation factor-1 alpha promoter; NP: Nucleoplasmin NLS; SV40: SV40 NLS; T2A: 2A self-cleaving peptide; NeoR: neomycin/geneticin resistance gene; BlastR: blasticidin resistance gene. (C) Western blot analysis of Cas12a (using anti-Myc tag antibody) and Cas9 expression in total cell extracts from stably transduced HAP1 and RPE1 cells with GAPDH used as a loading control. (D) Schematic of the optimization hgRNA library design. hgRNA constructs were designed to delete or mutate frame-disruptive exons in core-essential and non-essential genes by targeting flanking intronic (top) or exonic sequences (bottom). Cas9 and Cas12a spacers are displayed as blue and orange triangles and total numbers of hgRNA pairs in different editing categories are indicated. (E-F) Receiver operating characteristic (ROC) curves of CHyMErA cell fitness screens using different Cas12a variants as described in (A). (E) Cas12a gene knockout (KO) paired with Cas9 intergenic hgRNAs single-targeting core-essential (true positive rates) and non-essential (false positive rates) genes are depicted. 
(F) Exon deletion hgRNAs targeting frame-disruptive exons in core-essential (true positive rates) and non-essential (false positive rates) genes are displayed. Dashed lines indicate random classifier. Area under the curve (AUC) values are listed for CHyMErA variants screened in HAP1 and RPE1 cells. (G) Flow cytometry analysis of CD46 protein expression in HAP1 (top) and RPE1 (bottom) CHyMErA variant cell lines following transduction with three independent hgRNA pairs targeting the frame-disruptive CD46 exon-3 for deletion. Values display cell percentage with undetectable CD46 expression. (H) Representative PCR assay monitoring CD46 exon-3 deletion efficiency from genomic DNA using different CHyMErA variants in HAP1 (top) and RPE1 (bottom) cells (see also Figure S2G). Bar plots indicate percentage of CD46 exon-3 deletion across three independent hgRNAs. (G-H) All data are represented as mean ± standard deviation. * p < 0.05, ** p < 0.01; two-tailed paired t test.
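The ROC/AUC benchmark in panels (E-F) can be illustrated with a minimal sketch. It assumes each guide pair is scored by its log2-fold change (LFC) and that stronger drop-out (more negative LFC) should rank core-essential targets ahead of non-essential ones. The function name `roc_auc` and the numbers are hypothetical and are not taken from the study; the AUC is computed with the rank-based (Mann-Whitney) identity rather than by tracing the curve.

```python
def roc_auc(essential_lfc, nonessential_lfc):
    """AUC as the probability that a random core-essential guide has a
    lower LFC than a random non-essential guide (ties count as 0.5)."""
    wins = 0.0
    for e in essential_lfc:
        for n in nonessential_lfc:
            if e < n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(essential_lfc) * len(nonessential_lfc))


# Made-up LFC values, for illustration only.
essential = [-3.1, -2.4, -0.2, -1.8]
nonessential = [0.1, -0.3, 0.4, -0.1]
auc = roc_auc(essential, nonessential)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to the random classifier shown as a dashed line; values closer to 1 indicate better separation of the two gene sets.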
Figure 2: Large-scale exon deletion screening in human cells
(A) Characteristics of the targeted exons and cognate genes in the 300,000 hgRNA exon deletion library. (B) Visualization of exons with a fitness phenotype. All targeted frame-preserving exons were ranked by mean log2-fold change (LFC) of exon deletion hgRNAs in HAP1 and RPE1 cells. Hit exons are indicated in red (fitness-promoting) and blue (fitness-suppressing) while non-hits are shown in gray. (C) Volcano plot of HAP1 (top panel) and RPE1 (bottom panel) exon deletion screening data analyzed by MAGeCK using either the “intronic-intronic” exon deletion guides (left panel) or the “intronic-intergenic” single-intronic targeting control hgRNAs (right panel). Significant hits (FDR < 5%) are highlighted in red (fitness-promoting) or blue (fitness-suppressing). (D) Bar plot showing the fraction of off-target integration sites for Cas9 and Cas12a guides directed at intergenic (control) or intronic (exon-deletion) regions, determined by GUIDE-seq experiments using three independent hgRNAs for both intergenic controls and exon-deletion. All data are represented as mean ± standard deviation. Two-way paired t-test applied. (E) Pie chart indicating the integration frequencies at on- and off-target genomic regions for intergenic (control) or intronic (exon-deletion) hgRNAs, determined by GUIDE-seq. (F) Schematic overview of co-culture validation experiments. (G) mClover3 (Green) to mCherry (Red) ratios 4 days after co-culture set-up (T4) normalized to ratios quantified 24 hours after plating (T1). Exons identified as hits or non-hits in HAP1 and RPE1 exon deletion screens are indicated. In turquoise are hgRNAs that resulted in significantly skewed Green/Red ratios (p < 0.05; two-way ANOVA) in hit exons and hgRNAs that did not result in significantly skewed Green/Red ratios in non-hit exons. hgRNAs that induced significant growth changes but target non-hits are labeled in blue (false negative). 
Non-validated hgRNAs targeting exon screen hits are labeled in gray (false positive). Each exon was validated with 2 or 3 independent hgRNAs as indicated by individual bars. All data are represented as mean ± standard deviation from two to four replicates.
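The normalization described in panel (G) can be written out as a short sketch. It assumes the Green/Red ratio is a simple quotient of mClover3-positive to mCherry-positive cell counts at each time point; the function name and the counts are hypothetical.

```python
def normalized_green_red(green_t1, red_t1, green_t4, red_t4):
    """Green/Red ratio at T4 divided by the Green/Red ratio at T1."""
    ratio_t1 = green_t1 / red_t1
    ratio_t4 = green_t4 / red_t4
    return ratio_t4 / ratio_t1


# Made-up cell counts: the green population halves relative to red.
value = normalized_green_red(green_t1=500, red_t1=500, green_t4=300, red_t4=600)
print(value)
```

A value below 1 means the green population was depleted relative to the red population during co-culture, and a value above 1 means it was enriched.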
Figure 3: Applying orthogonal base editor screens to induce exon skipping through splice site mutations
(A) Schematic depiction of the base editors cloned in this study. BE: base editor; ABE: adenine base editor; CBE: cytosine base editor; TRE: tetracycline response element; NLS: nuclear localization signal; ssDBD: non-sequence specific single-stranded DNA binding domain from RAD51; UGI: uracil DNA glycosylase inhibitor; eIFα: human elongation factor-1 alpha promoter; T2A: 2A self-cleaving peptide; BlastR: blasticidin resistance gene; rtTA: reverse tetracycline-controlled transactivator. (B) Western blot analysis of inducible Cas9 base editors (using Cas9 antibody) in stably transduced HAP1 cell lines. Cells were treated with 2 μg/mL doxycycline for 48 hours. GAPDH is used as a loading control. (C-E) Base editors are programmed to induce 5’ splice site mutations and exon skipping of TAF5 exon-8 and SNAPC5 exon-2. (C) Schematic representation of base editor recruitment to 5’ (GT) and 3’ (AG) splice sites with targetable A & C bases indicated in red. (D-E) HAP1 cells expressing adenine or cytosine Cas9 base editors (see A) were transduced with three independent sgRNAs targeting the 5’ splice site of either TAF5 or SNAPC5 exons or an intergenic control sgRNA. Twenty-four hours after transduction cells were treated for 72 hours with 2 μg/mL puromycin and doxycycline to select successfully transduced cells and induce base editor expression. (D) Percentage of splice site mutation rates as determined by high-throughput sequencing of TAF5 and SNAPC5 amplicons. Error bars indicate standard deviation. (E) RT-PCR assays monitoring endogenous splicing of TAF5 exon-8 and SNAPC5 exon-2 (top) using gRNAs targeting their splice sites. Percent Spliced In (PSI) values are indicated. The bar plots (bottom) summarize the ΔPSI values of the three independent tested sgRNAs. All data are represented as mean ± standard deviation. * p < 0.05, ** p < 0.01; two-way ANOVA. 
(F) Characteristics of the guides, exons, and cognate genes targeted by the 27,871 sgRNA base editor library for the large-scale mutation of splice sites. The optimization (top) and exon-deletion validation (bottom) sections of the library are analyzed separately. NT: non-targeting guides. (G) ROC curve of cell fitness screens using different base editors as described in (A). sgRNAs directing the base editors to mutate splice sites of frame-disruptive exons in core-essential (true positive rates) and non-essential (false positive rates) genes are displayed. Dotted lines indicate random classifier. AUC values are listed for each base editor. (H) Spearman’s correlation coefficient of log2-fold change (LFC) exon drop-out between the CHyMErA and base editor screening in HAP1 cells. Only shared exons among CHyMErA, ABE8e, and evoCDA1 screens are displayed (n = 3,786). Shared hits from either ABE8e or evoCDA1 are labeled in red and LFC values of ABE8e are used for the scatter plot. Spearman’s correlation index (r) and p-value are indicated. (I) Overlap of the cell fitness affecting exons as determined by exon deletion (CHyMErA) or splice site mutation (base editors) screening in HAP1 cells. Only frame-preserving exons targeted by both CHyMErA and base editors (either ABE8e or evoCDA1) screens are compared. p-value = 2.05e−28; odds ratio = 3.02; Fisher’s exact test.
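The overlap statistic in panel (I) is reported as an odds ratio with a Fisher's exact test. The sketch below shows how both quantities follow from a 2×2 contingency table. It assumes a one-sided (enrichment) test, which the legend does not specify, and uses made-up counts, since the underlying table is not given here; the function names are hypothetical.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the table [[a, b], [c, d]] (needs b * c > 0)."""
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """P(overlap >= a) under the hypergeometric null with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return tail / total


# Made-up table: a = hit in both screens, b = hit in exon deletion only,
# c = hit in base editing only, d = hit in neither.
a, b, c, d = 3, 1, 1, 3
ratio = odds_ratio(a, b, c, d)
p_value = fisher_one_sided(a, b, c, d)
print(ratio, p_value)
```

An odds ratio above 1 indicates that exons scoring in one screen are more likely to score in the other than expected by chance.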
Figure 4: Fitness-promoting and -suppressing frame-preserving exons exhibit different characteristics
(A-B) Cumulative distribution function plots of fitness-promoting (red), fitness-suppressing (blue), or neutral (gray) exons in relation to (A) essentiality of corresponding genes as determined by the log2-fold change of KO hgRNAs and (B) expression of corresponding genes. p-values are indicated in the figure; Wilcoxon rank sum tests. (C) Bar plot displaying the fraction of fitness-promoting (red) and -suppressing (blue) exons among investigated alternative and constitutive exons. p-value = 1.03e−31 (fitness-promoting) and p-value = 0.03 (fitness-suppressing); Fisher’s exact test. (D) Cumulative distribution function plot of fitness-promoting (red), fitness-suppressing (blue) or neutral (gray) exons in relation to exon inclusion levels. p-values are indicated in the figure; Wilcoxon rank sum tests. (E) ROC curve of random forest model prediction for classifying fitness-promoting, frame-preserving exons. Dashed lines indicate random classifier. (F) Feature contribution to random forest model. (G) Bar plot displaying the density (number of events normalized to total exons length) of ClinVar mutations in fitness-promoting, fitness-suppressing, or neutral (non-hits) exons. **** p-value < 0.0001; Fisher’s exact test. (H) Box plot displaying the disordered prediction scores (IUPred) in fitness-promoting, fitness-suppressing, and neutral exons (non-hits). Boxes show interquartile range (IQR), 25th to 75th percentile, with the median indicated by a horizontal line. Whiskers extend to the quartile ± 1.5 × IQR. * p-value < 0.05, **** p-value < 0.0001; Wilcoxon rank sum test applied. (I-K) Bar plots displaying the density (number of events normalized to total exons length) of low complexity regions (I), pfam protein domains (J), or reported protein interaction interfaces (3did; K) in fitness-promoting, fitness-suppressing, and neutral (non-hits) exons. ** p-value < 0.01, **** p-value < 0.0001; Proportion (I, J) and Fisher’s exact (K) tests are applied. 
(A-K) All plots display combined data of frame-preserving exons from HAP1 and RPE1 cell lines.
Figure 5: TAF5 exon-8 inclusion is critical for TFIID complex assembly
(A) Schematic representation of the two TAF5 transcript isoforms generated by exon-8 inclusion or skipping and corresponding Ribo-seq reads overlapping splice site junctions. TAF5 domain structure at the top left with AlphaFold predictions of full length (FL) and ΔE8 TAF5 isoforms are depicted on the right. The protein region encoded by exon-8 is highlighted in red in the FL isoform. (B) Heatmap indicating the number of peptides corresponding to core TFIID subunits as detected through affinity-purification mass-spectrometry (AP-MS) analysis in HEK293 Flp-In cells expressing the indicated constructs as shown in Figure S7E. (C) Protein-protein interaction network involving TAF5 (gray hexagon) splice isoforms detected by AP-MS. Node color indicates differential association of protein interactions between the two isoforms. Protein-protein interactions extracted by the STRING database (STAR Methods) were used to generate edges between the preys identified by our AP-MS experiment. (D) Western blot analysis of total cell lysates (input) and FLAG immunoprecipitates (IP: FLAG-M2) from HEK293 Flp-In cells expressing 3xFLAG-tagged TAF5 cDNAs with exon-8 included (FL) or excluded (ΔE8) isoforms using anti-FLAG antibodies and antibodies specific for TAF5, TBP, TAF1, TAF6, TAF12, CCT2, and β-Tubulin as a loading control. (E-F) Western blot analysis of total cell lysates (input) and TBP immunoprecipitates (IP: TBP) from HEK293 Flp-In cell lines stably expressing doxycycline-inducible siRNA-resistant 3xFLAG-tagged TAF5 cDNAs with exon-8 included (FL) or excluded (ΔE8), and treated with control siRNAs (siNT) or an siRNA that depletes endogenous TAF5 (siTAF5). IgG immunoprecipitation was performed as control. (E) Blots were probed with antibodies specific for TBP, FLAG, TAF5, TAF1, TAF6, TAF10, TAF12, and GAPDH as a loading control. (F) Quantifications of three independent TBP immunoprecipitation experiments. Data are represented as mean ± standard deviation. 
*** p-value < 0.001; two-way ANOVA.
Figure 6: TAF5 exon-8 inclusion is required for TAF5-dependent gene expression
(A-B) Western blot analysis (A) of TAF5 isoform expression in HEK293 Flp-In cell lines stably expressing doxycycline-inducible siRNA-resistant 3×FLAG-tagged TAF5 cDNAs with exon-8 included (FL) or excluded (ΔE8), and treated with control siRNAs (siNT) or an siRNA that depletes endogenous TAF5 (siTAF5). Blots were probed with antibodies specific for TBP, FLAG, TAF5, and GAPDH as a loading control. (B) RNA-seq profiled gene expression changes (Z-score normalized) upon the same treatments. Only genes with significant expression changes upon siTAF5 that are rescued by TAF5-FL are displayed. (C-D) Metagene analysis of TBP (C) and RNA polymerase II (D) occupancy normalized to input around transcription start sites (TSS) of all expressed, TAF5-suppressed (upregulated upon siTAF5/TAF5-ΔE8 rescue) and TAF5-promoted (downregulated) genes as determined by ChIP-sequencing of HEK293 cells treated as above. (E) siRNA screen monitoring the impact of 60 splicing regulators on TAF5 exon-8 alternative splicing. All data are represented as mean ± standard deviation from four biological replicates. **** p < 0.0001; two-way ANOVA. (F) RT-PCR assays monitoring splicing of endogenous TAF5 exon-8 in HAP1 (top) and HEK293 (bottom) cells transfected with three independent siRNAs and an siRNA pool against SRSF1. PSI values are indicated. (G) RT-PCR assays monitoring splicing of wild-type (WT) and SRSF1-motif mutant minigene reporters of TAF5 exon-8. HEK293T cells treated with three independent siRNAs and an siRNA pool against SRSF1 or non-targeting (NT) siRNA control were transfected with reporters 24 hrs prior to harvesting RNA.
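The Percent Spliced In (PSI) values reported for the RT-PCR assays follow the standard definition: the inclusion isoform signal as a percentage of the inclusion plus skipping signal. The sketch below assumes band intensities (or junction read counts) are already quantified; the function names and numbers are hypothetical.

```python
def psi(inclusion, skipping):
    """Percent Spliced In: inclusion signal as a percentage of the total."""
    total = inclusion + skipping
    if total == 0:
        raise ValueError("no signal for either isoform")
    return 100.0 * inclusion / total


def delta_psi(psi_treated, psi_control):
    """Change in PSI relative to the control condition."""
    return psi_treated - psi_control


# Made-up band intensities for a control and a knockdown lane.
control = psi(inclusion=80, skipping=20)
knockdown = psi(inclusion=20, skipping=80)
change = delta_psi(knockdown, control)
print(control, knockdown, change)
```

A negative ΔPSI indicates increased exon skipping in the treated sample relative to the control.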
Figure 7: Summary of characteristics of fitness-promoting and -suppressing exons
(A) Graphical summary of TAF5 exon-8 dependent TFIID assembly and gene expression regulation. (B) Table of distinct features of fitness-promoting and -suppressing exons.
