Mol Cell. 2021 Mar 4;81(5):998-1012.e7.

doi: 10.1016/j.molcel.2020.12.018. Epub 2021 Jan 12.

Co-transcriptional splicing regulates 3' end cleavage during mammalian erythropoiesis

Kirsten A Reimer¹, Claudia A Mimoso², Karen Adelman², Karla M Neugebauer³

Affiliations

¹ Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA.
² Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA.
³ Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA. Electronic address: karla.neugebauer@yale.edu.

PMID: 33440169
PMCID: PMC8038867
DOI: 10.1016/j.molcel.2020.12.018

Kirsten A Reimer et al. Mol Cell. 2021.

Abstract

Pre-mRNA processing steps are tightly coordinated with transcription in many organisms. To determine how co-transcriptional splicing is integrated with transcription elongation and 3' end formation in mammalian cells, we performed long-read sequencing of individual nascent RNAs and precision run-on sequencing (PRO-seq) during mouse erythropoiesis. Splicing was not accompanied by transcriptional pausing and was detected when RNA polymerase II (Pol II) was within 75-300 nucleotides of 3' splice sites (3'SSs), often during transcription of the downstream exon. Interestingly, several hundred introns displayed abundant splicing intermediates, suggesting that splicing delays can take place between the two catalytic steps. Overall, splicing efficiencies were correlated among introns within the same transcript, and intron retention was associated with inefficient 3' end cleavage. Remarkably, a thalassemia patient-derived mutation introducing a cryptic 3'SS improved both splicing and 3' end cleavage of individual β-globin transcripts, demonstrating functional coupling between the two co-transcriptional processes as a determinant of productive gene output.

Keywords: PacBio; co-transcriptional splicing; erythropoiesis; globin; long-read sequencing; nascent RNA.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

**Figure 1. Long-read sequencing of nascent RNA from differentiating mouse erythroblasts**
**(A)** Schematic of nascent RNA isolation and sequencing library generation. MEL cells are treated with 2% DMSO to induce erythroid differentiation, cells are fractionated to purify chromatin, and chromatin-associated nascent RNA is depleted of polyadenylated and ribosomal RNAs. An adapter is ligated to the 3′ ends of remaining RNAs, then a strand-switching reverse transcriptase is used to create double-stranded cDNA that is the input for PacBio library preparation. **(B)** Read length and **(C)** read depth distribution of PacBio long-reads. See also Figures S1 and S2, and Table S1.

**Figure 2. Individual mammalian nascent RNA sequences reveal coordination of co-transcriptional splicing.**
**(A)** Long-read sequencing (LRS) data visualization for analysis of co-transcriptional splicing. Gene diagram is shown at the top, with the black arrow indicating the TSS. Reads are aligned to the genome and ordered by 3′ end position. Color code indicates the splicing status of each transcript. Each horizontal row represents one read. Panels at far right and below: regions of missing sequence (e.g., spliced introns) are transparent. Light gray shading indicates regions of exons, and dark gray shading indicates the region downstream of the annotated PAS (dotted red line). The number of individual long-reads aligned to each gene (n) is indicated. Bar graph at the far right of each plot indicates the fraction of reads that are all spliced (dark purple), partially spliced (light purple), or all unspliced (yellow) for that gene. **(B)** LRS data are shown for uninduced (top) and induced (bottom) MEL cells for three representative genes: *Actb*, *Calr*, *Eif1*. **(C)** Fraction of long-reads that are all spliced, partially spliced, or all unspliced (n = 120,143 reads uninduced, n = 71,639 reads induced). **(D)** For each intron that is covered by 10 or more reads, the co-transcriptional splicing efficiency (CoSE) is defined as the number of reads that are spliced divided by the total number of reads that span the intron. **(E)** Variance in CoSE for transcripts that include 3 or more introns covered by at least 10 reads (n = 1,240 transcripts uninduced, n = 788 transcripts induced) compared to the variance in CoSE for a randomly selected group of introns. Significance tested by Mann-Whitney U test; *** represents p-value < 0.001. See also Figure S3.
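
The CoSE definition in panel (D) and the read categories in panel (C) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the input layout (one `(intron_id, is_spliced)` pair per read spanning an intron), and the treatment of reads that span no intron are assumptions made for illustration only.

```python
from collections import Counter

MIN_READS = 10  # coverage threshold stated in the caption


def cose_per_intron(observations, min_reads=MIN_READS):
    """Return {intron_id: CoSE} for introns spanned by at least min_reads reads.

    observations: iterable of (intron_id, is_spliced) pairs, one pair per
    read that spans the intron (hypothetical input layout).
    """
    spliced = Counter()
    total = Counter()
    for intron_id, is_spliced in observations:
        total[intron_id] += 1
        if is_spliced:
            spliced[intron_id] += 1
    # CoSE = spliced reads / all reads spanning the intron
    return {i: spliced[i] / n for i, n in total.items() if n >= min_reads}


def classify_read(intron_statuses):
    """Classify one read from the splicing status of each intron it spans.

    intron_statuses: list of booleans (True = spliced). Returning None for
    reads that span no intron is an assumption, not taken from the source.
    """
    if not intron_statuses:
        return None
    if all(intron_statuses):
        return "all spliced"
    if not any(intron_statuses):
        return "all unspliced"
    return "partially spliced"
```

For example, an intron spanned by 10 reads, 8 of them spliced, gets a CoSE of 0.8, while an intron spanned by only 3 reads is excluded by the coverage threshold.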

**Figure 3. Spliceosome-Pol II proximity is unchanged by differentiation.**
**(A)** Schematic definition of the distance from the 3′ end of a nascent RNA (nRNA) to the most 3′-proximal splice junction. 3′ end sequence reports the position of Pol II when nascent RNA was isolated. **(B)** Distance (nt) from the 3′-most splice junction to Pol II position is shown as a cumulative fraction for uninduced and induced cells (n = 101,911 observations uninduced, n = 66,656 induced). **(C)** CoSE in induced and uninduced conditions. Each point represents a single intron which is covered by at least 10 long-reads in both induced and uninduced conditions. Spearman’s rho = 0.56, n = 4,170 introns. See also Figure S4.
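
The distance in panels (A) and (B) can be sketched as below. This assumes each spliced read is summarized by the genomic coordinate of its 3′ end and by the coordinates of its splice junctions on the downstream-exon side. The function name and the coordinate convention are hypothetical and do not come from the source.

```python
def distance_to_last_junction(read_3p_end, junctions, strand):
    """Distance (nt) from the most 3'-proximal splice junction to the read's 3' end.

    read_3p_end: genomic coordinate of the read's 3' end (the Pol II position).
    junctions:   genomic coordinates of the splice junctions in this read.
    strand:      "+" or "-".
    Returns None for reads without any splice junction.
    """
    if not junctions:
        return None
    if strand == "+":
        # On the plus strand the 3'-most junction has the largest coordinate.
        return read_3p_end - max(junctions)
    if strand == "-":
        # On the minus strand the 3'-most junction has the smallest coordinate.
        return min(junctions) - read_3p_end
    raise ValueError(f"unknown strand: {strand!r}")
```

Collecting this value over all spliced reads would give the input for a cumulative distribution of the kind plotted in panel (B).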

**Figure 4. Pol II does not pause at 5′ or 3′ splice sites.**
**(A)** PRO-seq 3′ end coverage is shown aligned to active transcription start sites (TSS), 5′ splice sites (5′SS), and 3′ splice sites (3′SS). **(B)** Top: Schematic illustrating the use of color-coded intervals to quantify PRO-seq reads around each 5′SS and 3′SS to test for significance of pausing. Bottom: PRO-seq read density summed in each of the intervals indicated above around 5′SSs (left) and 3′SSs (right) from introns with at least 10 reads in uninduced conditions (n = 3,505). Significance tested by paired t-test; *** represents p-value < 0.001, ns represents p-value > 0.05. **(C)** Genome browser view showing spliced PRO-seq reads aligned to the *Apbb1* gene, where 3′ ends of reads represent the position of elongating Pol II. Only spliced reads, filtered from all reads, are shown. See also Figure S5.

**Figure 5. Splicing intermediates are abundant at introns with weak 3′ splice sites**
**(A)** Schematic definition of first step splicing intermediates (dotted red oval), which have undergone the first step of splicing and have a free 3′-OH that can be ligated to the 3′ end DNA adapter. Splicing intermediate reads are characterized by a 3′ end at the last nucleotide of the upstream exon. **(B)** Coverage of long-read 3′ ends (top panels) and 5′ ends (bottom panels) aligned to 5′SSs (left) and 3′SSs (right) of introns. **(C)** Coverage of long-read 3′ ends across four example genes. Arrows indicate the positions where the most abundant splicing intermediates are observed. **(D)** Individual long-reads are shown for the gene *Alas2*. Diagram is similar to Figure 2, but individual reads are colored depending on whether they are splicing intermediates (purple) or not (gray). Data for uninduced and induced cells are shown combined. Potential recursive splicing site is indicated by an arrow and dotted line; recursively spliced reads are shown in detail in **(E)**. **(F)** MaxEnt splice site scores for 5′SSs (left) and 3′SSs (right) of introns with a coverage of at least 10 long-reads are shown, categorized by the normalized intermediate count (NIC) at each intron. Introns with NIC = 0 (n = 3,890) are shown separately, and all other introns with NIC > 0 (n = 2,647) are separated into quartiles with NIC values shown. **(G)** Raw PRO-seq 3′ end coverage from uninduced cells aligned to 5′SSs and 3′SSs for introns with NIC = 0 (n = 4,402) or NIC > 0 (n = 3,427). See also Figure S6.
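
The read-level criterion in panel (A), a 3′ end at the last nucleotide of the upstream exon, can be sketched as a simple counting step. The sketch assumes all positions lie on one chromosome and strand, and the names are hypothetical. It returns raw counts only, since the normalization behind NIC is not defined in this caption.

```python
from collections import Counter


def count_intermediates(read_3p_ends, exon_last_nt):
    """Count candidate first-step splicing intermediates per intron.

    read_3p_ends: iterable of read 3' end coordinates.
    exon_last_nt: {intron_id: coordinate of the last nucleotide of the
                   exon immediately upstream of that intron}.
    """
    position_to_intron = {pos: intron for intron, pos in exon_last_nt.items()}
    counts = Counter()
    for end in read_3p_ends:
        intron = position_to_intron.get(end)
        if intron is not None:
            counts[intron] += 1
    return dict(counts)
```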

**Figure 6. Poor splicing efficiency is associated with inefficient 3′ end cleavage**
**(A)** Individual long-reads are shown for the major β-globin gene (*Hbb-b1*). Diagram is as described in Figure 2. **(B)** LRS coverage (orange) and PRO-seq 3′ end coverage (purple) in induced cells are shown at the *Hbb-b1* gene. Scale at the left indicates coverage in number of reads, and red dotted line indicates PAS. We note that the duplicated copies of β-globin in the genome (*Hbb-b1* and *Hbb-b2*) impede unique mapping of short PRO-seq reads in the coding sequence, artificially reducing gene body reads. **(C)** Fraction of uncleaved long-reads (top) and all other long-reads (bottom) categorized by splicing status (as described in Figure 2). Uncleaved reads have a 5′ end within an actively transcribed gene region and a 3′ end greater than 50 nt downstream of the PAS (n = 5,694 uncleaved long-reads, and n = 172,612 other long-reads). **(D)** Long-read coverage in the region downstream of PASs is shown for long-reads separated by their splicing status. Coverage is normalized to the position 100 nt upstream of each PAS (n = 35,982 all unspliced reads, n = 24,102 partially spliced reads, and n = 134,581 all spliced reads). Red dotted line indicates PAS position. See also Figure S7.
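
The uncleaved-read criterion in panel (C) and the normalization in panel (D) can be sketched as follows. The sketch approximates the "actively transcribed gene region" as the interval from the TSS to the PAS, which is an assumption. The function names and the coverage layout (a dict keyed by position relative to the PAS) are likewise hypothetical.

```python
MIN_READTHROUGH_NT = 50   # 3' end must be more than 50 nt past the PAS
REFERENCE_OFFSET = -100   # coverage is normalized to 100 nt upstream of the PAS


def is_uncleaved(read_5p, read_3p, tss, pas, strand):
    """True if the read starts inside the gene and ends > 50 nt past the PAS.

    Assumption: the gene region is taken as the TSS-to-PAS interval.
    """
    if strand == "+":
        starts_in_gene = tss <= read_5p <= pas
        readthrough = read_3p - pas
    elif strand == "-":
        starts_in_gene = pas <= read_5p <= tss
        readthrough = pas - read_3p
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return starts_in_gene and readthrough > MIN_READTHROUGH_NT


def normalize_coverage(coverage, reference=REFERENCE_OFFSET):
    """Divide coverage at each PAS-relative position by the reference position."""
    ref = coverage.get(reference, 0)
    if ref == 0:
        raise ValueError("no coverage at the reference position")
    return {pos: depth / ref for pos, depth in coverage.items()}
```

With this normalization, a value below 1 downstream of the PAS means fewer reads extend past the cleavage site than cover the gene just upstream of it.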

**Figure 7. Efficient splicing promotes 3′ end cleavage**
**(A)** Top: schematic describing two engineered MEL cell lines. MEL-*HBB*^WT contains an integrated copy of a wild-type human globin minigene. In MEL-*HBB*^IVS-110(G>A), a single point mutation (red triangle) mimics a disease-causing thalassemia allele. Bottom: Sanger sequencing of the *HBB* minigene coding strand shows that a G>A mutation leads to a cryptic 3′SS at the AG dinucleotide 19 nt upstream of the canonical 3′SS. **(B)** Distribution of *HBB* long-reads in MEL-*HBB*^WT cells (purple) and MEL-*HBB*^IVS-110(G>A) cells (orange) separated by splicing status of intron 1 and intron 2 and measured as a fraction of total reads mapped to the *HBB* gene (n = 20,395 reads in MEL-*HBB*^WT cells, and n = 26,244 reads in MEL-*HBB*^IVS-110(G>A) cells). **(C)** Fraction of splicing intermediates at intron 1 and intron 2 in MEL-*HBB*^WT cells (purple) and MEL-*HBB*^IVS-110(G>A) cells (orange) measured as a fraction of total reads mapped to the *HBB* gene. For **(B-C)**, significance tested by Mann-Whitney U test; *** represents p-value < 0.001, bar height represents the mean of three biological replicates, and error bars represent standard error of the mean. **(D)** Read coverage in the region downstream of the *HBB* PAS is shown for long-reads separated by their splicing status from MEL-*HBB*^WT cells (purple) and MEL-*HBB*^IVS-110(G>A) cells (orange). Coverage is normalized to the position 100 nt upstream of the PAS. Solid line represents the mean coverage of three biological replicates, and shaded windows represent standard error of the mean. **(E)** Model describing the variety of co-transcriptional splicing efficiencies observed during mouse erythropoiesis.
