This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Sep 19:2023.09.15.558015.

doi: 10.1101/2023.09.15.558015.

Single-cell nascent RNA sequencing using click-chemistry unveils coordinated transcription

Dig B Mahat¹, Nathaniel D Tippens¹, Jorge D Martin-Rufino², Sean K Waterton^{1

3}, Jiayu Fu^{1

4}, Sarah E Blatt^{1

5}, Phillip A Sharp^{1

6}

Affiliations

¹ Koch Institute for Integrative Cancer Research and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139.
² Broad Institute of MIT and Harvard, Cambridge, MA 02142.
³ Current address: Department of Biology, Stanford University, Stanford, CA 94305.
⁴ Current address: Interdisciplinary Biological Sciences Graduate Program, Northwestern University, Evanston, IL 60208.
⁵ Current address: Exact Sciences Corporation, Madison, WI 53719.
⁶ Lead Contact.

PMID: 37745427
PMCID: PMC10516050
DOI: 10.1101/2023.09.15.558015

Single-cell nascent RNA sequencing using click-chemistry unveils coordinated transcription

Dig B Mahat et al. bioRxiv. 2023.

[Preprint]. 2023 Sep 19:2023.09.15.558015.

doi: 10.1101/2023.09.15.558015.

Authors

Dig B Mahat¹, Nathaniel D Tippens¹, Jorge D Martin-Rufino², Sean K Waterton^{1

3}, Jiayu Fu^{1

4}, Sarah E Blatt^{1

5}, Phillip A Sharp^{1

6}

Affiliations

¹ Koch Institute for Integrative Cancer Research and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139.
² Broad Institute of MIT and Harvard, Cambridge, MA 02142.
³ Current address: Department of Biology, Stanford University, Stanford, CA 94305.
⁴ Current address: Interdisciplinary Biological Sciences Graduate Program, Northwestern University, Evanston, IL 60208.
⁵ Current address: Exact Sciences Corporation, Madison, WI 53719.
⁶ Lead Contact.

PMID: 37745427
PMCID: PMC10516050
DOI: 10.1101/2023.09.15.558015

Update in

Single-cell nascent RNA sequencing unveils coordinated global transcription.
Mahat DB, Tippens ND, Martin-Rufino JD, Waterton SK, Fu J, Blatt SE, Sharp PA. Mahat DB, et al. Nature. 2024 Jul;631(8019):216-223. doi: 10.1038/s41586-024-07517-7. Epub 2024 Jun 5. Nature. 2024. PMID: 38839954 Free PMC article.

Abstract

Transcription is the primary regulatory step in gene expression. Divergent transcription initiation from promoters and enhancers produces stable RNAs from genes and unstable RNAs from enhancers^1-5. Nascent RNA capture and sequencing assays simultaneously measure gene and enhancer activity in cell populations^6-9. However, fundamental questions in the temporal regulation of transcription and enhancer-gene synchrony remain unanswered primarily due to the absence of a single-cell perspective on active transcription. In this study, we present scGRO-seq - a novel single-cell nascent RNA sequencing assay using click-chemistry - and unveil the coordinated transcription throughout the genome. scGRO-seq demonstrates the episodic nature of transcription, and estimates burst size and frequency by directly quantifying transcribing RNA polymerases in individual cells. It reveals the co-transcription of functionally related genes and leverages the replication-dependent non-polyadenylated histone genes transcription to elucidate cell-cycle dynamics. The single-nucleotide spatial and temporal resolution of scGRO-seq identifies networks of enhancers and genes and indicates that the bursting of transcription at super-enhancers precedes the burst from associated genes. By imparting insights into the dynamic nature of transcription and the origin and propagation of transcription signals, scGRO-seq demonstrates its unique ability to investigate the mechanisms of transcription regulation and the role of enhancers in gene expression.

PubMed Disclaimer

Conflict of interest statement

Competing interests US patent number US-11519027-B2 on ‘SINGLE-CELL RNA SEQUENCING USING CLICK-CHEMISTRY” was granted on December 6, 2022, to the Massachusetts Institute of Technology, Cambridge, MA (US), on which P.A.S. and D.B.M. are inventors. The authors declare no other competing interests.

Figures

**Extended Data Fig. 1 |. click-chemistry mediated nascent RNA conjugation to single-stranded DNA and optimization of reverse transcription.**
a, Click-chemistry compatible nucleotides tested in AGTuC development. A few nucleotide triphosphates were custom synthesized or sourced with few properties in mind - smaller size, chain termination ability, and the possibility of incorporation by native RNA polymerases. b, Structure of the triazole linkage formed by CuAAC between the nascent-RNA terminally labeled with 3’-(O-Propargyl)-NTPs and the azide-labeled DNA (top left), the linkage formed by SPAAC between the nascent-RNA terminally labeled with 3’-Azido-3’-dNTPs and DBCO DNA (right). The phosphodiester linkage in a native oligonucleotide is shown for comparison (bottom left). c, Incorporation efficiency of 3’-(O-Propargyl)-ATP or 3’-Azido-3’-dATP by native RNA polymerase in nuclear run-on reaction. The propargyl or azide labeled nascent RNA is clicked with Cy5 via CuAAC (Azide-Cy5 or Alkyne-Cy5) or SPAAC (DBCO-Cy5), resolved in a denaturing polyacrylamide gel electrophoresis (PAGE), and quantified by measuring the Cy5 fluorescent from the gel image. The blue dotted line represents the gel region that was quantified. d, Relative quantification of reverse transcription (RT) efficiency of two commercial enzymes traversing through the triazole link formed between the alkyne-labeled RNA and azide-labeled DNA by CuAAC. RT was performed in the presence of either native dCTP or radioisotope a-³²P dCTP, and the RT reaction was resolved in denaturing PAGE and imaged sequentially for nucleic acid signal (top gel) and radioisotope signal (bottom gel). e, Quantification of aborted intermediate and completed desired products (RT through triazole and TSO used) formed during the one hour or three hours of RT using TSO with terminal Locked-Nucleic-Acid-Guanosine (LG) or 2’-Fluoro-Guanosine (FG). f, Confirmation and relative quantification of CuAAC, RT, and PCR of clicked product formed between the alkyne-labeled RNA and azide-labeled DNA by three commercial Reverse transcriptase enzymes. **Note:** The blue bar, line, or border represents the “winner” condition.

**Extended Data Fig. 2 |. Comparison between AGTuC and PRO-seq.**
a, Representative genome-browser screenshots with two replicates of AGTuC and PRO-seq showing a region in chromosome 15 (left) and a region in chromosome 3 containing the Sox2 gene and its distal enhancer (right) of the mouse genome (mm10). b, Correlation between AGTuC and PRO-seq reads per million sequences in gene bodies (left, n = 19,961) and enhancers (right, n = 12,542). c, Metagene profiles of AGTuC and PRO-seq reads per million per 10 base pair bins around the TSS of genes (left, n = 19,961) and enhancers (right, n = 12,542). The line represents the mean, and the shaded region represents 95% confidence interval. d, Major steps with the approximate time required in AGTuC library preparation.

**Extended Data Fig. 3 |. Optimization of intact nuclei run-on reaction and NGS library preparation steps.**
a, Physical appearance of Trypan Blue stained nuclei under microscope treated with various sarkosyl concentrations. b, Relative quantification of nuclear run-on efficiency with various sarkosyl concentrations. Nascent RNA collected after nuclear run-on reaction with either native CTP or click-compatible 3’O-Propargyl-CTP was clicked with Cy5-azide, resolved in denaturing PAGE, and images for Cy5 fluorescence. c, Effect of different ratios of CuAAC accelerating ligand BTTAA in CuAAC efficiency. RNA-propargyl was clicked with azide-DNA containing Cy5 in the presence of various ratios of BTTAA:CuSO₄, resolved in denaturing PAGE, and images for Cy5 fluorescence. d, Relative quantification of denaturing efficiency of commonly used denaturing agents to release the nascent RNA from RNA polymerase complex. Intact nuclei after run-on with 3’O-Propargyl-NTPs were treated with denaturing agents in the presence of azide-labeled beads and CuAAC reagents, allowing nascent RNA to click with the beads. Beads were stained with RNA-binding dye and measured for fluorescence by FACS. e, Effect of denaturing agent’s presence in CuAAC efficiency. The blue outline in the image of denaturing PAGE denotes the click product between the RNA-alkyne and azide-DNA. f, Role of urea in the residence of clicked RNA-DNA conjugate in either supernatant or interphase of Trizol during the clean-up of CuAAC reaction, as quantified by the scintillation count of ³²P radioisotope. g, Desalting (removal of CuSO4, BTTAA, and sodium ascorbate from CuAAC reaction) efficiency of polymerized dextran and cellulose membrane. Fluorescence from Cy5-labeled RNA-DNA conjugate was measured in elution fractions from columns packed with polymerized dextran and elution from different pore-size cellulose membrane centrifugation tubes with or without PEG 8000. h, Relative recovery of ssDNA or RNA from phenol:chloroform or silica-based matrix column purification. Clicked RNA-DNA conjugate was radioisotope labeled using Polynucleotide kinase and γ-³²P ATP, and the cleaned reaction was quantified using a scintillation counter. i, PCR amplification efficiency of clicked RNA-DNA conjugate using different commercial PCR amplification kits. The PCR reaction was resolved in native PAGE, stained with SYBR Gold, and quantified using ImageJ software. j, Relative recovery of size-selected dsDNA. A mock NGS library (purified PCR product) was selected for the desired size using various size-selection methods, and the recovered dsDNA was quantified using a dsDNA-specific fluorescence kit (Qubit). **Note:** The blue bar, line, or border represents the “winner” condition.

**Extended Data Fig. 4 |. Benchmarking inAGTuc against AGTuC and PRO-seq.**
a, Representative genome-browser screenshots with two replicates of inAGTuC, AGTuC, and PRO-seq showing a region in chromosome 8 (left) and a region in chromosome 4 of the mouse genome (mm10). **b, c,** Comparison of inAGTuC metagene profiles with PRO-seq and AGTuC using reads per million per 10 base pair bins around **(b)** the TSS of genes (n = 19,961) and **(c)** enhancers (n = 12,542). The line represents the mean, and the shaded region represents 95% confidence interval. d, Correlations of inAGTuC reads per million sequences in gene bodies (n = 19,961) between the four replicates. e, Distribution of reads per well (left) and features per well (right) in four replicates of 96-well plate inAGTuC libraries. Each well contains 100 nuclei. f, Relationship between the reads per well and the number of features detected per well in four replicates of 96-well plate inAGTuC libraries. **g, h**, Correlation between inAGTuC and AGTuC (two left panels) or PRO-seq (two right panels) reads per million sequences in **(g)** the body of genes (n = 19,961) and **(h)** enhancers (n = 12,542).

**Extended Data Fig. 5 |. Feasibility demonstration of inAGTuC with fewer cells.**
a, Representative genome-browser screenshots of inAGTuC library from one 96-well plate with each well containing either 12, 120, or 1200 cells per well (cpw) showing two regions in chromosome 17 of the mouse genome (mm10). inAGTuC library with 1200 cpw but without Cu(I) in the CuAAC reaction (fourth track) and the PRO-seq library (fifth track) serve as the negative and positive control, respectively. b, Correlations among 12 cpw, 120 cpw, and 1200 cpw inAGTuC libraries in the body of genes (n = 19,961). c, Distribution of reads per well (top) and genes per well (bottom) in 12 cpw, 120 cpw, and 1200 cpw inAGTuC libraries. d, Correlations between PRO-seq and 12 cpw, 120 cpw, and 1200 cpw inAGTuC libraries in the body of genes (n = 19,961).

**Extended Data Fig. 6 |. scGRO-seq library preparation.**
Major steps involved in scGRO-seq library preparation.

**Extended Data Fig. 7 |. Benchmarking scGRO-seq.**
a, Coefficient of determination (r²) between each 96-well plate from scGRO-seq batches that passed the quality-control threshold. r² was calculated from average reads per 96 cells in all genes and enhancers. b, Relationship between the number of features detected per cell and the scGRO-seq reads per cell (left), or scGRO-seq reads in features per cell (right). c, Correlation between scGRO-seq and PRO-seq reads per million sequences in gene bodies (top, n = 19,961) and enhancers (bottom, n = 12,542). d, Comparison of metagene profiles between scGRO-seq and PRO-seq reads per million per 10 base pair bins around the TSS of genes (left, n = 19,961) and enhancers (right, n = 12,542).

**Extended Data Fig. 8 |. Effect of transcription level, gene length, and burst duration in transcription burst kinetics.**
a, Distribution of distances between consecutive RNA polymerases in the first 10 kb of the gene body in single cells compared with distances from permuted data (randomized cell ID but unchanged read position, left) or uniform data (randomized read position along the gene but unchanged cell ID, right). Distances up to 2.5 kb are shown. b, Test of our burst kinetics model by simulating burst size and burst frequency. c, Correlation between the burst frequency from scGO-seq (left) and intron seqFISH (right) with the burst frequency from scRNA-seq. d, Test of our burst kinetics model by simulation using inferred burst size and burst frequency. e, Correlation between the burst size greater than one and the burst frequency of genes. f, The effect of gene length (from 100 bp to 10 kb after trimming 500 bp on either end of the genes) on burst size and frequency. g, Correlation between burst frequencies calculated from the burst window of either the first 5 kb or the first 10 kb gene bodies.

**Extended Data Fig. 9 |. Co-transcription of genes with shared promoters.**
a, Genome-browser screenshot of the histone locus body in mouse chromosome 13 showing transcription of replication-dependent histone genes. b, Network of enriched gene ontology terms in co-transcribed genes. A connecting gray line represents at least a 10% overlap of genes between the GO terms. The color of the dots represents the p-value, and the dot size represents the number of contributing genes in the GO term.

**Extended Data Fig. 10 |. Organization of enhancer-gene co-transcription networks.**
a, Distance between correlated and non-correlated SE-gene pairs within 2.5 Mb of each other. b, Co-transcription network of functionally related genes clustered together on the same chromosome shown as examples. Red edges between the enhancer-gene pairs indicate rho > 0.15, and rho > 0.1 and < 0.15 are shown in gray. c, Co-transcription between the Sox2 gene and its distal enhancer (left), and the Nanog gene and its three enhancers (right). Green bars represent the annotated SE regions, and the 5 kb bins in sense and antisense strands are represented in magenta and yellow-green bars.

**Fig. 1 |. Single-cell nascent RNA sequencing depicts genome-wide nascent transcription.**
a, A summary of scGRO-seq workflow. b, Representative genome-browser screenshots showing scGRO-seq reads at a single-cell resolution, the aggregate scGRO-seq level, inAGTuC, PRO-seq, and chromatin marks around a gene (left) and an enhancer (right). c, Distribution of scGRO-seq reads per cell (left) and features per cell (right). d, Correlation between aggregate scGRO-seq and inAGTuC reads per million sequences in the body of genes (left, n = 19,961) and enhancers (right, n = 12,542). e, Metagene profiles of scGRO-seq compared with inAGTuC reads per million per 10 base pair bins around the TSS of genes (left, n = 19,961) and enhancers (right, n = 12,542). The line represents the mean, and the shaded region represents the 95% confidence interval. f, Correlation between scGRO-seq and intron seqFISH reads per cell in the body of genes used in intron seqFISH (n = 9,666). g, Correlation between scGRO-seq and scRNA-seq reads per cell in the body of genes (left, n = 19,961) and the distribution of reads in various genomic regions (right).

**Fig. 2 |. Inference of transcription kinetics using scGRO-seq.**
a, Single-cell view of multiplet RNA Polymerase (blue dots) in *Npm1* gene. A yellow line connects RNA polymerases within the same cell. Randomly sampled 75 single cells containing more than one RNA Polymerase are shown on top, followed by the aggregate scGRO-seq, PRO-seq, chromatin marks, and transcription-associated factors profiles. b, Ratio of observed or permuted burst sizes compared against the average burst sizes from 200 permutations. c, Ratio of the observed distance between consecutive RNA polymerases in the first 10 kb of gene-bodies in individual cells against the permuted data. Distances up to 2.5 kb are shown in 50 bp bins. d, Illustration of the model for direct inference of burst kinetics from scGRO-seq data. e, Histogram of burst size (left), burst frequency (middle), and duration until the next burst (1/burst frequency) (right) for genes that are at least 10 kb long (n = 13,142). f, Correlation of burst frequency of genes between scGRO-seq and intron seqFISH. g, Effect of promoter elements in burst size greater than 1 (left) and burst frequency (right). Inr is Initiator, and PB is pause button sequences. h, Role of transcription factors in determining burst frequency and burst size.

**Fig. 3 |. Cell-cycle inference by non-polyadenylated replication-dependent histone genes expression.**
a, Heatmap of hierarchical clustering of single cells representing transcription of G1/S, S, and G2/M specific genes. The dendrogram colors represent cell clusters with cell-cycle phase-specific gene transcription. b, Fraction of cells in the three primary clusters distinguished by transcription of G1/S, S, and G2/M specific genes. c, Distribution of scGRO-seq reads per cell in the three clusters of cells defined by cell-cycle phase-specific gene transcription. d, Differentially expressed genes among the three clusters of cells defined by transcription of G1/S, S, and G2/M specific genes. The genes used to classify cells are denoted in bold and colored. Histones (RD) represent aggregate reads from replication-dependent histone genes.

**Fig. 4 |. Coordinated transcription of functionally related genes.**
a, A pair of co-transcribed genes (top). Reads within the first 10 kb of the gene pair (blue circle) expressed in the same cells are connected by a yellow line. Reads beyond the first 10 kb (gray circles and lines) are not used in gene-gene correlation. Pair-wise Pearson correlation was calculated from a binarized genes by cells matrix. The relationship between the Pearson correlation coefficient, uncorrected p-value, and the false discovery rate corrected p-value for pairwise gene-gene correlation is shown (bottom). b, Gene ontology terms enriched in co-transcribed gene modules. The transcription factor motif enriched in the promoters of the genes associated with the GO term and the co-transcribed genes that contributed to the enrichment of the GO term is shown as an example on the right (red line indicating rho > 0.15). A complete list of GO terms and the co-transcribed genes contributing to the enrichment of the GO terms is provided in Table 5. c, Correlation of co-transcription of significantly co-transcribed gene pairs (n = 164,380) between scGRO-seq and intron seqFISH. Axes represent the fraction of cells in which a gene pair is co-transcribed.

**Fig. 5 |. Spatial and temporal coordination between genes and enhancers.**
a, Distance between correlated and non-correlated enhancer-gene pairs within 2.5 Mb of each other. b, Co-transcription between pluripotency genes (filled blue arrows indicate sense gene bins, empty blue arrows indicate antisense gene bins) and their enhancers (represented by green arrows, and the arrow directions indicate sense and antisense directions). Correlated full-length enhancer-gene pairs (Sox2 and Nanog) are shown with purple distance bars. For finer time resolution correlation, features are extended up to the end of the transcription signal and divided into 5 kb bins. Correlated bins are represented by a red arch, except for Sox2 and its distal enhancer bins, which are shown in different colors for visual aid. c, Co-transcription between the Sall1 gene and its CRISPR-verified SE. Correlated SE-gene bins are denoted by arches, d, Summary of correlated bin positions in CRISPR-verified SE-gene pairs. Scrambled random pairs serve as a control.

See this image and copyright information in PMC

References

1. Kaikkonen M. U., Lam M. T. Y. & Glass C. K. Non-coding RNAs as regulators of gene expression and epigenetics. Cardiovasc. Res. 90, 430–440 (2011). - PMC - PubMed
1. Kapranov P., Willingham A. T. & Gingeras T. R. Genome-wide transcription and the implications for genomic organization. Nat. Rev. Genet. 8, 413–423 (2007). - PubMed
1. Birney E. et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007). - PMC - PubMed
1. Djebali S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012). - PMC - PubMed
1. Kapranov P. et al. RNA Maps Reveal New RNA Classes and a Possible Function for Pervasive Transcription. Science 316, 1484–1488 (2007). - PubMed

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

This is a preprint.

Single-cell nascent RNA sequencing using click-chemistry unveils coordinated transcription

Affiliations

Single-cell nascent RNA sequencing using click-chemistry unveils coordinated transcription

Authors

Affiliations

Update in

Abstract

Conflict of interest statement

Figures

References

Publication types

Grants and funding

LinkOut - more resources

Full Text Sources