Genes Dev. 2023 May 1;37(9-10):432-448.
doi: 10.1101/gad.350434.123. Epub 2023 May 10.

Butt-seq: a new method for facile profiling of transcription

Albert D Yu et al. Genes Dev.

Abstract

A wide range of sequencing methods has been developed to assess nascent RNA transcription and resolve the single-nucleotide position of RNA polymerase genome-wide. These techniques are often burdened with high input material requirements and lengthy protocols. We leveraged the template-switching properties of thermostable group II intron reverse transcriptase (TGIRT) and developed Butt-seq (bulk analysis of nascent transcript termini sequencing), which can produce libraries from purified nascent RNA in 6 h and from as few as 10,000 cells, an improvement of at least 10-fold over existing techniques. Butt-seq shows that inhibition of the superelongation complex (SEC) causes promoter-proximal pausing to move upstream in a fashion correlated with subnucleosomal fragments. To address transcriptional regulation in a tissue, Butt-seq was used to measure the circadian regulation of transcription from fly heads. Together, these results indicate that Butt-seq is a simple and powerful technique to analyze transcription at a high level of resolution.

Keywords: RNA polymerase II pausing; circadian rhythms; nascent RNA; superelongation complex; transcriptional profiling; transcriptional regulation.

Figures

Figure 1.
Butt-seq measures transcription and is comparable with other transcription analysis methods. (A) Butt-seq method overview. The figure was created with BioRender. (B) Scaled metagene plots of signal distribution in plus-strand genes >5 kb in length (N = 6284) across Butt-seq, 3NT-seq, PRO-seq, and RNAPII ChIP-seq. On the X-axis, the transcription start site (TSS) is marked, and the plot extends 1 kb into the gene body and 200 bp upstream. The Y-axis represents the log2 transformed normalized signal in each method. The shaded area corresponds to the 95% confidence interval. (C) Log–log plot of normalized counts from Butt-seq compared with 3NT-seq, PRO-seq, and RNAPII ChIP-seq genome-wide (N = 10,744). Signal was quantified from the region 200 nt downstream from the TSS genome-wide. The Y-axis represents the log2 transformed Butt-seq counts, while the X-axis represents the log2 transformed 3NT-seq (left), PRO-seq (middle), and RNAPII ChIP-seq (right) counts. Correlations were calculated using Spearman's rank correlation coefficient. (D) Log–log plot of normalized counts from Butt-seq compared with 3NT-seq, PRO-seq, and RNAPII ChIP-seq genome-wide (N = 10,744). Signal was quantified from the region +200 nt to +2000 nt downstream from the TSS. The Y-axis represents the log2 transformed Butt-seq counts, while the X-axis represents the log2 transformed 3NT-seq (left), PRO-seq (middle), and RNAPII ChIP-seq (right) counts. Correlations were calculated using Spearman's rank correlation coefficient.
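Panels C and D compare methods using Spearman's rank correlation on log2 transformed counts. The following is a minimal pure-Python sketch of that statistic, not the authors' pipeline; the function names and the pseudocount of 1 are illustrative assumptions.

```python
import math

def log2_counts(counts, pseudocount=1.0):
    # Pseudocount is an assumption to avoid log2(0); the source does not state one.
    return [math.log2(c + pseudocount) for c in counts]

def average_ranks(values):
    # Rank values from 1..n, giving tied values the mean of their ranks.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

def spearman(x, y):
    # Spearman's coefficient is the Pearson correlation of the ranks.
    return pearson(average_ranks(x), average_ranks(y))
```

Because the coefficient depends only on ranks, the log2 transform changes how the scatter plot looks but not the reported correlation.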
Figure 2.
Butt-seq signal is reproducible down to 10,000 cells. (A) Comparison of signal from Butt-seq data generated from 500,000, 50,000, and 10,000 cells. Signal was quantified from 200 nt downstream from the TSS genome-wide. The Y-axis represents the log2 transformed counts from 500,000 cells, while the X-axis represents the log2 transformed counts from 50,000 (top) and 10,000 (bottom) cells. (B) Metagene plots of signal distribution in plus-strand genes >5 kb in length (N = 6284) from 500,000, 50,000, and 10,000 cells. On the X-axis, the transcription start site (TSS) is marked, and the plot extends 1 kb into the gene body and 200 bp upstream. The Y-axis represents the log2 transformed normalized signal in each method. Data were normalized to the depicted region. The shaded area corresponds to the 95% confidence interval.
Figure 3.
Butt-seq recapitulates pausing dynamics seen in RNAPII ChIP-seq upon KL-1 treatment. (A) ECDF of the pausing index in Butt-seq in S2 cells treated with DMSO or 20 µM KL-1. The pausing region is defined as the region from the TSS to the highest pause site as determined by the pause detection algorithm (PDA), and the gene body region is defined as 1000 nt downstream from each pause site. The Y-axis depicts the cumulative fraction of genes, while the X-axis contains the log2 transformed pausing index. (B) Representative gene demonstrating the effect of 20 µM KL-1 treatment on signal distribution in Butt-seq and RNAPII ChIP-seq. (Green arrow) Untreated pause site, (purple arrow) KL-1-induced pause site. (C) Metagene plots of log2-normalized Butt-seq and RNAPII ChIP-seq counts, centered around RNAPII ChIP-seq peaks called from DMSO-treated S2 cells that overlap with Butt-seq single-nucleotide pause peaks. Only plus-strand genes are depicted (N = 219). The shaded region represents the 95% confidence interval.
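The region definitions in panel A can be illustrated with a short sketch. The caption does not give the exact formula, so this assumes the common convention of dividing read density in the pausing region by read density in the gene body region; the function names, the per-nucleotide `counts` list, and the optional pseudocount are hypothetical.

```python
import math

def pausing_index(counts, tss, pause_site, body_len=1000, pseudocount=0.0):
    """Density ratio of pausing region to gene body for a plus-strand gene.

    counts: per-nucleotide signal indexed by genomic position (assumed input).
    Pausing region: TSS through the pause site, inclusive.
    Gene body region: the body_len nt immediately downstream from the pause site.
    """
    pause = counts[tss:pause_site + 1]
    body = counts[pause_site + 1:pause_site + 1 + body_len]
    if not pause or not body:
        raise ValueError("empty pausing or gene body region")
    pause_density = (sum(pause) + pseudocount) / len(pause)
    body_density = (sum(body) + pseudocount) / len(body)
    if body_density == 0:
        raise ValueError("no gene body signal; consider a pseudocount")
    return pause_density / body_density

def ecdf(values):
    # Return (value, cumulative fraction of genes) pairs in ascending order.
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]
```

Plotting `ecdf([math.log2(pi) for pi in indices])` for each treatment would give curves of the kind described, with a shift between DMSO and KL-1 curves reflecting a change in pausing.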
Figure 4.
Pausing as assayed by Butt-seq is correlated with nucleosomal dynamics. (A) Metagene plots of log2 transformed Butt-seq counts compared with log2-normalized MNase-seq counts, restricted to 58-bp ± 5-nt fragments from MNase-seq. Plotted genes were derived from MNase-seq peaks that overlap with annotated TSS; plus-strand genes are shown (N = 831). The shaded area corresponds to the 95% confidence interval. (B) Metagene plot of log2 transformed H2Av ChIP-seq counts compared with log2 transformed Butt-seq counts from S2 cells treated with DMSO or 20 µM KL-1. Plotted genes were derived from H2Av ChIP-seq peaks that overlap with annotated TSS; plus-strand genes are shown (N = 1531). The shaded area corresponds to the 95% confidence interval. (C) Clustering of pause peaks based on distance from the −1 nucleosome center. Depicted are plus-strand genes. Cluster 1 contains pauses 80–120 bp downstream (N = 140), cluster 2 contains pauses 60–79 bp downstream (N = 95), cluster 3 contains pauses 40–59 bp downstream (N = 63), cluster 4 contains pauses 20–39 bp downstream (N = 45), and cluster 5 contains pauses 0–19 bp downstream (N = 35). The shaded area corresponds to the 95% confidence interval. (D) Metagene plot of log2-normalized 58-bp ± 5-bp MNase-seq counts against log2-normalized KL-1- and DMSO-treated Butt-seq counts across five clusters. The shaded area corresponds to the 95% confidence interval.
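The distance bins in panel C amount to a simple lookup. This sketch assumes integer base-pair coordinates on the plus strand and that the bin edges are inclusive as written; how the authors computed nucleosome centers is not stated here, and `assign_cluster` is a hypothetical name.

```python
# (cluster, min distance, max distance) in bp downstream from the -1 nucleosome center
CLUSTER_BINS = [
    (1, 80, 120),
    (2, 60, 79),
    (3, 40, 59),
    (4, 20, 39),
    (5, 0, 19),
]

def assign_cluster(pause_pos, nucleosome_center):
    """Return the cluster number for a plus-strand pause site, or None if out of range."""
    distance = pause_pos - nucleosome_center  # positive means downstream on the plus strand
    for cluster, lo, hi in CLUSTER_BINS:
        if lo <= distance <= hi:
            return cluster
    return None
```

Pauses upstream of the nucleosome center or more than 120 bp downstream fall outside all five clusters and return `None`.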
Figure 5.
Despite low correlation in gene expression, S2 cells and heads share similar pausing features in coexpressed genes. (A) Comparison of counts from Butt-seq data generated from S2 cells or heads. Signal was quantified from 200 nt downstream from the TSS genome-wide. The Y-axis represents the log2 transformed counts from heads, while the X-axis represents the log2 transformed counts from S2 cells. (B) Representative genome browser view of genes exhibiting similar pause sites in heads and S2 cells. (Green arrow) Representative shared pause sites.
Figure 6.
Differential analysis of Butt-seq in heads and S2 cells reveals a diverse range of transcriptional programs. Metagene analysis (left) and representative genes (right) reflecting different combinations of pause region and gene body signal between S2 cells and heads. (A) Pausing is the same between heads and S2 cells. (B) Pausing is higher in S2 cells. (C) Pausing is higher in heads. (D) Total number of genes occupying each pause/gene body combination.
Figure 7.
Butt-seq exhibits transcriptional cycling of core circadian genes. Genome browser view of six time points of core circadian genes in Butt-seq. (Green arrow) Cycling pause, (purple arrow) constant pause.
Figure 8.
Butt-seq recapitulates known transcriptional features of the circadian clock. Butt-seq double-plotted against RNA-seq data from Rodriguez et al. (2013) and Kuintzle et al. (2017) at core circadian genes. Each time point was normalized to the peak time point in respective genes. (Green arrow) “Hump” of transcription in per gene.
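The peak normalization and double-plotting described for Figure 8 can be sketched as follows. The time values in the usage note are placeholders, since the actual sampling times are not given here, and the 24 h period and function names are assumptions for illustration.

```python
def normalize_to_peak(values):
    # Scale a gene's time series so its peak time point equals 1.
    peak = max(values)
    if peak <= 0:
        raise ValueError("peak signal must be positive")
    return [v / peak for v in values]

def double_plot(times, values, period=24):
    # Repeat the series one period later so a full cycle is visible across the wrap-around.
    times = list(times)
    values = list(values)
    return times + [t + period for t in times], values + values
```

For example, `double_plot([0, 4, 8, 12, 16, 20], normalize_to_peak(signal))` would yield twelve points spanning 48 h for a hypothetical six-point `signal` series.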
