Nat Methods. 2020 May;17(5):515-523.
doi: 10.1038/s41592-020-0797-9. Epub 2020 Apr 6.

Kethoxal-assisted single-stranded DNA sequencing captures global transcription dynamics and enhancer activity in situ

Tong Wu et al. Nat Methods. 2020 May.

Erratum in

Abstract

Transcription is a highly dynamic process that generates single-stranded DNA (ssDNA) in the genome as 'transcription bubbles'. Here we describe a kethoxal-assisted single-stranded DNA sequencing (KAS-seq) approach, based on the fast and specific reaction between N3-kethoxal and guanines in ssDNA. KAS-seq allows rapid (within 5 min), sensitive and genome-wide capture and mapping of ssDNA produced by transcriptionally active RNA polymerases or other processes in situ using as few as 1,000 cells. KAS-seq enables definition of a group of enhancers that are single-stranded and enrich unique sequence motifs. These enhancers are associated with specific transcription-factor binding and exhibit more enhancer-promoter interactions than typical enhancers do. Under conditions that inhibit protein condensation, KAS-seq uncovers a rapid release of RNA polymerase II (Pol II) from a group of promoters. KAS-seq thus facilitates fast and accurate analysis of transcription dynamics and enhancer activities simultaneously in both low-input and high-throughput manner.

Conflict of interest statement

Competing interests

The University of Chicago has filed a patent application on KAS-seq. C.H. is a scientific founder and a member of the scientific advisory board of Accent Therapeutics, Inc., and a shareholder of Epican Genetech.

Figures

Extended Data Fig. 1. Characterization of the N3-kethoxal-based labeling.
a, MALDI-TOF analysis of the reaction between a 16-mer DNA oligo and N3-kethoxal. The experiment was performed in duplicates with similar results obtained. b, TLC analysis of the reaction between N3-kethoxal and deoxyguanosine (dG, left) or L-arginine (L-Arg, right) after different time intervals. The N3-kethoxal-dG results were visualized by 254 nm UV light. The N3-kethoxal-L-Arg results were visualized by ninhydrin staining. The experiment was performed in duplicates with similar results obtained. c-d, The DNA yield (c) and the A260/280 ratio (d) of gDNA isolated from N3-kethoxal-treated and control cells. P values were calculated by using two-sided unpaired Student’s t-test (n = 3 independent experiments). e, Dot blot showing biotin signals of the DNA after the biotinylation reaction in the presence or absence of N3-kethoxal or biotin-DBCO. Results from two replicates were shown for each condition. The experiment was performed in duplicates with similar results obtained. f, Agarose gel image showing the profile of libraries constructed by using input and enriched DNA samples made in the presence or absence of N3-kethoxal or biotin-DBCO. Results from two replicates were shown for each condition. The experiment was performed in duplicates with similar results obtained.
Extended Data Fig. 2. KAS-seq validation and an overview of the KAS-seq profile.
a, Fingerprint plot of KAS-seq libraries and the corresponding inputs in HEK293T cells. b, Pearson correlation scatterplot between two independent KAS-seq replicates (r = 0.99) in HEK293T cells (n = 287,970 10 kb bins in the hg19 genome). c, Peak overlaps between two independent KAS-seq replicates in HEK293T cells. The p value was calculated using two-sided Fisher’s exact test. d, Read distributions of KAS-seq (left) and Pol II ChIP-seq (right) signals with respect to different GC fractions. e, Heatmap showing reads distribution of two independent KAS-seq replicates at gene-coding regions in mESCs. f, The distribution of KAS-seq signals, ATAC-seq signals, and selected histone modifications at gene-coding regions in HEK293T cells. g, Heatmap showing the reads distribution of two KMnO4/S1 footprinting replicates (activated mouse B cells) at gene-coding regions.
Extended Data Fig. 3. KAS-seq using low input cells and mouse liver.
a, KAS-seq signal distribution at gene-coding regions revealed by using different numbers of HEK293T cells (n = 26,910 genes). b, Profiles of KAS-seq data at gene-coding regions using different numbers of HEK293T cells. c, Fingerprint plot of low-input KAS-seq libraries. d, Numbers of KAS-seq peaks detected by using different amounts of HEK293T cells. e, Heatmap showing reads distribution of two independent KAS-seq replicates at gene-coding regions generated by using livers from two mice. 1 M: 1 million; 10 K: 10 thousand; 5 K: 5 thousand; 1 K: 1 thousand.
Extended Data Fig. 4. Correlation between KAS-seq signals, gene expression levels, Pol II dynamics, and gene transcription states.
a, Venn diagram showing the overlap between KAS-seq peaks and Pol II ChIP-seq peaks at promoters in mESCs. The p value was calculated using two-sided Fisher’s exact test. b, Pearson correlation scatterplot (n = 24,359 genes) between KAS-seq and Pol II ChIP-seq at gene bodies in mESCs. The r value was calculated as two-tailed probability. c, Genes were grouped according to different expression levels based on RNA-seq. 10–90 percentile of data points are shown, with the center line showing the median, and the box limits showing the upper and lower quartiles. d, Metagene profile of KAS-seq signals at gene-coding regions under control, DRB treatment, and triptolide treatment conditions. e, A snapshot of KAS-seq profiles from UCSC Genome Browser under control, DRB treatment, and triptolide treatment conditions. f, Heatmaps showing KAS-seq, Pol II ChIP-seq, and GRO-seq signals on genes with four different transcription states defined by using KAS-seq.
Extended Data Fig. 5. KAS-seq shows no significant length-dependent bias and yields strong signals around TES regions.
a, A snapshot from UCSC Genome Browser showing KAS-seq and Pol II ChIP-seq profiles at the native state, and KAS-seq profile at the DRB-treated state, indicating that KAS-seq signals around TES are derived from Pol II. Autoscale setting is used for all tracks. b, KAS-seq reads densities of three groups of genes with different lengths of termination signals. c, Averaged KAS-seq reads density in the entire terminal regions in the three groups of genes defined in (b). n = 660 genes for all three groups. d, Termination index for each gene was calculated as the ratio of KAS-seq reads density on TES to its downstream 2 kb region, versus reads density on the −200 bp to +400 bp region around TSS. e, The distribution of termination index for all genes in KAS-seq, GRO-seq, and Pol II ChIP-seq (n = 29,160 genes). For c and e, 10–90 percentile of data points are shown, with the center line showing the median, and the box limits showing the upper and lower quartiles. P values were calculated using two-sided unpaired Student’s t-test.
Extended Data Fig. 6. KAS-seq detects Pol I and Pol III-mediated transcription events, as well as other non-B form DNA structures and telomeric DNA regions.
a–c, Snapshots of KAS-seq signals at selected small RNA, tRNA, and rRNA loci in HEK293T cells under native, DRB treatment, and triptolide treatment conditions. d, A summary of different types of non-B form DNA structures and the number of KAS-seq peaks (under triptolide-treatment condition) detected at each type of predicted non-B form DNA regions. e, Snapshots from UCSC Genome Browser showing examples of KAS-seq signals under native, DRB, and triptolide-treatment conditions at different non-B form DNA regions and telomeric DNA regions. f, Enrichment of KAS-seq signals at different non-B form DNA and telomeric DNA regions shown in (d). n = 715 regions for hairpin, n = 1,643 regions for cruciform, n = 730 regions for H-DNA, n = 356 regions for quadruplex, n = 256 regions for Z-DNA, n = 29 regions for telomere.
Extended Data Fig. 7. Features of ssDNA-containing enhancers in mESCs.
a, All ATAC-seq-positive enhancers were sorted into two groups based on whether they are KAS-seq-positive or not. Heatmaps of KAS-seq, ATAC-seq, and Pol II ChIP-seq signals on these two groups are shown. b, A metagene profile showing ATAC-seq reads density on the two groups of enhancers defined in (a). c, Expression levels of genes associated with KAS-seq-positive (n = 3,080 genes) and KAS-seq-negative (n = 1,544 genes) enhancers defined in (a). 10–90 percentile of data points are shown, with the center line showing the median, and the box limits showing the upper and lower quartiles. The p value was calculated using two-sided unpaired Student’s t-test. d, Sequence motifs enriched in ATAC-seq-positive but KAS-seq-negative enhancers from mESCs (n = 6,082 enhancers). The p values were calculated by two-sided binomial test. e, Metagene profiles of Nanog, Oct4 and Sox2 ChIP-seq read densities at denoted enhancers in mESCs. Regions within 10 kb around the enhancer centers are shown.
Extended Data Fig. 8. ssDNA-containing enhancers in HEK293T cells.
a, A group of enhancers are single-stranded in HEK293T cells. Heatmap of KAS-seq reads densities at all enhancer regions in HEK293T cells. Active and poised enhancer regions are defined by distal H3K27ac and H3K4me1 signals. Active enhancers are sub-grouped into SSEs and DSAEs. b, Distribution of H3K27ac ChIP-seq signal across all HEK293T enhancers. Super-enhancers are defined as containing exceptionally high amounts of H3K27ac. c, The number of ssDNA-containing enhancers and super-enhancers in HEK293T cells and the overlap. The p value was calculated by two-sided Fisher’s exact test. d, KAS-seq reads densities on SSEs in HEK293T cells under native and DRB-treatment conditions. e, Metagene profiles of KAS-seq, Pol II, H3K4me3, and H3K27ac ChIP-seq reads densities at denoted enhancers in HEK293T cells. Regions within 10 kb around the enhancer centers are shown. SSE: ssDNA-containing enhancers; DSAE: double-stranded active enhancers; PE: poised enhancers.
Extended Data Fig. 9. Transcription factors that preferentially bind at ssDNA-containing enhancers in HEK293T cells.
a, Metagene profiles of CTCF, YY1, SP1, SP2, MAZ, NCAPH2, KLF8, KLF9, ZNF335, ZNF341, ZBTB20, and ZBTB26 ChIP-seq reads densities at denoted enhancers in HEK293T cells. Regions within 10 kb around the enhancer centers are shown. b, Transcription factor binding motifs enriched at ssDNA-containing enhancers (n = 1,969 enhancers) in HEK293T cells with corresponding p values by using the genome as background. Only TFs with motif information in the TRANSFAC vertebrates library were analyzed. P values were calculated by two-sided binomial test. c, GREAT analysis of genes regulated by ssDNA-containing enhancers (n = 1,969 enhancers) in HEK293T cells. P values were calculated by two-sided binomial test. SSE: ssDNA-containing enhancers; DSAE: double-stranded active enhancers; PE: poised enhancers.
Extended Data Fig. 10. KAS-seq and Pol II ChIP-seq signals in response to protein condensation inhibition.
a, PCA analysis of KAS-seq data at different time points after 1,6-hexanediol treatment (n = 3,122,843 1 kb bins). b, Box plots showing normalized KAS-seq reads densities on gene bodies (from 0.5 kb downstream TSS to TES) of the genes defined as responsive to 1,6-hexanediol treatment. 10–90 percentile of data points are shown, with the center line showing the median, and the box limits showing the upper and lower quartiles. P values were calculated by using two-sided unpaired Student’s t-test. c, Heat map showing the release and movement of KAS-seq signals (left) and Pol II clusters (right) from 0 min to 60 min after 1,6-hexanediol treatment. d, Numbers of fast responsive genes defined by KAS-seq and Pol II ChIP-seq, and the overlap. The p value was calculated by two-sided Fisher’s exact test.
Fig. 1 | Probing single-stranded DNA regions in the genome by using KAS-seq.
a, The molecular structure of N3-kethoxal and how N3-kethoxal labels guanines in single-stranded DNA but not in double-stranded DNA. b, The scheme of KAS-seq. N3-kethoxal (blue star) reacts with single-stranded guanines in the genome (resolved by DNA-binding proteins, such as Pol II as shown in yellow), which can be further biotinylated (red) and enriched for sequencing. The whole process takes 6–7 h in total, from live-cell labeling to finished library preparation.
Fig. 2 | An overview of KAS-seq in HEK293T cells.
a, Genome-wide distribution of KAS-seq peaks. “KAS-seq” denotes the percentage overlap of KAS-seq peaks with different genomic features. “Random” denotes the percentage overlap of randomly generated regions with the same number and length of real peaks with different genomic features. b, The distribution of KAS-seq signals at gene-coding regions, with 3 kb upstream of TSS and 3 kb downstream of TES shown. c, The genome-wide Pearson correlation heatmap among averaged KAS-seq signals, selected histone modifications, and ATAC-seq reads density in HEK293T cells. Heatmap was clustered using hierarchical clustering, with pairwise correlation coefficients noted in each square (n = 302,755 10 kb bins in the hg19 genome). d, A snapshot from UCSC Genome Browser, showing the relationship between KAS-seq peaks, selected histone modifications, and ATAC-seq peaks at a highlighted locus.
Fig. 3 | KAS-seq reveals Pol II dynamics and defines gene transcription states.
a, Genome-wide Pearson correlation heatmap between KAS-seq, Pol II ChIP-seq, GRO-seq, and nascent RNA-seq (4SU-seq) reads density on gene-coding regions in HEK293T cells. Pairwise correlation coefficients are noted in each square (n = 839,684 1 kb bins in the hg19 genome). b, KAS-seq reads density at gene-coding regions of genes with different expression levels (defined by RNA-seq) in HEK293T cells. c, Venn diagram showing overlap of KAS-seq peaks in HEK293T cells under native, DRB treatment, and triptolide treatment conditions. The number of common peaks between two replicates was used in each case. d, Heatmap showing KAS-seq signal distribution at gene-coding regions under native, DRB treatment, and triptolide treatment conditions. Regions of 3 kb upstream of TSS and 3 kb downstream of TES were shown. e, Defining four groups of genes with different transcription states based on KAS-seq results. In each group, one gene is shown as an example by using the snapshot of KAS-seq signals under native and DRB-treated conditions.
Fig. 4 | A portion of enhancers exist as single-stranded, which possess higher enhancer activity and are associated with critical functions.
a, Heatmap of KAS-seq reads density at all enhancer regions in mESCs. Active and poised enhancer regions are defined by distal H3K27ac and H3K4me1 signals. Active enhancers are sub-grouped into SSEs and DSAEs. More than 40% of active enhancers (25% of all enhancers) are single-stranded. b, Snapshots of HEK293T KAS-seq under the native condition and H3K27ac signals from UCSC Genome Browser, showing examples that the entire enhancer is single-stranded, a part of the enhancer is single-stranded, or the entire enhancer is not single-stranded, respectively. c, KAS-seq reads densities on ssDNA-containing enhancers in mESCs under native and DRB-treatment conditions. d, The numbers of ssDNA-containing enhancers and super-enhancers in mESCs and their overlap. The p value was calculated using two-sided Fisher’s exact test. e, Boxplot showing the expression levels of genes regulated by denoted enhancers. 10–90 percentile of data points are shown, with the center line showing the median, and the box limits showing the upper and lower quartiles. P values were calculated using two-sided unpaired Student’s t-test (n = 617 genes for SEs, n = 3,262 genes for SSEs, n = 3,367 genes for other enhancers). f, ssDNA-containing enhancers possess more long-range interactions mediated by both Pol II and CTCF than those from double-stranded active enhancers. Both Pol II-mediated and CTCF-mediated long-range interactions were defined from public ChIA-PET data in mESCs. g, Sequence motifs enriched in ssDNA-containing enhancers in mESCs. P values were calculated by using two-sided binomial test (n = 786 SSEs). h, Metagene profiles of KAS-seq (DRB), Pol II, H3K4me3, H3K27ac, Brd4, Med1, Cdk8, and Cdk9 ChIP-seq reads densities at denoted enhancers in mESCs. Regions within 10 kb around the enhancer centers are shown. i, GREAT analysis of genes regulated by ssDNA-containing enhancers in mESCs. P values were calculated by using two-sided binomial test (n = 786 SSEs).
SSE: ssDNA-containing enhancers; DSAE: double-stranded active enhancers; PE: poised enhancers.
Fig. 5 | KAS-seq reveals transcription dynamics upon inhibition of protein condensation.
a,b, KAS-seq read densities around TSS on uni-directional (a) and bi-directional (b) transcribed genes after HEK293T cells were treated with 1,6-hexanediol for denoted time intervals. Arrows indicate the upstream and downstream “released” KAS-seq signals at the 5 min time point. c, Snapshots of KAS-seq and Pol II ChIP-seq signals on the BAIAP2 gene after cells were treated with 1,6-hexanediol for denoted time intervals. Snapshots at different time points for each data set are staggered to clearly show differences. Autoscale setting was used for all tracks. The genomic coordinates and the Refseq tracks are aligned to the 60 min time point. d, Pol II ChIP-seq read densities around TSS after cells were treated by 1,6-hexanediol for denoted time intervals. e, Boxplot showing the calculated release index of high (n = 1,730 genes), medium (n = 1,730 genes), low (n = 1,730 genes) and non-responsive (n = 1,188 genes) genes. f, Pol II CTD S5P densities on four groups of genes that respond to 1,6-hexanediol to different extents. g, Boxplot showing the ratio of Pol II S5P over total Pol II on TSS in four groups of genes with different strength of responses to 1,6-hexanediol (n = 1,730 for high responsive genes, n = 1,730 for medium responsive genes, n = 1,730 for low responsive genes, and n = 1,188 for non-responsive genes). For e and g, 10–90 percentile of data points are shown, with the center line showing the median, and the box limits showing the upper and lower quartiles. P values were calculated using two-sided unpaired Student’s t-test.
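
The termination index defined in the caption of Extended Data Fig. 5d (reads density from the TES to 2 kb downstream, divided by reads density from 200 bp upstream to 400 bp downstream of the TSS) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the per-base coverage list, the 0-based coordinates, the strand handling, and the choice to return NaN when the TSS window has no signal are all assumptions made for illustration.

```python
def window_density(coverage, start, end):
    """Mean per-base coverage over the half-open interval [start, end)."""
    start = max(start, 0)
    end = min(end, len(coverage))
    if end <= start:
        raise ValueError("window lies outside the coverage track")
    return sum(coverage[start:end]) / (end - start)


def termination_index(coverage, tss, tes, strand="+",
                      tss_up=200, tss_down=400, tes_down=2000):
    """Ratio of TES-downstream density to TSS-proximal density.

    Hypothetical helper: `coverage` is a per-base signal for one chromosome,
    `tss` and `tes` are 0-based positions on that chromosome.
    """
    if strand == "+":
        tss_window = (tss - tss_up, tss + tss_down)
        tes_window = (tes, tes + tes_down)
    else:
        # On the minus strand, "downstream" means lower coordinates.
        tss_window = (tss - tss_down, tss + tss_up)
        tes_window = (tes - tes_down, tes)
    tss_density = window_density(coverage, *tss_window)
    tes_density = window_density(coverage, *tes_window)
    if tss_density == 0:
        return float("nan")  # assumed convention for genes with no TSS signal
    return tes_density / tss_density


# Toy track: signal of 4 around the TSS, signal of 2 downstream of the TES.
coverage = [0.0] * 5000
coverage[800:1400] = [4.0] * 600
coverage[2500:4500] = [2.0] * 2000
ti = termination_index(coverage, tss=1000, tes=2500, strand="+")
print(ti)  # 0.5
```

In practice the coverage would come from a normalized signal track rather than a Python list; the sketch only shows how the two windows and the ratio fit together.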
