NAR Genom Bioinform. 2022 Jul 27;4(3):lqac054.

doi: 10.1093/nargab/lqac054. eCollection 2022 Sep.

DSIF modulates RNA polymerase II occupancy according to template G + C content

Ning Deng¹, Yue Zhang¹, Zhihai Ma¹, Richard Lin¹, Tzu-Hao Cheng², Hua Tang¹, Michael P Snyder¹, Stanley N Cohen¹

Affiliations

¹ Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA.
² Institute of Biochemistry and Molecular Biology, National Yang Ming Chiao Tung University, Taipei 112, Taiwan.

PMID: 35910045
PMCID: PMC9326580
DOI: 10.1093/nargab/lqac054

Ning Deng et al. NAR Genom Bioinform. 2022.

Abstract

The DSIF complex comprising the Supt4h and Supt5h transcription elongation proteins clamps RNA polymerase II (RNAPII) onto DNA templates, facilitating polymerase processivity. Lowering DSIF components can differentially decrease expression of alleles containing nucleotide repeat expansions, suggesting that RNAPII transit through repeat expansions is dependent on DSIF functions. To globally identify sequence features that affect dependence of the polymerase on DSIF in human cells, we used ultra-deep ChIP-seq analysis and RNA-seq to investigate and quantify the genome-wide effects of Supt4h loss on template occupancy and transcript production. Our results indicate that RNAPII dependence on Supt4h varies according to G + C content. Effects of DSIF knockdown were prominent during transcription of sequences high in G + C but minimal for sequences low in G + C and were particularly evident for G + C-rich segments of long genes. Reanalysis of previously published ChIP-seq data obtained from mouse cells showed similar effects of template G + C composition on Supt5h actions. Our evidence that DSIF dependency varies globally in different template regions according to template sequence composition suggests that G + C content may have a role in the selectivity of Supt4h knockdown and Supt5h knockdown during transcription of gene alleles containing expansions of G + C-rich repeats.

Figures

**Figure 1.**
Effects of Supt4h reduction on DNA occupancy by elongating RNAPII are associated with G + C content of template. (A) DNA occupancy by elongating RNAPII was quantified by FPKM using ChIP-seq RNAPII-S2 data. RNAPII-S2 ChIP-seq FPKM (y-axis) for all 8779 genes with ChIP-seq FPKM > 0.5 was plotted against G + C content (x-axis). Linear trendlines were added for Untreated (UNT) and Supt4h knockdown (Supt4-KD) conditions via Microsoft Excel. The equation of each trendline is displayed at its end. (B) The effects of Supt4h reduction on DNA occupancy by elongating RNAPII were expressed as ESKOR values. Genes in which the calculated FPKM was >0.5 (8779 genes in total) were sorted according to their ESKOR value (y-axis) from high to low (upper panel). Using the same gene ranking (i.e. same x-axis as the upper panel), each gene's G + C content was plotted on the y-axis in the lower panel. The yellow dots represent the moving average of 100 neighboring genes (lower panel). (C) Heatmap analysis of G + C distribution in the first 10 kb of genes, sorted from high ESKOR to low ESKOR. Genes shorter than 10 kb were not included in the analysis. The first 10 kb of the gene body, including exons and introns, was scanned using a 100 bp sliding window starting at the TSS. The average G + C% for each sliding window was calculated along the gene for heatmap plotting. G + C% is color-coded as indicated.
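
The window scan in panel C and the 100-gene moving average in panel B can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the input is assumed to be a gene sequence oriented 5′ to 3′ and starting at the TSS, and the window step (100 bp, non-overlapping) and the handling of the ends of the moving average are assumptions, since the caption does not specify them.

```python
from typing import List, Optional, Sequence


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in seq (case-insensitive); 0.0 for empty input."""
    if not seq:
        return 0.0
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def windowed_gc(seq: str, window: int = 100, step: int = 100,
                span: int = 10_000) -> Optional[List[float]]:
    """G + C% per window over the first `span` bases of a gene.

    Returns None for genes shorter than `span`, mirroring the caption's
    exclusion of genes shorter than 10 kb. The step size is an assumption.
    """
    if len(seq) < span:
        return None
    return [gc_percent(seq[i:i + window])
            for i in range(0, span - window + 1, step)]


def moving_average(values: Sequence[float], k: int = 100) -> List[float]:
    """Mean of every run of k consecutive values (no padding at the ends)."""
    if k <= 0 or len(values) < k:
        return []
    running = sum(values[:k])
    out = [running / k]
    for i in range(k, len(values)):
        running += values[i] - values[i - k]  # slide the window by one gene
        out.append(running / k)
    return out
```

Each gene would contribute one row of `windowed_gc` values to the heatmap, with rows ordered by ESKOR; `moving_average` would be applied to per-gene G + C% values in that same order.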

**Figure 2.**
Comparison of the effects of Supt4h knockdown on high G + C and low G + C regions. Each of the 8779 genes identified in this analysis was computationally divided into segments of 500 bp in length, starting at a location 1 kb 3′ from the TSS. For each gene, the segments with the highest and lowest G + C content were chosen, and read counts per million mapped reads were determined in the presence or absence of Supt4h knockdown. Read counts for these 500 bp segments were further divided into 25 bp bins and plotted (y-axis). The bars indicate the mean read count per million mapped reads, as determined by ChIP-seq analysis. The error bars represent the standard error of the mean.
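
The segment selection described in this caption can be sketched as below. The sketch assumes a gene sequence that starts at the TSS, non-overlapping 500 bp segments, a trailing partial segment that is discarded, and ties resolved in favor of the first segment; none of these details is stated in the caption, and `extreme_gc_segments` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a non-empty sequence (case-insensitive)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def extreme_gc_segments(seq: str, offset: int = 1000,
                        seg_len: int = 500) -> Optional[Dict[str, Tuple[int, float]]]:
    """Find the highest- and lowest-G+C segments of one gene.

    Segments are tiled from `offset` bases downstream of the TSS.
    Returns {"high": (start, gc), "low": (start, gc)}, or None when the
    gene is too short to hold a single full segment.
    """
    starts = range(offset, len(seq) - seg_len + 1, seg_len)
    scored = [(gc_fraction(seq[s:s + seg_len]), s) for s in starts]
    if not scored:
        return None
    hi = max(scored, key=lambda t: t[0])  # first maximum wins on ties
    lo = min(scored, key=lambda t: t[0])  # first minimum wins on ties
    return {"high": (hi[1], hi[0]), "low": (lo[1], lo[0])}
```

The returned start coordinates are relative to the TSS; ChIP-seq reads overlapping each chosen segment would then be counted separately for the untreated and knockdown samples.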

**Figure 3.**
Comparison of the effects of Supt4h reduction on RNAPII-S2 occupancy of high ESKOR genes versus low ESKOR genes. The 8779 genes having FPKMs >0.5 were sorted according to their ESKOR value (y-axis), and the 2000 highest ESKOR genes and 2000 lowest ESKOR genes were selected. For each of these genes, the gene body was divided, independently of gene length, into 100 bins. The normalized and aggregated read counts from the 100 bins for the gene bodies of each set of 2000 genes, plotted using ngs.plot (62), are shown for untreated cells and cells in which Supt4h has been knocked down. Data for 1000 bp regions flanking each gene body are also shown to indicate read counts in promoter regions 5′ to transcription start sites and read counts for regions 3′ to transcription termination sites.
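
The length-independent binning can be illustrated with a small sketch. It is not the ngs.plot implementation: it assumes a per-base coverage list for each gene body, averages coverage within each of 100 equal-width bins, and then averages the binned profiles across genes. The function names are hypothetical.

```python
from typing import List, Sequence


def bin_profile(coverage: Sequence[float], n_bins: int = 100) -> List[float]:
    """Collapse per-base coverage of one gene body into n_bins mean values.

    Bin edges are computed with integer arithmetic so that genes of any
    length (at least n_bins bases) map onto the same number of bins.
    """
    n = len(coverage)
    if n < n_bins:
        raise ValueError("gene body is shorter than the number of bins")
    profile = []
    for b in range(n_bins):
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        chunk = coverage[start:end]
        profile.append(sum(chunk) / len(chunk))
    return profile


def aggregate_profiles(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Average binned profiles across genes, bin by bin."""
    n_genes = len(profiles)
    return [sum(column) / n_genes for column in zip(*profiles)]
```

Because every gene is reduced to the same number of bins, a 5 kb gene and a 200 kb gene contribute equally to the aggregate curve.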

**Figure 4.**
Association of gene length with ESKOR. (A) The 8779 genes were sorted into four groups according to gene length (5 kb ≤ group 1 < 15 kb; 15 kb ≤ group 2 < 30 kb; 30 kb ≤ group 3 < 70 kb; 70 kb ≤ group 4 < 230 kb). Genes in each group were further sorted according to ESKOR (high to low), and G + C% of individual genes was plotted as in Figure 1B. (B) Within each group of genes categorized by length, genes were grouped into three categories according to G + C content (45% ≤ G + C range 1 < 50%; 50% ≤ G + C range 2 < 55%; 55% ≤ G + C range 3 < 60%) and ESKOR values were compared. The ESKOR values (y-axis) from the four different length groups were plotted for the indicated G + C content range by violin plot. The white dot on the violin plot is the median. The black bar in the center of the violin indicates the interquartile range, i.e. 25–75%.
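
The two-level grouping in this caption uses half-open intervals, which the following sketch encodes directly. The interval boundaries come from the caption; the function names and the input format (tuples of gene length, G + C%, and a precomputed per-gene score such as ESKOR) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# Half-open intervals [low, high) taken from the caption.
LENGTH_GROUPS = [(5_000, 15_000), (15_000, 30_000),
                 (30_000, 70_000), (70_000, 230_000)]   # gene length in bp
GC_RANGES = [(45.0, 50.0), (50.0, 55.0), (55.0, 60.0)]  # G + C content in %


def assign(value: float, intervals: Sequence[Tuple[float, float]]) -> Optional[int]:
    """Return the 1-based index of the interval containing value, else None."""
    for idx, (low, high) in enumerate(intervals, start=1):
        if low <= value < high:
            return idx
    return None


def group_scores(genes: Iterable[Tuple[int, float, float]]) -> Dict[Tuple[int, int], List[float]]:
    """Collect scores keyed by (length group, G + C range).

    Each gene is (length_bp, gc_percent, score). Genes falling outside
    every length group or every G + C range are skipped.
    """
    grouped: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for length_bp, gc_pct, score in genes:
        lg = assign(length_bp, LENGTH_GROUPS)
        gr = assign(gc_pct, GC_RANGES)
        if lg is not None and gr is not None:
            grouped[(lg, gr)].append(score)
    return dict(grouped)
```

Each list in the result corresponds to one violin in panel B; its median and quartiles could be obtained with the standard `statistics` module.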

**Figure 5.**
Association of G + C content with the effects of Supt4h reduction on gene expression. (A) Cell samples used for RNAPII-S2 ChIP-seq analysis were also used for RNA-seq assays. Genes were sorted according to the extent of expression change in cells having Supt4h knockdown, measured as the ratio Log₂ (RNA_FPKM_KD/RNA_FPKM_UNT) (orange dots, right-hand y-axis). The G + C percentage for each gene (blue dots, left-hand y-axis) was plotted as shown in Figure 1B for ChIP-seq analysis. The yellow dots are the moving average G + C% of 100 neighboring genes. (B) G + C content of the 1000 most up-regulated genes and the 1000 most down-regulated genes was analyzed by violin plot. The white dot on the violin plot is the median. The black bar in the center of the violin indicates the interquartile range, i.e. 25–75%.
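
The ranking by Log₂ (RNA_FPKM_KD/RNA_FPKM_UNT) can be sketched as follows. The caption does not say how genes with zero FPKM were handled; this sketch simply skips them, which is an assumption, as are the function names and the dictionary input format.

```python
import math
from typing import Dict, List, Optional, Tuple


def log2_fold_change(fpkm_kd: float, fpkm_unt: float) -> Optional[float]:
    """Log2 of knockdown FPKM over untreated FPKM.

    Returns None when either value is not positive (assumed handling;
    the source does not describe a pseudocount or filter).
    """
    if fpkm_kd <= 0 or fpkm_unt <= 0:
        return None
    return math.log2(fpkm_kd / fpkm_unt)


def rank_by_change(fpkm: Dict[str, Tuple[float, float]],
                   n_top: int = 1000) -> Tuple[List[str], List[str], List[str]]:
    """Rank genes from most up-regulated to most down-regulated.

    `fpkm` maps gene name to (untreated FPKM, knockdown FPKM). Returns the
    full ranking plus the n_top most up- and most down-regulated genes;
    n_top is capped so that the two sets never overlap.
    """
    changes = {}
    for gene, (unt, kd) in fpkm.items():
        lfc = log2_fold_change(kd, unt)
        if lfc is not None:
            changes[gene] = lfc
    ranked = sorted(changes, key=changes.get, reverse=True)
    n = min(n_top, len(ranked) // 2)
    up = ranked[:n]
    down = ranked[-n:] if n > 0 else []
    return ranked, up, down
```

The G + C% of the genes in `up` and `down` would then be compared, as in panel B.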
