Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Meta-Analysis
. 2022 Jul 22;50(13):7260-7286.
doi: 10.1093/nar/gkac537.

Quality-controlled R-loop meta-analysis reveals the characteristics of R-loop consensus regions

Affiliations
Meta-Analysis

Quality-controlled R-loop meta-analysis reveals the characteristics of R-loop consensus regions

Henry E Miller et al. Nucleic Acids Res. .

Abstract

R-loops are three-stranded nucleic acid structures formed from the hybridization of RNA and DNA. While the pathological consequences of R-loops have been well-studied to date, the locations, classes, and dynamics of physiological R-loops remain poorly understood. R-loop mapping studies provide insight into R-loop dynamics, but their findings are challenging to generalize. This is due to the narrow biological scope of individual studies, the limitations of each mapping modality, and, in some cases, poor data quality. In this study, we reprocessed 810 R-loop mapping datasets from a wide array of biological conditions and mapping modalities. From this data resource, we developed an accurate R-loop data quality control method, and we reveal the extent of poor-quality data within previously published studies. We then identified a set of high-confidence R-loop mapping samples and used them to define consensus R-loop sites called 'R-loop regions' (RL regions). In the process, we identified a stark divergence between RL regions detected by S9.6 and dRNH-based mapping methods, particularly with respect to R-loop size, location, and colocalization with RNA binding factors. Taken together, this work provides a much-needed method to assess R-loop data quality and offers novel context regarding the differences between dRNH- and S9.6-based R-loop mapping approaches.

PubMed Disclaimer

Figures

Figure 1.
Figure 1.
Reprocessing and standardization of R-loop mapping data reveals enrichment within R-loop forming sequences (RLFS). (A) The workflow used for reprocessed datasets. (B) A Genome Browser view of four representative samples. One ‘POS’-labelled (expected to map R-loops) and one ‘NEG’-labelled (not expected to map R-loops) sample each from DRIP and R-ChIP modes along with R-loop forming sequences (RLFS). NEG-labelled samples are indicated with red text (e.g. ‘RNH1’). The genome browser session for this visualization is also provided within this manuscript (see Availability). The SRA IDs of these samples are SRX1025894 (‘NT2’), SRX1025896 (‘NT2 (RNH1)’), SRX2683605 (‘HEK293’), SRX2675009 (‘HEK293 (WKKD)’; WKKD: mutant RNase H1 without catalytic or DNA binding capabilities). (C) Metaplots of the four samples in (B) showing the enrichment of their called peaks around RLFS. The Y axis is the Z score after min-max scaling.
Figure 2.
Figure 2.
QC model predicts incorrectly labelled (false negative/positive) R-loop mapping datasets based on enrichment within R-loop forming sequences. Metaplots display the consensus signal among samples that were labelled ‘POS’ (expected to map R-loops) or ‘NEG’ (not expected to map R-loops) and samples which were predicted by the quality model to be ‘POS’ or ‘NEG’. The plots show the LOESS regression line along with a confidence interval based on the standard error from LOESS regression. The annotations show the number and percentage of samples which are included.
Figure 3.
Figure 3.
Quality control model predictions are externally valid. (A) A Genome Browser image capture showing examples of DRIP-Sequencing datasets from two studies that exemplify all four possible quality model prediction and sample label (prediction:label) combinations (‘TCELL (Input)’ [SRX2455193] - NEG:NEG; ‘TCELL’ [SRX2455189] - NEG:POS; ‘786-O (RNH)’ [ERX3974965] - POS:NEG; ‘786-O’ [ERX3974964] - POS:POS). The genome browser session for this visualization is also provided within this manuscript (see Availability). (B) Violin/Box plots showing the distribution of Fisher's exact test odds ratios within all reprocessed R-loop mapping samples, split by prediction:label combination and by the genomic feature on which testing was performed. Significance was determined via the Kruskal-Wallis test followed by Dunn post-hoc with Bonferroni correction. ****P < 0.0001; ***P < 0.001; **P < 0.01; *P < 0.05; ns = p ≥ 0.05. (C) A heatmap showing the Pearson correlation (R) of reprocessed R-loop mapping samples around high-confidence R-loop sites. Dendrograms and row order show hierarchical clustering of samples and annotations show the prediction, label, and mode of each sample.
Figure 4.
Figure 4.
Consensus analysis of high-quality samples reveals differences in dRNH and S9.6-mapped R-loops. (A) A flow diagram showing the workflow used for consensus analysis of R-loop sites in catalytically dead RNase H1 (dRNH) mapping samples and S9.6-based (S9.6) mapping samples. (B) A flow diagram showing the algorithm for selecting high-confidence samples to include in the analysis. (C) A flow diagram showing the workflow for building consensus R-loop signal and calling consensus peaks. (D) A flow diagram showing the workflow for intersect analysis, which yields R-Loop Regions. (E) Donut charts showing the proportion of ‘POS’-labelled dRNH and S9.6 samples passing the quality filters. (F) A Venn diagram showing the overlap of consensus sites derived from dRNH and S9.6-based mapping samples. P value and odds ratio from Fisher's exact test. (G) Density plot showing the frequency of consensus peak width derived from S9.6 and dRNH-based mapping samples. X axis is log10 scaled. (H) Genome browser view of a top R-loop region (at the CALM2 gene) with dRNH and S9.6 consensus signal, peaks, and signal tracks/peaks from representative S9.6 and dRNH R-loop mapping samples. ‘R-loop regions’ were derived from union of dRNH/S9.6 consensus peaks and are coloured based on the ‘confidence score’ of that consensus (see Materials and Methods). Arrow indicates direction of transcription. ENCODE cCREs are tissue agnostic cis regulatory elements predicted by ENCODE-SCREEN, provided by the UCSC genome browser (colour code: red is ‘promoter-like signature’; orange is ‘proximal enhancer-like signature’; yellow is ‘distal enhancer-like signature’). The genome browser session for this visualization is also provided within this manuscript (see Availability). (I) Metaplot showing the R-loop consensus signal (scaled by total number of samples in dRNH and S9.6 respectively) around genes. (J) Stacked bar chart showing the proportion of total RL Region genomic coverage (in megabases, ‘Mb’) occupied by S9.6 and dRNH sites. S9.6 and dRNH consensus sites are split into ‘only’ (unique to either S9.6 or dRNH) and ‘shared’ (detected by both S9.6 and dRNH). (K) Annotation plot showing the proportion of summitted (500 bp width) dRNH and S9.6 consensus sites within various genomic features.
Figure 5.
Figure 5.
Differential abundance analysis of dRNH- and S9.6-detectable R-loop regions reveals association between dRNH-specific R-loops and transcriptional pausing. (A) Venn diagram highlighting the group of R-loop Regions (RL Regions) analysed in this figure. (B) A PCA plot showing the divergence in abundance between dRNH and S9.6-based approaches within the shared RL regions. The percent of variance explained by each PC is indicated on each axis. (C) Volcano plot showing the top differentially abundant RL regions between dRNH and S9.6-based samples. Vertical dashed line represents absolute log2 fold change of 1, horizontal dashed line represents P adjusted value of P < 1e−30. Red: RL-regions which meet both the log2 fold change and P adjusted value cut-off. Green: RL-regions which meet the log2 fold change cut-off alone. (D) A PCA feature plot showing a top differentially abundant RL region (‘RL46345’). (E) Genome browser image of RL46345 highlighting difference in S9.6 and dRNH signal at that site. The genome browser session for this visualization is also provided within this manuscript (see Availability). (F) Top dRNH and S9.6-specific hits from GO term analysis of the genes associated with the top differentially abundant RL regions. Enrichment is measured by ‘Combined Score’ such that higher indicates more strongly enriched. (G) Graphical illustration which depicts the Class I/II hypothesis. Class I R-loops are hypothesized to occur as a result of promoter pausing and are efficiently mapped by dRNH and, to a lesser extent, S9.6. Class II R-loops are associated with transcriptional elongation and are mapped efficiently by S9.6 alone. (H) Representative distribution plots showing the pause index of genes overlapping dRNH- and S9.6-specific R-loop regions. Data were derived from precision run on (PRO) sequencing in three cell lines (HEK293, HeLa, K562) for which both dRNH and S9.6 R-loop mapping data were available. (I) Representative genome browser image depicting a top dRNH-specific RL region (RL12840) which occurs at a bidirectional promoter for two of the most transcriptionally paused genes, ATG3 and SLC35A5. PRO-Seq in HEK293, HeLa, and K562 is displayed alongside dRNH and S9.6 consensus tracks. P values generated via Wilcoxon rank sum test (minimum displayable P value is 2.22e−16).
Figure 6.
Figure 6.
dRNH uniquely maps shorter R-loops that are enriched within distal enhancer regions. (A) Venn diagram highlighting the group of R-loop Regions analysed in this figure. (B) Stacked bar chart showing the distribution of transcript (tx) features within dRNH-only (dRNH consensus peaks not mapped by S9.6), dRNH-shared (dRNH consensus peaks which overlap with S9.6 consensus peaks), and S9.6 consensus peaks. (C) Distribution of peak widths (log scaled) across groups. P value from Wilcoxon rank sum test (‘2.22e-16’ is the minimum displayable value). (D) Heatmap depicting the enrichment of peaks within ENCODE-SCREEN predicted cis-regulatory elements (CREs). Values are the log2-transformed odds ratio from Fisher's exact test. (E) Line-plot showing pileup of consensus peaks around distal enhancers (GeneHancer database). (F) Venn diagram showing the overlap between intergenic dRNH-only peaks and distal enhancers.
Figure 7.
Figure 7.
Intergenic dRNH-only R-loops occur in tissue-specific distal enhancers and co-localize with CTCF/Cohesin. (A) A diagram showing the workflow for tissue-specific analysis of dRNH-mapped R-loops at distal enhancers. For both iPSCs and CUTLL1 cells, tissue-specific MapR (dRNH mapping technique) and global run-on (GRO) sequencing datasets were combined with the GeneHancer enhancer database to identify distal enhancers with putative enhancer RNA (eRNA) R-loops. These enhancers were characterized using the GeneHancer interaction database and the CellMarker annotation database. Finally, they were combined with HiC-sequencing data (iPSC-only) and visualized on the genome browser. (B) Line plot showing the enrichment of iPSC MapR peaks (4 replicates) within distal enhancers. (C) For iPSC datasets, box plot showing the eRNA expression level within distal enhancers that are bound by dRNH, compared to all distal enhancers. P value derived from Wilcoxon rank sum test (one-sided) (minimum displayable p value is 2.22e-16). (D) For iPSC datasets, Bar chart depicting the enrichment of gene targets of dRNH-bound distal enhancers. Enrichment uses the CellMarker database. (E) Same as (C) but for CUTTL1 datasets. (F) Same as (D) but for CUTTL1 datasets. (G) Representative genome browser image showing iPSC MapR and GRO-Seq signal (mean of replicates) at multiple distal enhancers (GH: GeneHancer database) around TAD boundary. Consensus dRNH-only peaks are highlighted in the ‘dRNH Consensus Peaks’ track. The genome browser session for this visualization is also provided within this manuscript (see Availability). (H) Same as (B) but for iPSC CTCF peaks. (I) Same as (B) but for enrichment of R-loop consensus peaks around CTCF consensus peaks. (J) Bar chart displaying the enrichment of R-loop consensus peaks within CTCF and cohesin complex members (SA1, SA2, SMC3, RAD21). P adjusted value (Padj) and Odds Ratio from Fisher's exact test (maximum –log10P adjusted value: 307.7).
Figure 8.
Figure 8.
R-loops are enriched at bivalent, transcriptionally paused enhancer clusters in iPSCs. (A) A stacked bar chart showing the proportion of intergenic enhancers belonging to each chromHMM state. Enhancers that overlap with MapR peaks are labelled ‘dRNH-accessible’. (B) Line plot showing the enrichment of iPSC MapR peaks (4 replicates) within bivalent distal enhancers. (C) Bar chart depicting the enrichment of gene targets of dRNH-bound bivalent enhancers. Enrichment uses the ARCHS4 transcription factor co-expression database. (D) Representative genome browser image displaying the HOXD bivalent enhancer cluster. The genome browser session for this visualization is also provided within this manuscript (see Availability). (E) Tornado plots showing the pileup of signal for each track around bivalent and non-bivalent enhancer clusters. Max value in this plot is 2.2 and the minimum is 0.2.
Figure 9.
Figure 9.
R-loop region conservation analysis. (A) Conservation rank plot showing all RL regions ordered by the percent of high-confidence samples in which they are found and binned by percentile ranges. (B) Annotation plot showing the gene features which overlap RL regions from various conservation percentile ranges. (C) Pathway enrichment plot showing the significance (via P adjusted value) and effect size (via Combined Score) of the ‘ChEA 2016’ transcription factor pathway enrichment from the genes overlapping RL regions within each percentile.
Figure 10.
Figure 10.
Model of differences in dRNH- and S9.6-mapped R-loops. R-loops arise as a result of transcriptional pausing at bivalent/poised enhancer regions (eRNA R-loops). They also result from promoter-proximal pausing (Class I R-loops) and transcriptional elongation (Class II R-loops). Catalytically dead RNase H1 (dRNH) efficiently maps shorter R-loops in enhancer and promoter regions, but not within gene bodies. This is ostensibly due to occupancy of Class II R-loops with RNA binding proteins (RBPs) and/or a preference of dRNH for shorter R-loops. S9.6-based mapping methods involve a protease treatment which removes RBPs, and this may explain why they map larger R-loops across the gene body. These differences are reflected in the proportion of the R-loop-forming genome (139 mega bases, ‘Mb’) which dRNH occupies (36.4 Mb) compared to S9.6 (121.8 Mb).

References

    1. Ginno P.A., Lim Y.W., Lott P.L., Korf I., Chédin F.. GC skew at the 5′ and 3′ ends of human genes links R-loop formation to epigenetic regulation and transcription termination. Genome Res. 2013; 23:1590–1600. - PMC - PubMed
    1. Ginno P.A., Lott P.L., Christensen H.C., Korf I., Chédin F.. R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Mol. Cell. 2012; 45:814–825. - PMC - PubMed
    1. Niehrs C., Luke B.. Regulatory R-loops as facilitators of gene expression and genome stability. Nat. Rev. Mol. Cell Biol. 2020; 21:167–178. - PMC - PubMed
    1. Hamperl S., Bocek M.J., Saldivar J.C., Swigut T., Cimprich K.A.. Transcription-replication conflict orientation modulates R-loop levels and activates distinct DNA damage responses. Cell. 2017; 170:774–786. - PMC - PubMed
    1. Gan W., Guan Z., Liu J., Gui T., Shen K., Manley J.L., Li X.. R-loop-mediated genomic instability is caused by impairment of replication fork progression. Genes Dev. 2011; 25:2041–2056. - PMC - PubMed

Publication types