Epigenetics Chromatin. 2015 Nov 16;8:46.
doi: 10.1186/s13072-015-0040-6. eCollection 2015.

RNA:DNA hybrids in the human genome have distinctive nucleotide characteristics, chromatin composition, and transcriptional relationships

Julie Nadel et al. Epigenetics Chromatin. 2015.

Abstract

Background: RNA:DNA hybrids represent a non-canonical nucleic acid structure that has been associated with a range of human diseases and potential transcriptional regulatory functions. Mapping of RNA:DNA hybrids in human cells reveals them to have a number of characteristics that give insights into their functions.

Results: We find RNA:DNA hybrids to occupy millions of base pairs in the human genome. A directional sequencing approach shows the RNA component of the RNA:DNA hybrid to be purine-rich, indicating a thermodynamic contribution to their in vivo stability. The RNA:DNA hybrids are enriched at loci with decreased DNA methylation and increased DNase hypersensitivity, and within larger domains with characteristics of heterochromatin formation, indicating potential transcriptional regulatory properties. Mass spectrometry studies of chromatin at RNA:DNA hybrids show the presence of the ILF2 and ILF3 transcription factors, supporting a model in which certain transcription factors bind preferentially to the RNA:DNA conformation.

Conclusions: Overall, there is little to indicate that RNA:DNA hybrid formation depends on co-transcriptional events, with results from the ribosomal DNA repeat unit instead supporting the intriguing model of RNA generating these structures in trans. The results of the study indicate heterogeneous functions for these genomic elements and provide new insights into their formation and stability in vivo.

Keywords: Chromatin; DNA methylation; Mass spectrometry; Non-coding RNA; R-loop; RNA:DNA hybrid; Transcription; Transcription factor.

Figures

Fig. 1
Subcellular localization studies. In panel a we show the results of hybridization of the fluorescently-labeled RDIP-seq library to a control male metaphase preparation. The RDIP-seq library is shown in red, a bacterial artificial chromosome (BAC) probe mapping to chromosome 9 in green, and DNA counterstained by DAPI in blue. We observe a specific strong signal from the RDIP-seq library mapping to the p arms of acrocentric chromosomes (HSA13-15 and HSA21-22), indicating enrichment at the nucleolar organizing regions (NORs) encoding ribosomal RNAs, and at the pericentromeric region of chromosome 9. In panel b we show the results of immunofluorescence using the S9.6 antibody (green) with an antibody to fibrillarin (red), demonstrating co-localization with the intranuclear S9.6 antibody signal (merge) and therefore enrichment in nucleoli. Further signal from the nuclear periphery and the cytoplasm using S9.6 is also observed, which may represent detection by this antibody of RNA conformations rather than RNA:DNA hybrids specifically [48]
Fig. 2
Mapping of RNA:DNA hybrids within the ribosomal DNA repeat unit. The upper panel shows the results of RDIP-seq (gray) and RNA-seq (red), with genomic annotations and results of ChIP-seq analysis in K562 cells [55] plotted below. RDIP-seq and RNA-seq data are both represented using a smoothed plot showing the number of reads aligned to each basepair of the repeating unit, while the ChIP-seq data signal intensity represents the mean value of non-overlapping 50 bp windows. RDIP-seq values were normalized by subtracting the frequencies of aligned reads of the input sample in each window. We find that RNA:DNA hybrids co-localize with the rRNA transcripts, but that there are also RDIP-seq peaks of comparable magnitude in the intergenic spacer (IGS) where no transcriptional activity is apparent from RNA-seq. The RNA:DNA hybrids in the IGS are upstream of the promoter region and flank the upstream candidate cis-regulatory sequence where there is H3K4 methylation and acetylation of H3K9 and H3K27
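The two signal transformations mentioned in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "frequency" means each window's share of the library's aligned reads, and the function names (`window_means`, `read_frequencies`, `input_subtracted`) are hypothetical.

```python
def window_means(per_base, window=50):
    """Mean signal in non-overlapping windows.

    A trailing partial window is averaged over its own length.
    """
    means = []
    for start in range(0, len(per_base), window):
        chunk = per_base[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def read_frequencies(counts):
    """Convert per-window read counts to fractions of the library total.

    Assumption: 'frequency' is taken to mean count / total aligned reads.
    """
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def input_subtracted(ip_counts, input_counts):
    """Subtract input read frequencies from immunoprecipitated read frequencies."""
    if len(ip_counts) != len(input_counts):
        raise ValueError("both samples must cover the same windows")
    ip_freq = read_frequencies(ip_counts)
    in_freq = read_frequencies(input_counts)
    return [a - b for a, b in zip(ip_freq, in_freq)]
```

Positive values then mark windows where the immunoprecipitated sample holds a larger share of reads than the input sample, and negative values mark the reverse.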
Fig. 3
Genomic distribution of RNA:DNA hybrids. In panel a we show that the proportion of reads mapping to rDNA is 2 %, and break down the remaining 98 % by genomic context, showing the majority of RNA:DNA hybrids (called as peaks using ChIP-seq analytical approaches) to be located in intergenic regions. To understand these RNA:DNA hybrid distributions, we calculated observed/expected ratios based on nucleotide occupancy of genomic features, and performed permutation analyses testing for the likelihood of randomized intersection (b), the results of which are shown in Additional file 2: Table S1. We found depletion of RNA:DNA hybrids at RefSeq gene bodies, intergenic regions, and SINE and DNA transposable elements but significant enrichment at promoters and CpG islands, and a number of purine-rich repetitive sequences
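As a rough illustration of an observed/expected ratio based on nucleotide occupancy, with a permutation test for randomized intersection, consider the sketch below. It assumes a single toy chromosome, half-open `(start, end)` intervals, and uniform random repositioning of peaks; the authors' actual software and shuffling constraints are not stated in the caption, and all function names are hypothetical.

```python
import random


def overlap_bp(peaks, features):
    """Overlap summed over every peak/feature pair.

    With non-overlapping inputs this equals the number of shared bases.
    """
    return sum(
        max(0, min(p_end, f_end) - max(p_start, f_start))
        for p_start, p_end in peaks
        for f_start, f_end in features
    )


def observed_expected(peaks, features, genome_size):
    """Observed overlap divided by the overlap expected from occupancy alone."""
    peak_bp = sum(end - start for start, end in peaks)
    feature_bp = sum(end - start for start, end in features)
    expected = peak_bp * feature_bp / genome_size
    return overlap_bp(peaks, features) / expected


def permutation_p(peaks, features, genome_size, n_perm=1000, seed=0):
    """One-sided empirical p-value for enrichment.

    Each peak keeps its length and is placed uniformly at random
    (shuffled peaks may overlap one another in this simple sketch).
    """
    rng = random.Random(seed)
    observed = overlap_bp(peaks, features)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in peaks:
            length = end - start
            new_start = rng.randint(0, genome_size - length)
            shuffled.append((new_start, new_start + length))
        if overlap_bp(shuffled, features) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return (hits + 1) / (n_perm + 1)
```

A ratio above 1 suggests enrichment and a ratio below 1 suggests depletion; the permutation p-value indicates how often random placement overlaps the feature at least as much as the real peaks do.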
Fig. 4
Nucleotide skewing analyses. In panel a we plot the skewing within a strand of A compared to T (x axis) or G compared to C (y axis) in the RNA:DNA hybrid peaks genome-wide. We find that the peaks are strongly over-represented for purine (G+A) and pyrimidine (C+T) skewing. As our sequencing approach allowed us to identify the RNA and DNA-derived strands separately in the RNA:DNA hybrid, in b we proceeded to test whether there was a relationship between skewing (based on the number of G+A divided by the total number of nucleotides) and each type of nucleic acid-derived sequence, finding a clear enrichment for purine skewing on the RNA-derived strand
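The skew measures described in this caption can be sketched as below. The purine measure, (G+A) divided by the total number of nucleotides, follows the caption directly; the pairwise form (X − Y)/(X + Y) for A versus T and G versus C is the conventional definition and is an assumption here, since the caption does not give the formula. Function names are hypothetical, and ambiguous bases such as N are ignored.

```python
from collections import Counter


def purine_fraction(seq):
    """(G + A) divided by the total number of A, C, G and T in the sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (counts["G"] + counts["A"]) / total


def skew(seq, first, second):
    """Within-strand skew of one base against another: (X - Y) / (X + Y).

    Assumed conventional definition; returns 0.0 if neither base is present.
    """
    counts = Counter(seq.upper())
    denominator = counts[first] + counts[second]
    if denominator == 0:
        return 0.0
    return (counts[first] - counts[second]) / denominator


# Example: a purine-rich strand.
strand = "GGAAGACT"
at_skew = skew(strand, "A", "T")   # x axis in panel a
gc_skew = skew(strand, "G", "C")   # y axis in panel a
purines = purine_fraction(strand)  # measure used in panel b
```

A purine fraction above 0.5 on one strand implies a matching pyrimidine excess on the complementary strand, which is why identifying the RNA-derived strand matters for panel b.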
Fig. 5
Transcriptional relationships of RNA:DNA hybrids. In a the proportion of RNA:DNA hybrid peaks in transcribed genes is shown to be higher than in non-transcribed genes, although the majority of genes do not contain RNA:DNA hybrids. In b a metaplot of RNA:DNA hybrid peaks is shown, illustrating the number of peaks intersecting with 100 bp windows, with the RNA of the hybrid on the transcribed strand of the gene (red) or the opposite strand (blue). This revealed an enrichment of the RNA-derived sequence on the transcribed strand in the first ~1.5 kb downstream from the transcription start site (TSS). A depletion of RNA:DNA hybrids is found at the transcription end site (TES). In c we show that the region immediately downstream from the TSS is purine-skewed, represented by skewing values of 100 bp windows averaged for all genes, but that this skewing is of the same degree in genes that form RNA:DNA hybrids (blue) as in genes that do not form these structures (red). In d a metaplot of RefSeq genes (left) shows that the transcription level of genes (as measured by RNA-seq) is positively associated with the number of RNA:DNA hybrids intersecting with 100 bp windows immediately downstream of the TSS. This reflects only modest increases in the small proportions of genes forming peaks (right), though the relationship was found to be significant using a proportions test.
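The caption does not say which proportions test was used. As one possibility, a pooled two-sample z-test for equal proportions can be sketched as follows; the function name and the example counts are hypothetical and are not data from the study.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-sample z-test for equality of two proportions.

    Returns (z, two-sided p-value) using the normal approximation.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: genes with a peak among high- vs low-expression genes.
z, p = two_proportion_z(30, 1000, 10, 1000)
```

With small proportions and large gene counts, as in this hypothetical example, a modest absolute difference can still give a small p-value, which matches the caption's point that a significant relationship may reflect only modest increases.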
Fig. 6
Macro-scale genomic associations of RNA:DNA hybrids. We used a least absolute shrinkage and selection operator (LASSO) adaptive regression approach to explore the association of genomic sequence features with RNA:DNA hybrid density in 500 kb windows. The figure shows the order in which covariates enter the model as the constraint on the sum of the regression coefficients (x axis) is progressively relaxed from 0 to its maximum value (corresponding to the ordinary least squares regression vector)
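The idea of covariates entering a LASSO model one by one as the constraint is relaxed can be illustrated with the toy sketch below. It is plain (not adaptive) LASSO, fitted by coordinate descent in the penalized form, where lowering the penalty `lam` plays the role of relaxing the constraint on the sum of absolute coefficients. It assumes centered covariates and response, and the function names and toy data are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam, returning 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            scale = sum(v * v for v in col) / n
            if scale == 0:
                continue
            # Correlation of column j with the residual, adding back its own fit.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, lam) / scale
            delta = new_beta - beta[j]
            if delta:
                residual = [r - c * delta for r, c in zip(residual, col)]
                beta[j] = new_beta
    return beta


def entry_order(X, y, lambdas):
    """Order in which covariates become non-zero as the penalty is lowered.

    Covariates entering at the same penalty value are listed by column index.
    """
    order = []
    for lam in sorted(lambdas, reverse=True):
        for j, coefficient in enumerate(lasso_cd(X, y, lam)):
            if coefficient != 0.0 and j not in order:
                order.append(j)
    return order


# Toy data: y depends strongly on column 0 and weakly on column 1.
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-5.5, -3.5, 0.0, 2.5, 6.5]
order = entry_order(X, y, [10, 1, 0.1])
```

In this toy case the strongly associated covariate enters first, which is the kind of ordering the figure reads off along its x axis.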
Fig. 7
Chromatin organizational studies at RNA:DNA hybrids using mass spectrometry. In panel a we show the experimental approach used for these proteomic studies. In b the altered pattern of enriched proteins compared with the input sample is seen using gel electrophoresis, and the results of Western blots confirming the enrichment of specific candidate proteins identified by mass spectrometry (ILF2, ILF3, hnRNP C1/C2), with SP1 and SP3 as controls known to bind to G-skewed DNA motifs

References

    1. Wang J, Zhuang J, Iyer S, Lin X, Whitfield TW, Greven MC, Pierce BG, Dong X, Kundaje A, Cheng Y, et al. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res. 2012;22:1798–1812. doi: 10.1101/gr.139105.112.
    2. Neph S, Vierstra J, Stergachis AB, Reynolds AP, Haugen E, Vernot B, Thurman RE, John S, Sandstrom R, Johnson AK, et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature. 2012;489:83–90. doi: 10.1038/nature11212.
    3. Natarajan A, Yardimci GG, Sheffield NC, Crawford GE, Ohler U. Predicting cell-type-specific gene expression from regions of open chromatin. Genome Res. 2012;22:1711–1722. doi: 10.1101/gr.135129.111.
    4. Yip KY, Cheng C, Bhardwaj N, Brown JB, Leng J, Kundaje A, Rozowsky J, Birney E, Bickel P, Snyder M, Gerstein M. Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol. 2012;13:R48. doi: 10.1186/gb-2012-13-9-r48.
    5. Hu S, Wan J, Su Y, Song Q, Zeng Y, Nguyen HN, Shin J, Cox E, Rho HS, Woodard C, et al. DNA methylation presents distinct binding sites for human transcription factors. eLife. 2013;2:e00726.