Nucleic Acids Res. 2019 Apr 8;47(6):e32.
doi: 10.1093/nar/gkz037.

Detection of RNA-DNA binding sites in long noncoding RNAs

Chao-Chung Kuo et al. Nucleic Acids Res.

Abstract

Long non-coding RNAs (lncRNAs) can act as scaffolds that promote the interaction of proteins, RNA, and DNA. There is increasing evidence of sequence-specific interactions of lncRNAs with DNA via triple-helix (triplex) formation. This process allows lncRNAs to recruit protein complexes to specific genomic regions and regulate gene expression. Here we propose a computational method called Triplex Domain Finder (TDF) to detect triplexes and characterize DNA-binding domains and DNA targets statistically. Case studies showed that this approach can detect the known domains of lncRNAs Fendrr, HOTAIR and MEG3. Moreover, we validated a novel DNA-binding domain in MEG3 by a genome-wide sequencing method. We used TDF to perform a systematic analysis of the triplex-forming potential of lncRNAs relevant to human cardiac differentiation. We demonstrated that the lncRNA with the highest triplex-forming potential, GATA6-AS, forms triple helices in the promoter of genes relevant to cardiac development. Moreover, down-regulation of GATA6-AS impairs GATA6 expression and cardiac development. These data indicate the unique ability of our computational tool to identify novel triplex-forming lncRNAs and their target genes.

Figures

Figure 1.
The computational framework of TRIPLEXES and TDF. (A) Triplexes are formed by binding of single-stranded RNA (blue) to the purine-rich strand (green) of a double-stranded DNA via Hoogsteen base pairing. Forming a triplex in the parallel orientation requires a pyrimidine or mixed motif, whereas the anti-parallel orientation requires a purine or mixed motif. (B) For a given RNA and DNA sequence, TRIPLEXES identifies candidate triple helices with a minimum size and a maximum number of mismatches following one of the canonical codes. Each triplex is formed by one RNA sequence (triplex forming oligo – TFO) and one DNA region (triple target site – TTS). We introduce here the concept of DNA-binding domains (DBDs), based on the fact that TFOs (orange) usually group in particular regions of an RNA. Contiguous regions with overlapping TFOs (marked in red) define a DNA-binding domain. (C) TDF performs statistical tests by combining predictions from TRIPLEXES to answer the following questions: (1) Which regions of an RNA (DBDs) are more likely to form triple helices with particular DNA target regions? (2) Which DNA regions (target genes) are more likely to be targeted by the RNA? (3) Which lncRNAs are more likely to form triple helices in a set of target DNA regions?
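The DBD definition in panel B (contiguous regions with overlapping TFOs) can be illustrated with a small interval-merging sketch. This is not the TDF implementation: the function name `merge_tfos_into_domains`, the half-open coordinate convention, and the example coordinates are all assumptions made for illustration, and the sketch omits the statistical test that TDF uses to decide which domains are significant.

```python
from typing import List, Tuple

# Hypothetical representation: a TFO is a half-open interval [start, end)
# in RNA coordinates. This convention is an assumption, not taken from TDF.
Interval = Tuple[int, int]


def merge_tfos_into_domains(tfos: List[Interval]) -> List[Interval]:
    """Merge overlapping TFO intervals into candidate DNA-binding domains."""
    domains: List[Interval] = []
    for start, end in sorted(tfos):
        # Strict overlap: the new TFO starts before the current domain ends.
        if domains and start < domains[-1][1]:
            prev_start, prev_end = domains[-1]
            domains[-1] = (prev_start, max(prev_end, end))
        else:
            # No overlap with the previous domain, so open a new one.
            domains.append((start, end))
    return domains


# Made-up TFO coordinates, for illustration only.
tfos = [(10, 25), (20, 40), (38, 50), (120, 135), (300, 320), (310, 330)]
domains = merge_tfos_into_domains(tfos)
print(domains)  # [(10, 50), (120, 135), (300, 330)]
```

In this sketch a single isolated TFO also yields a candidate domain; whether such a region counts as a DBD would depend on the significance test, which is not modelled here.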
Figure 2.
TDF detects known and novel DNA-binding domains of Fendrr and MEG3. (A, B) The coverage of TFOs (y-axis) within the Fendrr and MEG3 sequences (x-axis). Regions highlighted in red/grey indicate significant DBDs. (C) DBD-Capture-Seq signals and peaks for Domain I (blue), Domain II (green), and control (red), as well as ChOP-Seq peaks in the validated triplex-forming regions (orange). (D) A Venn diagram showing the overlap between DBD-Capture-Seq (MEG3 Domains I and II) and MEG3 ChOP-Seq. De novo motifs detected in the top 500 DBD-Capture-Seq regions are also presented. (E) TDF analysis reveals a high propensity (higher z-score) of Domain I RNA to form triple helices in Domain I DBD-Capture-Seq peaks in comparison with the Domain II RNA sequence, and vice versa. (F, G) DBD logos indicating the nucleotides from the MEG3 domain sequence that are predicted to form triple helices in Capture-Seq peaks of the respective domain. Taller nucleotides indicate a higher number of triple helices (TTSs).
Figure 3.
Characterisation of triple-helix-forming lncRNAs during cardiac differentiation. (A) The strategy for identification of lncRNAs forming triple helices during cardiac differentiation. (B) Distribution of statistics used to rank lncRNAs according to their triplex-forming potential. (C) The expression profile of GATA6-AS. (D) TDF showing the presence of two domains in GATA6-AS, which were predicted to form a triplex in promoters of the up-regulated genes. (E) A de novo identified G-rich motif in 332 Domain I DBD-Capture-Seq peaks (out of 500 top-ranked peaks). (F) DBD logo indicating the nucleotides from the GATA6-AS domain sequence that are predicted to form triple helices in GATA6-AS Capture-Seq peaks. (G) TDF analysis showing a high propensity (higher z-score) of Domain I RNA to form triple helices in corresponding Capture-Seq peaks but not in control peaks. (H) Area under the precision-recall curve (blue) associating the overlap of GATA6-AS-Domain I-Capture-Seq peaks with the promoters of genes (±1 kb) as ranked by TDF.
Figure 4.
Functional characterization of GATA6-AS targets: (A) GATA6-AS DBD-Capture-Seq peaks are localized within the promoter of genes predicted to be targets of GATA6-AS by TDF. As examples of non-target regions, promoters of genes not up-regulated during cardiac differentiation are shown. (B) ChOP-PCR showing the in vivo association of GATA6-AS with the target genes identified by GATA6-AS DBD-Capture-Seq. (C) Quantitative reverse-transcription PCR (RT-qPCR) analysis of GATA6-AS and the predicted targets (GATA6 and MEIS1) after an ASO-based knockdown of GATA6-AS. (D) RT-qPCR analysis of mesodermal and cardiac mesodermal genes after the ASO-based knockdown of GATA6-AS. Error bars represent standard deviations for n = 3. The P-values were generated by a two-tailed t-test.
Figure 5.
Benchmarking of TDF and TRIPLEXATOR. (A, B) Precision-recall curves based on the ranking of DNA target regions from TDF and TRIPLEXATOR for MEG3 and GATA6-AS. TDF has an area under the precision-recall curve (AUPR) of 0.54 for GATA6-AS and 0.04 for MEG3, while TRIPLEXATOR has an AUPR of 0.46 for GATA6-AS and 0.03 for MEG3. Background precision corresponds to the proportion of possible DNA target regions (promoters for GATA6-AS and the whole genome for MEG3) that overlap with a DBD-Capture-Seq peak of the corresponding lncRNA.
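To make the benchmarking quantities in this caption concrete, the sketch below computes an AUPR estimate and the background precision from a ranked list of target regions. It uses average precision, one common estimator of the area under the precision-recall curve; the caption does not state which estimator the authors used, so this choice is an assumption. The function names and the toy labels are hypothetical, and the numbers are unrelated to the values reported above.

```python
from typing import List


def average_precision(ranked_labels: List[int]) -> float:
    """Average precision over a ranking (1 = region overlaps a peak, 0 = not).

    ranked_labels must be ordered from the best-ranked region to the worst.
    """
    total_positives = sum(ranked_labels)
    if total_positives == 0:
        raise ValueError("at least one positive region is required")
    true_positives = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            true_positives += 1
            # Precision at the rank where this positive is retrieved.
            precision_sum += true_positives / rank
    return precision_sum / total_positives


def background_precision(labels: List[int]) -> float:
    """Proportion of all candidate regions that overlap a peak."""
    return sum(labels) / len(labels)


# Toy ranking of 8 candidate regions; 2 of them overlap a peak.
ranked = [1, 0, 1, 0, 0, 0, 0, 0]
aupr = average_precision(ranked)
baseline = background_precision(ranked)
print(f"AUPR estimate: {aupr:.3f}, background precision: {baseline:.3f}")
```

The background precision is the value a random ranking would be expected to reach, which is why an AUPR as low as 0.04 can still be informative when positives are rare across a whole genome.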
