BMC Genomics. 2021 May 28;22(1):397.
doi: 10.1186/s12864-021-07727-7.

LncRNA:DNA triplex-forming sites are positioned at specific areas of genome organization and are predictors for Topologically Associated Domains

Benjamin Soibam et al. BMC Genomics.

Abstract

Background: Chromosomes are organized into units called topologically associated domains (TADs). TADs dictate regulatory landscapes and other DNA-dependent processes. Although various factors that contribute to the specification of TADs have been proposed, the mechanism is not fully understood. Understanding how these units are specified and maintained is essential for dissecting cellular processes and disease mechanisms.

Results: In this study, we report a genome-wide analysis that examines the idea that long noncoding RNAs (lncRNAs) mediate chromatin organization through lncRNA:DNA triplex-forming sites (TFSs). By analyzing the TFSs of expressed lncRNAs in multiple cell lines, we find that they are enriched in TADs, TAD boundaries, and loop anchors. However, they are evenly distributed across different regions of a TAD, showing no preference for any specific portion within TADs. No relationship is observed between the locations of these TFSs and CTCF binding sites. TFSs are located not only in promoter regions but also in intronic, intergenic, and 3'UTR regions. We also show that these triplex-forming sites can be used as predictors in machine learning models to discriminate TADs from other genomic regions. Finally, we compile a list of important "TAD-lncRNAs", which are the top predictors for TAD identification.
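
The finding that TFSs show no preference for any portion of a TAD (tested in Fig. 2E with a Kolmogorov-Smirnov test) can be illustrated with a minimal Python sketch. It assumes that each TFS is reduced to its midpoint, that TADs on one chromosome are non-overlapping half-open intervals, and that the relative positions are compared against a Uniform(0, 1) null with a one-sample KS test. The paper compares observed against expected frequencies, so this one-sample form is a simplification, and all function names are hypothetical.

```python
import bisect
import math


def relative_positions(sites, tads):
    """Relative position (0 = TAD start, 1 = TAD end) of each site midpoint inside its TAD.

    TADs are non-overlapping half-open intervals on one chromosome; sites outside TADs are skipped.
    """
    tads = sorted(tads)
    starts = [start for start, _ in tads]
    positions = []
    for s, e in sites:
        mid = (s + e) / 2
        i = bisect.bisect_right(starts, mid) - 1  # last TAD starting at or before mid
        if i >= 0 and mid < tads[i][1]:
            t_start, t_end = tads[i]
            positions.append((mid - t_start) / (t_end - t_start))
    return positions


def ks_uniform(values):
    """One-sample Kolmogorov-Smirnov statistic D against Uniform(0, 1)."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def ks_pvalue(d, n, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution, with a small-sample correction."""
    root_n = math.sqrt(n)
    lam = (root_n + 0.12 + 0.11 / root_n) * d
    if lam < 0.2:  # the series converges slowly here, and the p-value is ~1 anyway
        return 1.0
    q = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, terms + 1))
    return min(1.0, max(0.0, q))


if __name__ == "__main__":
    tads = [(0, 1000), (5000, 7000)]
    sites = [(100, 120), (480, 500), (5200, 5230), (6900, 6950)]
    pos = relative_positions(sites, tads)
    d = ks_uniform(pos)
    print(pos, d, ks_pvalue(d, len(pos)))
```

A large D with a small p-value would indicate a positional preference; the Fig. 2E caption reports p-value > 0.1 for HeLa, i.e. no detectable preference.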

Conclusions: Our observations support the idea that lncRNA:DNA TFSs are positioned at specific areas of genome organization and are important predictors for TADs. LncRNA:DNA triplex formation is most likely a general mechanism of action exhibited by some lncRNAs, not only for direct gene regulation but also for mediating 3D chromatin organization.

Keywords: CTCF; Long noncoding RNAs; RNA:DNA triplex; TAD-lncRNAs; TADs; Triplex structures.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
LncRNA expression patterns and their triplex-forming sites. (A) Heatmap showing the clustering of lncRNAs based on their expression across seven cell lines. Nine clusters are annotated next to the heatmap with Roman numerals; the gene count in each cluster is given in parentheses. The fractions of lncRNAs with respect to triplex-forming site (TFS) count, triplex-forming domain (TFD) count, and TFD length are shown in panels (B), (C), and (D), respectively. Violin plots of TFS count, TFD count, and TFD length for lncRNAs belonging to the clusters identified in panel (A) are shown in panels (E), (F), and (G), respectively.
Fig. 2
Triplex-forming sites (TFSs) are enriched in TADs, boundaries, and anchors but evenly distributed across TADs. (A) Illustration of the procedure used to test for enrichment of TFSs in domains (or boundaries or loop anchors). The observed coverage of TFSs in all the domains (or boundaries or anchors) is the total number of base pairs in the domains (or boundaries or anchors) that overlap with the TFSs. Expected coverage is generated by randomly permuting the TFSs within the genome and computing the coverage of this random set with the domains (or boundaries or anchors). This random shuffling is performed 1000 times; each shuffled set yields one expected coverage, and together these form a distribution of expected coverage. These distributions are checked for normality using the Anderson-Darling test. Distributions of expected coverage (blue) versus the observed coverage (vertical red line) of TFSs in domains, boundaries, and anchors are shown in panels (B), (C), and (D), respectively, for the HeLa cell line. (E) Observed TFS frequencies are evenly distributed across TADs and are not significantly different from expected frequencies (Kolmogorov-Smirnov test, p-value > 0.1). The graph is for the HeLa cell line.
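
The enrichment test in panel (A) can be sketched in Python as follows. This is a minimal illustration, assuming half-open intervals on a single chromosome, a uniform reshuffling model that keeps each TFS's length but ignores gaps, blacklists, and chromosome boundaries, and hypothetical function names. The Anderson-Darling normality check is not implemented; instead the sketch reports both a z-score (which relies on the null being roughly normal) and an empirical permutation p-value (which does not).

```python
import random
import statistics


def coverage(sites, regions):
    """Base pairs of `regions` overlapped by `sites` (half-open intervals, one chromosome).

    Each site/region overlap is added separately, so overlapping sites are double-counted.
    """
    return sum(max(0, min(s_end, r_end) - max(s_start, r_start))
               for s_start, s_end in sites
               for r_start, r_end in regions)


def shuffle_sites(sites, genome_length, rng):
    """Move every site to a uniformly random start so it still fits in [0, genome_length)."""
    shuffled = []
    for start, end in sites:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def permutation_enrichment(sites, regions, genome_length, n_perm=1000, seed=0):
    """Compare the observed coverage with the coverages of n_perm shuffled site sets."""
    rng = random.Random(seed)
    observed = coverage(sites, regions)
    expected = [coverage(shuffle_sites(sites, genome_length, rng), regions)
                for _ in range(n_perm)]
    mean, sd = statistics.fmean(expected), statistics.stdev(expected)
    # The z-score assumes a roughly normal null (the paper checks this with Anderson-Darling).
    z = (observed - mean) / sd if sd > 0 else float("nan")
    # Empirical one-sided p-value with the standard +1 correction; no normality assumed.
    p = (1 + sum(e >= observed for e in expected)) / (n_perm + 1)
    return observed, mean, z, p


if __name__ == "__main__":
    tfs = [(100, 120), (300, 320), (500, 520), (700, 720)]
    tads = [(0, 1000)]
    print(permutation_enrichment(tfs, tads, genome_length=10_000))
```

Because the null is built from only `n_perm` shuffles, the smallest achievable empirical p-value is 1/(n_perm + 1), about 0.001 for 1000 shuffles.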
Fig. 3
Relationships of triplex-forming sites with domain size, CTCF sites, and genomic annotation. (A) A small negative correlation between the size of domains (x-axis) and the normalized overlap between TFSs and TADs (y-axis). The Pearson correlation coefficient is indicated for each cell line. (B) Distances between closest pairs of CTCF sites and TFSs are not significantly different from distances computed with random sites (Chi-Square test, p-value > 0.1 for each cell line). The plots are histograms of the distances with four bins. (C) Genomic annotation (x-axis) of lncRNA:DNA TFSs reveals that a major fraction (y-axis) of them lie in promoter, intronic, intergenic, and 3’UTR regions.
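
Panel (B) can be outlined in Python: find the closest CTCF site for each TFS, bin the distances into four bins, and compare the binned counts with those obtained from random sites. The sketch below assumes midpoint-to-midpoint distances on one chromosome, user-chosen bin edges, and a chi-square test of homogeneity on a 2 x 4 table. The caption does not state which chi-square variant was used, so this is one plausible choice, and the function names are hypothetical. With four bins the test has 3 degrees of freedom, for which the chi-square upper tail has the closed form used in `chi2_sf_df3`.

```python
import bisect
import math


def nearest_distances(sites, anchors):
    """Distance (bp) from each site midpoint to the closest anchor midpoint; one chromosome."""
    mids = sorted((s + e) / 2 for s, e in anchors)
    distances = []
    for s, e in sites:
        m = (s + e) / 2
        i = bisect.bisect_left(mids, m)
        # Only the anchors immediately left and right of m can be the closest one.
        candidates = [mids[j] for j in (i - 1, i) if 0 <= j < len(mids)]
        distances.append(min(abs(c - m) for c in candidates))
    return distances


def bin_counts(values, edges):
    """Histogram with len(edges) + 1 bins; bin i holds edges[i-1] <= v < edges[i]."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return counts


def chi2_homogeneity(counts_a, counts_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table.

    Both groups must be non-empty and every bin must contain at least one count.
    """
    n_a, n_b = sum(counts_a), sum(counts_b)
    grand = n_a + n_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            raise ValueError("empty bin: merge bins before testing")
        exp_a, exp_b = n_a * column / grand, n_b * column / grand
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat, len(counts_a) - 1


def chi2_sf_df3(x):
    """Upper-tail probability P(X >= x) for a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
```

In use, the distances for the real TFSs and for a random set of sites would be binned with the same three edges and the two count vectors passed to `chi2_homogeneity`, with the statistic converted to a p-value by `chi2_sf_df3`.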
Fig. 4
LncRNA:DNA triplex-forming sites as predictors for TADs. (A) Triplex-forming sites (TFSs) in n TADs and in a background set of n randomly selected genomic regions that do not overlap with TADs. (B) The frequencies of TFSs for lncRNAs are used as features in a prediction problem, where TADs and the random regions have class labels “1” and “0”, respectively. (C) The predictive models are trained on the training set (80% of 2n) to determine the appropriate model parameters. Model performance is computed on the test data (20% of 2n). (D) Prediction accuracies and four other metrics of the predictive models. The values are averaged across the six cell lines. (E) TAD-lncRNA DANCR with its triplex-forming domain (TFD) located from base pair position 679 to 702. (F) Genomic annotation of the locations of the TFSs of DANCR in the GM12878 cell line. (G) Top gene ontology terms associated with the genes nearest to the TFSs of TAD-lncRNA DANCR in the GM12878 cell line. The x-axis indicates the -log10 p-value.
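
The feature construction and 80/20 evaluation in panels (A) to (C) can be sketched in Python without third-party libraries. The caption does not say which models were used, so a nearest-centroid classifier stands in here purely for illustration; the rejection-sampling background generator, the overlap-count features, and all function names are assumptions. Precision and recall are shown only as examples of additional metrics, not as the paper's four.

```python
import math
import random


def sample_background(tads, genome_length, rng, max_tries=10_000):
    """One random region per TAD, same length, overlapping no TAD (rejection sampling)."""
    background = []
    for t_start, t_end in tads:
        length = t_end - t_start
        for _ in range(max_tries):
            start = rng.randrange(0, genome_length - length + 1)
            if all(start >= e or start + length <= s for s, e in tads):
                background.append((start, start + length))
                break
        else:
            raise RuntimeError("could not place a background region outside TADs")
    return background


def feature_matrix(regions, tfs_by_lncrna, lncrnas):
    """Row per region, column per lncRNA: count of that lncRNA's TFSs overlapping the region."""
    return [[sum(1 for s, e in tfs_by_lncrna[name] if s < r_end and e > r_start)
             for name in lncrnas]
            for r_start, r_end in regions]


def split(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices, then hold out test_fraction of the rows (80/20 by default)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    train, test = idx[n_test:], idx[:n_test]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def fit_centroids(X, y):
    """Mean feature vector per class label."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Label of the nearest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def evaluate(centroids, X, y):
    """Accuracy, precision, and recall, treating label 1 (TAD) as the positive class."""
    pred = [predict(centroids, x) for x in X]
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, y))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, y))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, y))
    accuracy = sum(p == t for p, t in zip(pred, y)) / len(y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    rng = random.Random(1)
    tads = [(i * 1000, i * 1000 + 500) for i in range(20)]
    background = sample_background(tads, genome_length=1_000_000, rng=rng)
    # Toy data: "lncA" has a TFS in every TAD, "lncB" in half of the background regions.
    tfs = {"lncA": [(s + 10, s + 30) for s, _ in tads],
           "lncB": [(s + 10, s + 30) for s, _ in background[:10]]}
    X = feature_matrix(tads + background, tfs, ["lncA", "lncB"])
    y = [1] * len(tads) + [0] * len(background)
    X_train, y_train, X_test, y_test = split(X, y)
    print(evaluate(fit_centroids(X_train, y_train), X_test, y_test))
```

Any other classifier can replace `fit_centroids` and `predict` without changing the feature matrix or the train/test split.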
