Sci Rep. 2017 Jan 31;7:41669.
doi: 10.1038/srep41669.

The PRC2-binding long non-coding RNAs in human and mouse genomes are associated with predictive sequence features

Shiqi Tu et al. Sci Rep. 2017.

Abstract

Long non-coding RNAs (lncRNAs) have recently emerged as an important class of molecules involved in many cellular processes. One of their primary functions is to shape the epigenetic landscape through interactions with chromatin-modifying proteins. However, the mechanisms that make such interactions specific remain poorly understood. Here we took the human and mouse lncRNAs that were experimentally determined to interact physically with Polycomb repressive complex 2 (PRC2) and systematically investigated their sequence features. To do so, we developed a new computational pipeline for sequence composition analysis, in which each sequence is treated as a series of transitions between adjacent nucleotides. Using this pipeline, we found that PRC2-binding lncRNAs are associated with a set of distinctive and evolutionarily conserved sequence features, which can be used to distinguish them from other lncRNAs with considerable accuracy. We further identified fragments of PRC2-binding lncRNAs that are enriched in these sequence features, and found that they show strong PRC2-binding signals and are more highly conserved across species than the rest of the transcript, implying their functional importance.
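The idea of treating a sequence as a series of transitions can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `transition_frequencies` and the `conditional` flag are hypothetical, and the abstract does not state how the frequency is normalized. The sketch assumes an order-k transition is a k-nucleotide prefix followed by the next nucleotide, and offers two plausible normalizations: by the number of occurrences of the prefix, or by the total number of order-k transitions in the sequence.

```python
from collections import Counter


def transition_frequencies(seq, order, conditional=True):
    """Frequencies of order-`order` transitions (prefix -> next base) in `seq`.

    Assumption (not confirmed by the source): with conditional=True the count
    of each transition is divided by the count of its prefix; with
    conditional=False it is divided by the total number of transitions.
    """
    seq = seq.upper()
    transitions = Counter()
    prefixes = Counter()
    # Slide over every position that has `order` bases of context before it.
    for i in range(len(seq) - order):
        prefix = seq[i:i + order]
        next_base = seq[i + order]
        transitions[(prefix, next_base)] += 1
        prefixes[prefix] += 1

    total = sum(transitions.values())
    if total == 0:
        return {}  # sequence too short for this order
    if conditional:
        return {t: c / prefixes[t[0]] for t, c in transitions.items()}
    return {t: c / total for t, c in transitions.items()}


if __name__ == "__main__":
    freqs = transition_frequencies("CATGACATGT", order=4)
    print(freqs[("CATG", "A")])  # CATG is followed once by A and once by T
```

Order 0 uses the empty string as the prefix, so it reduces to plain nucleotide composition.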

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Analysis of the sequence features of human PRC2-binding lncRNAs.
(A) Workflow of the sequence composition analysis pipeline. (B) Calculation of transition frequency, defined as the frequency of observing a transition in the given sequence (the order-4 transition CATG→A is used as an example). (C) A building block of the quad-tree, composed of 4 transitions with the same prefix. Each line represents a transition, and its color indicates whether the transition is significantly favored or disfavored by human PRC2-positive lncRNAs. (D) The complete quad-tree of height 6, built from all possible transitions of order 0–5 (placed on levels 1–6, respectively). The root is an empty string, which serves as the prefix of the 4 order-0 transitions. (E) A branch cut from the quad-tree shown in (D), which starts from level 3 and contains two consecutively favored paths (CFPs), CGC→G→T→T and CGC→G→T→C. (F) Summary statistics of the CFPs observed in (D), which suggest that human PRC2-favored transitions significantly prefer to connect with each other and form CFPs.
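The notion of a consecutively favored path in panel (E) can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the authors' procedure: the function name `consecutively_favored_paths` and the `min_edges` threshold are hypothetical, and the caption does not give the minimum length of a CFP. Each transition is written as a `(prefix, next_base)` pair, so following a transition moves from the node `prefix` to its child `prefix + next_base` in the quad-tree.

```python
def consecutively_favored_paths(favored, min_edges=2):
    """List maximal downward chains of favored transitions in the quad-tree.

    `favored` is an iterable of (prefix, next_base) pairs. A path is reported
    as (start_prefix, bases), e.g. ("CGC", ("G", "T", "T")) for CGC->G->T->T.
    Assumption: a path needs at least `min_edges` consecutive favored edges.
    """
    favored = set(favored)
    paths = []

    def extend(start, bases):
        node = start + "".join(bases)
        children = [b for b in "ACGT" if (node, b) in favored]
        if not children:
            # Chain ends here; keep it only if it is long enough.
            if len(bases) >= min_edges:
                paths.append((start, tuple(bases)))
            return
        for b in children:
            extend(start, bases + [b])

    for prefix, base in sorted(favored):
        # Skip edges whose parent edge is also favored: they are already
        # covered by a longer chain that starts higher in the tree.
        if prefix and (prefix[:-1], prefix[-1]) in favored:
            continue
        extend(prefix, [base])
    return paths


if __name__ == "__main__":
    # Toy input mirroring the branch in panel (E); the favored set is made up.
    toy = {("CGC", "G"), ("CGCG", "T"), ("CGCGT", "T"), ("CGCGT", "C")}
    for start, bases in consecutively_favored_paths(toy):
        print("→".join((start,) + bases))
```

With this toy input the two paths share their first two edges and are reported separately, which matches how the caption counts CGC→G→T→T and CGC→G→T→C as two CFPs.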
Figure 2. Prediction of the PRC2-lncRNA interactions in human genome based on transition frequencies.
(A) ROC curves and corresponding AUC values of the prediction models built by the non-blind CV (red line) and the fully blind method (green line) in predicting human PRC2-binding lncRNAs. (B) A representative PRC2-positive lncRNA locus. Its PRC2-favored and disfavored fragments are indicated by the red and blue bars, respectively, and the red tracks in the middle show the fRIP-seq read counts of EZH2 and SUZ12 in the human K562 cell line. (C) Boxplot of the average PhastCons conservation scores of the PRC2-favored and disfavored fragments identified from human PRC2-binding lncRNAs. (D) Distribution of the fraction of 500 bp fragments, randomly selected from human PRC2-binding lncRNAs, that overlap with conserved elements. The distribution was drawn from 105 rounds of random sampling, and the dashed lines mark the fraction of PRC2-favored/disfavored fragments that overlap with conserved elements.
Figure 3. Human PRC2-favored transitions in CFPs are more likely to be also favored by mouse PRC2-binding lncRNAs than the others.
(A) ROC curves and corresponding AUC values of different prediction models in predicting mouse PRC2-binding lncRNAs. The red and green curves correspond to mouse prediction models built by the non-blind CV and the fully blind method, respectively, in which mouse PRC2-positive and PRC2-negative lncRNAs were used for predictor selection and model training. The blue curve corresponds to the human prediction model, which used human PRC2-positive and PRC2-negative lncRNAs for predictor selection and model training. (B) Fractions of different groups of transitions that are identified as mouse PRC2-favored transitions. The P-values were computed by right-tailed Fisher’s exact test based on the hypergeometric distribution. (C) Boxplot of the AUC values of human PRC2-favored transitions in predicting mouse PRC2-binding lncRNAs. The human PRC2-favored transitions are divided into 2 groups according to whether they are located in CFPs, and the AUC value of a transition is calculated by using its frequency in each sequence directly as the prediction score of that sequence. (D) Fraction of mouse PRC2-positive and PRC2-negative lncRNAs that contain EZH2 RCS identified from PAR-CLIP-seq data. Each group of lncRNAs was split into two subgroups of equal size at the median of their cross-species prediction scores, derived from the prediction model trained with human lncRNAs. The P-values were calculated by right-tailed Fisher’s exact test to measure whether the subgroup with high prediction scores is significantly more likely to contain EZH2 RCS than the subgroup with low prediction scores. (E) ROC curve and corresponding AUC value of the human prediction model in distinguishing mouse RCS-containing lncRNAs from RCS-null ones (green), and in distinguishing high-confidence mouse PRC2-positive lncRNAs from high-confidence mouse PRC2-negative ones (blue).
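Panel (C) scores a single transition by using its frequency in each sequence directly as the prediction score. A minimal Python sketch of that calculation follows, assuming the AUC is computed in the standard rank-based way (the probability that a randomly chosen positive sequence scores higher than a randomly chosen negative one, with ties counted as one half). The function name `auc_from_scores` and the example numbers are hypothetical and are not taken from the paper.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Area under the ROC curve from raw scores of positives and negatives.

    Uses the pairwise (Mann-Whitney) formulation: the fraction of
    positive/negative pairs in which the positive scores higher,
    with ties contributing 0.5.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Made-up frequencies of one transition in PRC2-positive and
    # PRC2-negative sequences, used directly as prediction scores.
    positive = [0.3, 0.2]
    negative = [0.1, 0.2]
    print(auc_from_scores(positive, negative))  # 3.5 of 4 pairs -> 0.875
```

The pairwise loop is quadratic in the number of sequences, which is fine for illustration; a rank-based implementation would scale better on full lncRNA sets.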
Figure 4. Comparison of the performance of prediction models based on K-mer and transition frequencies.
(A,B) AUC values of the prediction models based on transition (red bars) or K-mer (blue bars) frequencies, which were trained and tested on the human (A) and mouse (B) lncRNAs, respectively. The prediction models were built by the fully blind method, and all human/mouse PRC2-positive and PRC2-negative lncRNAs were separately divided by length into two subgroups of equal size, termed the moderately long and the extremely long subgroups, to assess the performance of these models on lncRNAs of different lengths.
