PLoS One. 2011;6(9):e24051.
doi: 10.1371/journal.pone.0024051. Epub 2011 Sep 30.

A global clustering algorithm to identify long intergenic non-coding RNA--with applications in mouse macrophages

Lana X Garmire et al. PLoS One. 2011.

Abstract

Identifying diffuse signals in chromatin immunoprecipitation followed by high-throughput massively parallel sequencing (ChIP-Seq) data poses significant computational challenges, and few methods are currently available. We present a novel global clustering approach to enrich diffuse ChIP-Seq signals of RNA polymerase II and histone 3 lysine 4 trimethylation (H3K4Me3) and apply it to identify putative long intergenic non-coding RNAs (lincRNAs) in macrophage cells. Our global clustering method (GCLS) compares favorably to the local clustering method SICER, which was also designed to identify diffuse ChIP-Seq signals. The validity of the algorithm is confirmed at several levels. First, 8 of 11 selected putative lincRNA regions in primary macrophages respond to lipopolysaccharide (LPS) treatment as predicted by our computational method. Second, the genes nearest to lincRNAs are enriched for biological functions related to metabolic processes under resting conditions but for developmental and immune-related functions under LPS treatment. Third, the putative lincRNAs have conserved promoters, modestly conserved exons, and predicted secondary structures consistent with expectation. Last, they are enriched with motifs of transcription factors such as PU.1 and AP.1, previously shown to be important lineage-determining factors in macrophages, and 83% of them overlap with distal enhancer markers. In summary, GCLS applied to RNA polymerase II and H3K4Me3 ChIP-Seq data can effectively detect putative lincRNAs that exhibit the expected characteristics, as exemplified by macrophages in this study.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Genome-wide patterns of diffuse ChIP-Seq peaks and their application to lincRNA discovery.
A: the emerging patterns of Pol II ChIP-Seq peaks. Data are displayed as the log10 transformation of peak width vs. the log10 transformation of the distance between two successive peak centers. Red data points denote peaks whose width appears to be linearly correlated with the inter-peak distance. Blue data points denote peaks that lack this linear correlation. The linear separator between the two types of peaks is determined by the iterative computation described in Methods. Inset: density heat map of the data points in panel A. B: flowchart of the filtering and clustering of Pol II and H3K4Me3 peaks to identify lincRNAs.
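The two quantities plotted in panel A can be sketched as follows. This is a minimal illustration only: the function name `width_and_gap_features` and the `(start, end)` peak representation are assumptions made here, not taken from the paper, and the iterative computation of the linear separator (described in Methods) is not reproduced.

```python
import math

def width_and_gap_features(peaks):
    """Return (log10 width, log10 distance to the next peak center) pairs.

    `peaks` is a list of (start, end) coordinates on one chromosome;
    this representation is an assumption made for illustration.
    """
    ordered = sorted(peaks)
    features = []
    # Pair each peak with its successor along the chromosome.
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        width = e1 - s1
        center_gap = (s2 + e2) / 2 - (s1 + e1) / 2
        # log10 is only defined for positive values, so skip degenerate peaks.
        if width > 0 and center_gap > 0:
            features.append((math.log10(width), math.log10(center_gap)))
    return features

example = width_and_gap_features([(0, 100), (1000, 1100), (11000, 12000)])
```

The last peak on a chromosome has no successor, so it contributes no point in this sketch.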
Figure 2. Comparison of GCLS and SICER on RefSeq genes.
A: comparison of the performance of SICER against GCLS among RefSeq genes. SICER is parameterized over 3 different gap distances: 600 bp, 1200 bp and 2200 bp. A region is defined as the maximum contig of several overlapping genes if there are any, or the locus of a single gene if no other genes overlap it. The total RefSeq regions and genes are plotted as references to illustrate the fraction of regions and genes that are actively expressed in macrophages. B: comparison of the coverage of GCLS vs. SICER on RefSeq genes, spanning from 10 kbp upstream of the transcription start site (TSS) to 10 kbp downstream of the transcription end site (TES). The transcribed regions (TSS to TES) of genes of different lengths are normalized to the same effective length. This region was subdivided into 50 bins and the coverage was counted in each bin. Similarly, the 10 kbp region upstream of the TSS and the 10 kbp region downstream of the TES were each subdivided into 50 bins, and the coverage was counted in each bin. GCLS has the best coverage in the transcribed region and the second-lowest noise level upstream of the TSS and downstream of the TES.
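The binning scheme in panel B can be sketched as below. This is a hedged illustration, not the authors' code: the function names are hypothetical, the gene is assumed to lie on the plus strand, the called regions are assumed not to overlap each other, and coverage is expressed as the covered fraction of each bin, which is one possible reading of "coverage was counted".

```python
def bin_edges(start, end, n_bins):
    """Split [start, end) into n_bins equal-width bins."""
    step = (end - start) / n_bins
    return [(start + i * step, start + (i + 1) * step) for i in range(n_bins)]

def covered_fraction(bin_start, bin_end, intervals):
    """Fraction of a bin covered by non-overlapping (start, end) intervals."""
    covered = 0.0
    for s, e in intervals:
        covered += max(0.0, min(e, bin_end) - max(s, bin_start))
    return covered / (bin_end - bin_start)

def gene_profile(tss, tes, intervals, flank=10_000, n_bins=50):
    """Coverage profile: 50 upstream bins, 50 gene-body bins, 50 downstream bins.

    The gene body is rescaled to n_bins regardless of its length, so genes
    of different lengths share the same effective length.
    """
    regions = [(tss - flank, tss), (tss, tes), (tes, tes + flank)]
    profile = []
    for start, end in regions:
        for b_start, b_end in bin_edges(start, end, n_bins):
            profile.append(covered_fraction(b_start, b_end, intervals))
    return profile

# A called region that exactly spans a 10 kbp gene body.
profile = gene_profile(20_000, 30_000, [(20_000, 30_000)])
```

Averaging such profiles over all genes would give a curve of the kind the caption describes; minus-strand genes would need their bin order reversed.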
Figure 3. Comparison of the lincRNA overlap among three different methods: GCLS, SICER600 and Guttman's method (Guttman et al., 2009).
The counts of lincRNAs are grouped by chromosome, in ascending order.
Figure 4. Effect of LPS on lincRNAs.
A: lincRNAs differentially regulated by LPS. Because the total tag counts differ between the LPS and no-treatment conditions, tag counts under the no-treatment condition are first normalized by a linear regression and then tested for difference as described in Methods. Data plotted are the log10 transformation of the original tag counts under LPS treatment vs. the log10 transformation of the original tag counts under the no-treatment condition. Red data points (126) denote up-regulation and green data points (45) denote down-regulation. B: experimental validation of 11 lincRNAs. Data are plotted as the log2 transformation of the fold change on predicted exons measured by QPCR vs. the log2 transformation of the fold change in Pol II tag counts from ChIP-Seq. The lincRNAs that are undetectable by QPCR are assigned y-values of 0.
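The normalization step in panel A can be illustrated with an ordinary least-squares fit. This is a sketch under stated assumptions: the caption only says "a linear regression", so the choice of a fit with an intercept, the function names `fit_line` and `normalize_control`, and the toy counts are all hypothetical. The statistical test from Methods is not reproduced.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def normalize_control(control_counts, lps_counts):
    """Rescale no-treatment tag counts onto the scale of the LPS counts."""
    slope, intercept = fit_line(control_counts, lps_counts)
    return [slope * c + intercept for c in control_counts]

# Toy counts: the LPS library is exactly twice as deep as the control.
control = [10, 20, 30, 40]
lps = [20, 40, 60, 80]
normalized = normalize_control(control, lps)
```

After this rescaling, regions whose LPS counts depart strongly from their normalized no-treatment counts are the candidates for differential regulation.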
Figure 5. Conservation in lincRNA.
A: conservation in the promoter region of lincRNA. The promoter region is defined as −3 kb to +1 kb relative to the TSS, which is marked by the 3′ edge of the H3K4Me3 peaks. Averaged phastCons scores are used as the measure of conservation. Random intergenic sequences without evidence of lincRNAs are plotted as the control. Both the LPS and no-treatment conditions have significantly higher phastCons scores than the random sequences (Wilcoxon tests, P-value < 1e-15). B: conservation of predicted exons of lincRNA, in comparison to the introns and exons of protein-coding genes. The cumulative fractions of phastCons scores are plotted against the phastCons scores. The predicted exons of lincRNAs are modestly conserved compared to the introns of protein-coding genes, but are much less conserved than the exons of protein-coding genes.
Figure 6. Association of lincRNAs with enhancer markers.
A: motifs enriched in the promoter regions of lincRNA, which are defined as sequences within the H3K4Me3 clusters of the lincRNAs. B: Venn diagram of the overlap between lincRNA and enhancer regions that are labeled by ChIP-Seq signatures of PU.1 peaks and H3K4Me1 peaks.
Figure 7. UCSC genome browser (mm8) snapshots of lincRNA examples (left), as well as their conserved, thermodynamically stable secondary structures predicted by RNAz (right).
A: lincRNA located on the 5′ distal side of Pla2g7 (phospholipase A2, group VII), whose Pol II tags are sensitively stimulated by LPS. B: lincRNA located on the 3′ distal side of Cxxc5 (CXXC-type zinc finger protein 5), whose Pol II tags are sensitively repressed by LPS.

References

    1. Costa FF. Non-coding RNAs: new players in eukaryotic biology. Gene. 2005;357:83–94.
    2. Costa FF. Non-coding RNAs: lost in translation? Gene. 2007;386:1–10.
    3. Guttman M, Amit I, Garber M, French C, Lin MF, et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009;458:223–227.
    4. Khalil AM, Guttman M, Huarte M, Garber M, Raj A, et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci U S A. 2009;106:11667–11672.
    5. Tsai MC, Manor O, Wan Y, Mosammaparast N, Wang JK, et al. Long noncoding RNA as modular scaffold of histone modification complexes. Science. 2010;329:689–693.
