BMC Genomics. 2016 Mar 11;17:220.
doi: 10.1186/s12864-016-2457-0.

Integrating RNA-seq and ChIP-seq data to characterize long non-coding RNAs in Drosophila melanogaster

Mei-Ju May Chen et al. BMC Genomics. 2016.

Abstract

Background: Recent advances in sequencing technology have opened a new era in RNA studies. Novel types of RNAs such as long non-coding RNAs (lncRNAs) have been discovered by transcriptomic sequencing and some lncRNAs have been found to play essential roles in biological processes. However, only limited information is available for lncRNAs in Drosophila melanogaster, an important model organism. Therefore, the characterization of lncRNAs and identification of new lncRNAs in D. melanogaster is an important area of research. Moreover, there is an increasing interest in the use of ChIP-seq data (H3K4me3, H3K36me3 and Pol II) to detect signatures of active transcription for reported lncRNAs.

Results: We developed a computational approach to identify new lncRNAs from two tissue-specific RNA-seq datasets, prepared with the poly(A)-enrichment and ribo-zero methods, respectively. We identified 462 novel lncRNA transcripts, which we combined with 4137 previously published lncRNA transcripts into a curated dataset. We then used 61 RNA-seq and 32 ChIP-seq datasets to improve the annotation of the curated lncRNAs with regard to transcriptional direction, exon regions, classification, expression in the brain, possession of a poly(A) tail, and presence of conventional chromatin signatures. Furthermore, we used 30 time-course RNA-seq datasets and 32 ChIP-seq datasets to investigate whether the lncRNAs reported by RNA-seq have active transcription signatures. The results showed that more than half of the reported lncRNAs did not have chromatin signatures related to active transcription. To clarify this issue, we conducted RT-qPCR experiments and found that ~95.24% of the selected lncRNAs were truly transcribed, regardless of whether they were associated with active chromatin signatures.

Conclusions: In this study, we discovered a large number of novel lncRNAs, which suggests that many remain to be identified in D. melanogaster. For the lncRNAs that are known, we improved their characterization by integrating a large number of sequencing datasets (93 sets in total) from multiple sources (lncRNAs, RNA-seq and ChIP-seq). The RT-qPCR experiments demonstrated that RNA-seq is a reliable platform to discover lncRNAs. This set of curated lncRNAs with improved annotations can serve as an important resource for investigating the function of lncRNAs in D. melanogaster.

Keywords: Active transcription; ChIP-seq; Drosophila melanogaster; Long non-coding RNA; RNA-seq.

Figures

Fig. 1
RT-qPCR experiments for a selected set of lncRNAs in brains. a 22 novel lncRNAs discovered in the present study were selected for validation. RpL32 (a coding gene) and roX1 (a non-coding gene) were included as positive controls. The horizontal line indicates −delta Ct ≥ 1. The rectangle marks the five lncRNAs with considerably low expression, which were tested again in the second RT-qPCR experiment shown in (b). b The five lncRNAs from the rectangle in (a) were tested again by RT-qPCR with twice the amount of template cDNA. Ten FlyBase lncRNAs were included for comparison. The three FlyBase lncRNAs highlighted by orange stars were selected because their RPKM values in our brain RNA-seq data were 0
Fig. 2
Expression profiles at different developmental stages of fruit fly. a Averaged RPKM values at different developmental stages for lncRNAs and mRNAs. b Numbers of expressed transcripts (RPKM > 1) at different developmental stages for lncRNAs and mRNAs, respectively
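The "expressed transcript" count in panel b can be illustrated with a minimal sketch. It uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads) and the RPKM > 1 threshold from the caption. The transcript names, read counts, lengths, and library size below are hypothetical values chosen for illustration, and the paper's own quantification tooling may differ.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def count_expressed(rpkm_by_transcript, threshold=1.0):
    """Count transcripts whose RPKM is strictly above the threshold."""
    return sum(1 for value in rpkm_by_transcript.values() if value > threshold)


# Hypothetical data for one developmental stage: name -> (reads, length in bp)
stage_counts = {
    "lnc_A": (50, 1000),
    "lnc_B": (5, 2000),
    "mRNA_C": (400, 1500),
}
TOTAL_MAPPED_READS = 10_000_000  # hypothetical library size

stage_rpkm = {
    name: rpkm(reads, length, TOTAL_MAPPED_READS)
    for name, (reads, length) in stage_counts.items()
}
n_expressed = count_expressed(stage_rpkm)
```

Repeating the count per stage, separately for lncRNAs and mRNAs, would give the kind of series plotted in panel b.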
Fig. 3
Analysis of chromatin signatures (Pol II, H3K36me3 and H3K4me3) in the curated lncRNA genes
Fig. 4
RT-qPCR experiments of a selected set of lncRNAs in male adults. G1: high expression with chromatin signatures (11 lncRNAs); G2: low expression with chromatin signatures (11 lncRNAs); G3: high expression without chromatin signatures (10 lncRNAs); and G4: low expression without chromatin signatures (10 lncRNAs). The three negative controls (un-transcribed regions 1, 2, and 3) were all around zero. The horizontal line indicates the cutoff (−delta Ct ≥ 2) used to define a validated lncRNA. Stars highlight the lncRNAs that were not from the databases: orange stars mark the lncRNAs selected from Young et al. [18], and blue stars mark the lncRNAs from the present study. Green stars mark transcripts that are now annotated as other types of transcripts by FlyBase and were therefore removed from the list of curated lncRNAs in the present study
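The validation rule in this caption (a lncRNA counts as validated when −delta Ct ≥ 2) might be applied as in the sketch below. The caption does not say which measurement delta Ct is taken against, so the `ct_baseline` argument is an assumption, and all Ct values and names are hypothetical.

```python
def neg_delta_ct(ct_target, ct_baseline):
    """Return -(Ct_target - Ct_baseline); larger means more template detected.

    ASSUMPTION: the baseline measurement is not specified in the caption.
    """
    return ct_baseline - ct_target


def is_validated(ct_target, ct_baseline, cutoff=2.0):
    """Apply the caption's cutoff: validated when -delta Ct >= cutoff."""
    return neg_delta_ct(ct_target, ct_baseline) >= cutoff


def validated_fraction(measurements, cutoff=2.0):
    """Fraction of targets passing the cutoff.

    measurements maps a name to a (ct_target, ct_baseline) pair.
    """
    if not measurements:
        raise ValueError("no measurements given")
    passed = sum(
        1 for ct_target, ct_baseline in measurements.values()
        if is_validated(ct_target, ct_baseline, cutoff)
    )
    return passed / len(measurements)


# Hypothetical Ct values, not taken from the paper
example = {
    "lnc_1": (24.0, 30.0),  # -delta Ct = 6.0, passes
    "lnc_2": (29.5, 30.0),  # -delta Ct = 0.5, fails
    "lnc_3": (27.0, 30.0),  # -delta Ct = 3.0, passes
}
fraction = validated_fraction(example)
```

Because Ct is on a log2 scale, a cutoff of 2 corresponds to roughly a fourfold difference in detected template under ideal amplification efficiency.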
Fig. 5
Distribution of exon numbers in the lncRNA/mRNA genes
Fig. 6
Procedures for discovering novel lncRNAs from RNA-seq data of the present study. The sequencing read datasets of poly(A)-enriched RNA and total RNA were respectively mapped to the reference genome sequence using TopHat and Cufflinks. Putative lncRNAs were then discovered by Cuffcompare, followed by coding potential estimation and rRNA exclusion. Sequencing reads were again mapped to the set of putative lncRNAs to construct the final set of novel lncRNAs
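The first three steps of this procedure can be sketched as a list of command lines. The sketch only builds the commands and does not run them. The tool names come from the caption, and the `-o` and `-r` options are the documented output and reference options of these tools, but every path, the index prefix, and the directory layout are hypothetical, and the parameters actually used in the study are not given here. The coding potential estimation, rRNA exclusion, and re-mapping steps are left out because the caption names no tools for them.

```python
import shlex
from pathlib import PurePosixPath


def build_discovery_commands(reads_fastq, bowtie_index, reference_gtf, out_dir):
    """Build (but do not execute) TopHat, Cufflinks, and Cuffcompare commands.

    All arguments are hypothetical paths supplied by the caller.
    """
    out = PurePosixPath(out_dir)
    tophat_dir = out / "tophat"
    cufflinks_dir = out / "cufflinks"
    return [
        # 1. Map reads to the reference genome (writes accepted_hits.bam)
        ["tophat", "-o", str(tophat_dir), bowtie_index, reads_fastq],
        # 2. Assemble transcripts from the alignments (writes transcripts.gtf)
        ["cufflinks", "-o", str(cufflinks_dir),
         str(tophat_dir / "accepted_hits.bam")],
        # 3. Compare assembled transcripts with the reference annotation
        ["cuffcompare", "-r", reference_gtf, "-o", str(out / "cuffcmp"),
         str(cufflinks_dir / "transcripts.gtf")],
    ]


commands = build_discovery_commands(
    reads_fastq="brain_polyA.fastq",   # hypothetical file name
    bowtie_index="dm_index",           # hypothetical index prefix
    reference_gtf="annotation.gtf",    # hypothetical annotation file
    out_dir="run",
)
for command in commands:
    print(shlex.join(command))
```

According to the caption, the same steps would be run once for the poly(A)-enriched dataset and once for the total RNA dataset.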
Fig. 7
Rules for classifying lncRNAs. Black arrows (transcripts) represent coding genes and colored transcripts are lncRNAs. a lncRNAs with intronic overlaps. This group includes lncRNAs (dark green and light green transcripts) located in intronic regions of coding genes (black transcripts). b Intergenic lncRNAs. This group includes lncRNAs (red transcripts) located in regions between two coding genes (black transcripts). c lncRNAs with exonic overlaps. This group includes lncRNAs (dark blue and light blue transcripts) overlapping exonic regions of coding genes (the black transcript)
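The three groups in this caption can be expressed as a small interval-overlap check. This is a minimal sketch, not the classification code used in the study: it assumes all features lie on the same chromosome, ignores strand, uses half-open `[start, end)` coordinates, and gives exonic overlap precedence over intronic overlap. The `CodingGene` type, the function names, and the category labels are hypothetical.

```python
from typing import List, NamedTuple, Tuple

Interval = Tuple[int, int]  # half-open [start, end), same chromosome assumed


class CodingGene(NamedTuple):
    span: Interval          # gene start and end
    exons: List[Interval]   # exon intervals inside the span


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_lncrna(lnc_exons: List[Interval],
                    coding_genes: List[CodingGene]) -> str:
    """Assign one of the three groups described in the caption."""
    # Group c: any lncRNA exon overlaps a coding exon
    for gene in coding_genes:
        if any(overlaps(le, ce) for le in lnc_exons for ce in gene.exons):
            return "exonic_overlap"
    # Group a: no exon overlap, but the lncRNA lies within or across a gene span
    lnc_span = (min(s for s, _ in lnc_exons), max(e for _, e in lnc_exons))
    for gene in coding_genes:
        if overlaps(lnc_span, gene.span):
            return "intronic_overlap"
    # Group b: located between coding genes
    return "intergenic"


# Hypothetical coding gene with two exons and one intron (200-800)
gene = CodingGene(span=(100, 1000), exons=[(100, 200), (800, 1000)])
labels = [
    classify_lncrna([(300, 400)], [gene]),
    classify_lncrna([(150, 250)], [gene]),
    classify_lncrna([(2000, 2100)], [gene]),
]
```

A strand-aware version would compare strands before testing overlap, which matters for antisense transcripts that overlap coding exons.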

References

    1. Mercer TR, Dinger ME, Mattick JS. Long non-coding RNAs: insights into functions. Nat Rev Genet. 2009;10:155–9. doi: 10.1038/nrg2521.
    2. Batista PJ, Chang HY. Long noncoding RNAs: cellular address codes in development and disease. Cell. 2013;152:1298–307. doi: 10.1016/j.cell.2013.02.012.
    3. Wapinski O, Chang HY. Long noncoding RNAs and human disease. Trends Cell Biol. 2011;21:354–61. doi: 10.1016/j.tcb.2011.04.001.
    4. Schuettengruber B, Ganapathi M, Leblanc B, Portoso M, Jaschek R, Tolhuis B, et al. Functional anatomy of polycomb and trithorax chromatin landscapes in Drosophila embryos. PLoS Biol. 2009;7:e1000013. doi: 10.1371/journal.pbio.1000013.
    5. Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, et al. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129:823–37. doi: 10.1016/j.cell.2007.05.009.
