Front Genet. 2011 Dec 26:2:93.

doi: 10.3389/fgene.2011.00093. eCollection 2011.

Thousands of Novel Transcripts Identified in Mouse Cerebrum, Testis, and ES Cells Based on ribo-minus RNA Sequencing

Wanfei Liu¹, Yuhui Zhao, Peng Cui, Qiang Lin, Feng Ding, Chengqi Xin, Xinyu Tan, Shuhui Song, Jun Yu, Songnian Hu

PMID: 22303387
PMCID: PMC3268642
DOI: 10.3389/fgene.2011.00093

Wanfei Liu et al. Front Genet. 2011.

Affiliation

¹ CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences Beijing, China.

Abstract

High-throughput next-generation sequencing technologies provide an excellent opportunity to detect low-abundance transcripts that may not be identifiable by previously available techniques. Here, we report the discovery of thousands of novel transcripts (mostly non-coding RNAs) expressed in mouse cerebrum, testis, and embryonic stem (ES) cells, through an in-depth analysis of ribo-minus RNA-seq (rmRNA-seq) data. These transcripts show significant associations with transcriptional start and elongation signals. Upstream of these transcripts, we observed significant enrichment of the histone mark H3K4me3 (histone H3 lysine 4 trimethylation), RNAPII binding sites, and cap analysis of gene expression (CAGE) tags, which mark transcriptional start sites. Along the length of these transcripts, we also observed enrichment of histone H3 lysine 36 trimethylation (H3K36me3). Moreover, these transcripts show strong purifying selection in their genomic loci, exonic sequences, and promoter regions, implying functional constraints on their evolution. These results define a collection of novel transcripts in the mouse genome and indicate their potential functions in mouse tissues and cells.

Keywords: next-generation sequencing; non-coding RNA; novel transcripts; ribo-minus RNA-seq.

Figures

**Figure 1**
**The saturation curve for the number of start-points of mapped reads**. The x-axis shows the number of mapped reads and the y-axis shows the number of start-points (in millions) of the mapped reads.
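A saturation curve of this kind can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each mapped read is reduced to a hashable start-point such as `(chromosome, strand, position)`, and the function name `saturation_curve` and its `steps` and `seed` parameters are hypothetical.

```python
import random


def saturation_curve(start_points, steps=5, seed=0):
    """Return (reads_sampled, distinct_start_points) pairs.

    start_points: one hashable start-point per mapped read, for example
    (chromosome, strand, position). Reads are shuffled once and then
    consumed in order, so the curve shows how many distinct start-points
    have been seen after each additional batch of reads.
    """
    rng = random.Random(seed)
    shuffled = list(start_points)
    rng.shuffle(shuffled)

    step = max(1, len(shuffled) // steps)  # batch size between curve points
    seen = set()
    curve = []
    for i, point in enumerate(shuffled, start=1):
        seen.add(point)
        if i % step == 0 or i == len(shuffled):
            curve.append((i, len(seen)))
    return curve
```

When the curve flattens, additional reads mostly revisit start-points that were already observed, which is the usual reading of sequencing saturation.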

**Figure 2**
**Parameters used in exon identification**. **(A)** The coverage cutoff value in the mouse cerebrum, testis, and ES cells. The cutoff value (blue) of coverage (3, 4, and 7 for cerebrum, ES cells, and testis, respectively) is labeled on the x-axis and the corresponding cumulative frequency (0.95, colored in green) is labeled on the y-axis. **(B)** The minimal intron length used in exon identification. The value (blue) on the x-axis is the minimal intron length (95 bp) and the value (green) on the y-axis is the corresponding cumulative frequency (0.05). **(C)** The minimal exon length used in exon identification. The value (blue) on the x-axis is the minimal exon length (55 bp) and the value (green) on the y-axis is the corresponding cumulative frequency (0.05).
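Reading a threshold off a cumulative frequency curve, as in the panels above, can be sketched as below. This is an illustration under assumptions: the helper name `value_at_cumulative_frequency` is hypothetical, and the caption does not say which observations each curve was built from, so the input list is left generic.

```python
import math


def value_at_cumulative_frequency(values, q):
    """Smallest observed value v such that at least a fraction q of
    the observations are <= v (0 < q <= 1)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(values)
    # Number of observations that must lie at or below the result.
    # The small tolerance guards against floating-point overshoot
    # in q * n; max() keeps the index valid for very small q.
    k = max(1, math.ceil(q * len(ordered) - 1e-9))
    return ordered[k - 1]
```

Under this reading, the minimal intron and exon lengths would correspond to `q=0.05` applied to the respective length distributions, and the coverage cutoff to `q=0.95` applied to a coverage distribution.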

**Figure 3**
**Density distribution of distances between adjacent exons in intergenic regions**. The x-axis shows the distance between adjacent exons and the y-axis shows the density. A small peak appears around 100 bp, which corresponds to minimal introns. In addition, the first main peak represents the distance between adjacent exons within TUs, and the second main peak reflects the distance between exons of adjacent TUs.

**Figure 4**
**5′ CAGE and histone modification around novel TU–TSS or gene bodies in the mouse cerebrum, testis, and ES cells**. **(A–D)** Profiles of 5′ CAGE, H3K4me3, H3K27me3, and H3K36me3.

**Figure 5**
**RNAPII around novel TU–TSS and sequence conservation of TU exons and promoters**. **(A)** Profile of RNAPII, **(B)** cumulative distribution of sequence conservation for TU exons, protein exons, Fantom3 RNA exons, and random regions, and **(C)** cumulative distribution of sequence conservation for TU promoters, protein promoters, Fantom3 RNA promoters, and random regions.

**Figure 6**
**The correlation between sense and antisense expression ratio in sense–antisense gene pairs**. Red and green points represent the sense–antisense gene pairs in positive and negative types. “P” stands for the positive type and “N” stands for the negative type.

**Figure 7**
**Histogram and motif logos of small RNAs in intergenic regions**. **(A)** The histogram of small RNA length, **(B)** a motif logo of small RNAs in cerebrum (16.13% of 65 bp small RNAs contain this motif), **(C)** a motif logo of small RNAs in testis (41.54% of 64 bp small RNAs contain this motif), and **(D)** a motif logo of small RNAs in ES cells (25.97% of 56 bp small RNAs contain this motif).

**Figure A1**
**A flowchart of the gene identification process**. We mapped the ribo-minus RNA-seq data using TopHat, created a genome-wide coverage file, and identified exons from the coverage at each position (≥ the cutoff value). Because 95% of intron lengths are ≥95 bp, we merged small exons separated by ≤95 bp. Because 95% of exon lengths are ≥55 bp, we kept only exons of at least 55 bp to reduce false positives. We removed low-coverage exons to reduce errors. We also filtered out known exons and built novel TUs on the basis of H3K36me3, RNAPII, and the difference in the distance between adjacent exons within a TU versus between adjacent TUs. We evaluated the accuracy of TU building by comparing our TUs with Fantom3 RNAs in intergenic regions.
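The coverage-based steps of this flowchart (thresholding, merging, length filtering) can be sketched as below. This is a simplified illustration, not the authors' pipeline: the function name `call_exons` is hypothetical, coverage is assumed to be a per-base list for one chromosome, and "distance" is taken to mean the number of below-cutoff bases between two covered runs. Known-exon filtering and TU building from H3K36me3 and RNAPII signals are not covered.

```python
def call_exons(coverage, cutoff, max_gap=95, min_len=55):
    """Call candidate exons from per-base coverage.

    coverage: list of read depths, one per base (0-based positions).
    Returns half-open (start, end) intervals.
    """
    # Step 1: contiguous runs of positions with depth >= cutoff.
    runs = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= cutoff:
            if start is None:
                start = pos
        elif start is not None:
            runs.append((start, pos))
            start = None
    if start is not None:
        runs.append((start, len(coverage)))

    # Step 2: merge runs separated by a gap of at most max_gap bases,
    # since such gaps are shorter than nearly all introns.
    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # Step 3: keep only intervals at least min_len bases long.
    return [(s, e) for s, e in merged if e - s >= min_len]
```

With a cutoff of 3, two covered runs separated by a 20-base gap are merged into one candidate exon, while an isolated 30-base run is discarded because it is shorter than 55 bp.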

**Figure A2**
**A snapshot of TUs in a Refgene intron and an intergenic region**. The upper panel shows a TU in an intron of the transmembrane protein gene Tmem180, and the lower panel shows a TU adjacent to the Sap130 gene. SAP130 is a subunit of the histone deacetylase-dependent SIN3A co-repressor complex, which acts as a transcriptional repressor. TUs on the plus and minus strands are shown as red and blue horizontal bars, respectively. For each TU, we show the RNA expression level (vertical bars in red and blue), the identified TU, Refgene, RNAPII signal (green), H3K36me3 signal (purple), and conservation score (yellow).

**Figure A3**
**The distances (1) between motif start and RNA 5′ end and (2) between motif end and RNA 3′ end**. The histograms show the distance between motif start and RNA 5′ end (left) and the distance between motif end and RNA 3′ end (middle); the right panel shows the density of both.

**Figure A4**
**Venn diagram of newly identified non-coding TUs among the mouse cerebrum, testis, and ES cells**.

Cited by

The pendulum model for genome compositional dynamics: from the four nucleotides to the twenty amino acids.
Zhang Z, Yu J. Genomics Proteomics Bioinformatics. 2012 Aug;10(4):175-80. doi: 10.1016/j.gpb.2012.08.002. Epub 2012 Aug 11. PMID: 23084772. Free PMC article.

Comparative analyses of H3K4 and H3K27 trimethylations between the mouse cerebrum and testis.
Cui P, Liu W, Zhao Y, Lin Q, Zhang D, Ding F, Xin C, Zhang Z, Song S, Sun F, Yu J, Hu S. Genomics Proteomics Bioinformatics. 2012 Apr;10(2):82-93. doi: 10.1016/j.gpb.2012.05.007. Epub 2012 Jun 9. PMID: 22768982. Free PMC article.

Developmental transcriptome analysis of human erythropoiesis.
Shi L, Lin YH, Sierant MC, Zhu F, Cui S, Guan Y, Sartor MA, Tanabe O, Lim KC, Engel JD. Hum Mol Genet. 2014 Sep 1;23(17):4528-42. doi: 10.1093/hmg/ddu167. Epub 2014 Apr 29. PMID: 24781209. Free PMC article.

Developmental analysis of spliceosomal snRNA isoform expression.
Lu Z, Matera AG. G3 (Bethesda). 2014 Nov 21;5(1):103-10. doi: 10.1534/g3.114.015735. PMID: 25416704. Free PMC article.

Life on two tracks.
Yu J. Genomics Proteomics Bioinformatics. 2012 Jun;10(3):123-6. doi: 10.1016/j.gpb.2012.06.001. Epub 2012 Jun 23. PMID: 22917184. Free PMC article.


