Front Genet. 2011 Dec 26:2:93.
doi: 10.3389/fgene.2011.00093. eCollection 2011.

Thousands of Novel Transcripts Identified in Mouse Cerebrum, Testis, and ES Cells Based on ribo-minus RNA Sequencing

Wanfei Liu et al. Front Genet. 2011.

Abstract

High-throughput next-generation sequencing technologies provide an excellent opportunity to detect low-abundance transcripts that may not be identifiable with previously available techniques. Here, we report the discovery of thousands of novel transcripts (mostly non-coding RNAs) that are expressed in mouse cerebrum, testis, and embryonic stem (ES) cells, through an in-depth analysis of ribo-minus RNA-seq (rmRNA-seq) data. These transcripts show significant associations with transcriptional start and elongation signals. Upstream of these transcripts, we observed significant enrichment of histone marks (histone H3 lysine 4 trimethylation, H3K4me3), RNAPII binding sites, and cap analysis of gene expression (CAGE) tags that mark transcriptional start sites. Along the length of these transcripts, we also observed enrichment of histone H3 lysine 36 trimethylation (H3K36me3). Moreover, these transcripts show strong purifying selection in their genomic loci, exonic sequences, and promoter regions, implying functional constraints on their evolution. These results define a collection of novel transcripts in the mouse genome and indicate their potential functions in mouse tissues and cells.

Keywords: next-generation sequencing; non-coding RNA; novel transcripts; ribo-minus RNA-seq.

Figures

Figure 1
Saturation curve for the number of start points of mapped reads. The x-axis shows the number of mapped reads and the y-axis shows the number of start points (in millions) of mapped reads.
Figure 2
Parameters used in exon identification. (A) The coverage cutoff in the mouse cerebrum, testis, and ES cells. The cutoff value (blue) of coverage (3, 4, and 7 for cerebrum, ES cells, and testis, respectively) is labeled on the x-axis and the corresponding cumulative frequency (0.95, green) is labeled on the y-axis. (B) The minimal intron length used in exon identification. The value (blue) on the x-axis is the minimal intron length (95 bp) and the value (green) on the y-axis is the corresponding cumulative frequency (0.05). (C) The minimal exon length used in exon identification. The value (blue) on the x-axis is the minimal exon length (55 bp) and the value (green) on the y-axis is the corresponding cumulative frequency (0.05).
Figure 3
Density distribution of distances between adjacent exons in intergenic regions. The x-axis shows the distance between adjacent exons and the y-axis shows the density. A small peak appears at around 100 bp and corresponds to minimal introns. The first main peak represents the distance between adjacent exons within TUs, and the second main peak represents the distance between exons of adjacent TUs.
Figure 4
5′ CAGE and histone modification around novel TU–TSS or gene bodies in the mouse cerebrum, testis, and ES cells. (A–D) Profiles of 5′ CAGE, H3K4me3, H3K27me3, and H3K36me3.
Figure 5
RNAPII around novel TU–TSS and sequence conservation of TU exons and promoters. (A) Profile of RNAPII, (B) cumulative distribution of sequence conservation for TU exons, protein exons, Fantom3 RNA exons, and random regions, and (C) cumulative distribution of sequence conservation for TU promoters, protein promoters, Fantom3 RNA promoters, and random regions.
Figure 6
Correlation between sense and antisense expression ratios in sense–antisense gene pairs. Red and green points represent sense–antisense gene pairs of the positive and negative types, respectively. “P” stands for the positive type and “N” stands for the negative type.
Figure 7
Histogram and motif logos of small RNAs in intergenic regions. (A) Histogram of small RNA length, (B) a motif logo of small RNAs in cerebrum (16.13% of 65 bp small RNAs contain this motif), (C) a motif logo of small RNAs in testis (41.54% of 64 bp small RNAs contain this motif), and (D) a motif logo of small RNAs in ES cells (25.97% of 56 bp small RNAs contain this motif).
Figure A1
A flowchart of the gene identification process. We mapped the ribo-minus RNA-seq data using TopHat, created a genome-wide coverage file, and identified exons according to the coverage at each position (≥ cutoff value). Because 95% of intron lengths are ≥95 bp, we merged exons separated by a distance of ≤95 bp. Moreover, because 95% of exon lengths are ≥55 bp, we kept only exons of at least 55 bp to reduce false positives. We removed low-coverage exons to reduce errors. We also filtered out known exons and built novel TUs on the basis of H3K36me3, RNAPII, and the difference between the distances separating adjacent exons within a TU and those separating adjacent TUs. We evaluated the accuracy of TU building by comparing our TUs with Fantom3 RNAs in intergenic regions.
Figure A2
A snapshot of TUs in a Refgene intron and an intergenic region. The upper panel shows a TU in an intron of the transmembrane protein gene Tmem180, and the lower panel shows a TU adjacent to the Sap130 gene. SAP130 is a subunit of the histone deacetylase-dependent SIN3A co-repressor complex, which acts as a transcriptional repressor. TUs on the plus and minus strands are shown as red and blue horizontal bars, respectively. For each TU, we show the RNA expression level (vertical bars in red and blue), the identified TU, Refgene, RNAPII signal (green), H3K36me3 signal (purple), and conservation score (yellow).
Figure A3
Distances (1) between motif start and RNA 5′ end and (2) between motif end and RNA 3′ end. The panels show a histogram of the distance between motif start and RNA 5′ end (left), a histogram of the distance between motif end and RNA 3′ end (middle), and the density of both (right).
Figure A4
Venn diagram of newly identified non-coding TUs among the mouse cerebrum, testis, and ES cells.
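
The first three steps of the exon identification flowchart (Figure A1) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `call_exons`, the half-open coordinate convention, and the definition of "distance" as the number of uncovered bases between two runs are assumptions made here for illustration. The default values of 95 bp and 55 bp come from the Figure A1 and Figure 2 captions, and the cutoff of 3 is the cerebrum value in Figure 2. The later steps (removing low-coverage exons, filtering known exons, building TUs) are not covered.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumed convention


def call_exons(
    coverage: Sequence[int],
    cutoff: int,
    max_gap: int = 95,  # merge distance from the Figure A1 caption
    min_len: int = 55,  # minimal exon length from the Figure A1 caption
) -> List[Interval]:
    """Hypothetical helper: call candidate exons from per-base coverage."""
    # Step 1: collect contiguous runs of positions with coverage >= cutoff.
    runs: List[Interval] = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= cutoff:
            if start is None:
                start = pos
        elif start is not None:
            runs.append((start, pos))
            start = None
    if start is not None:
        runs.append((start, len(coverage)))

    # Step 2: merge runs whose gap is at most max_gap bases, since such
    # gaps are shorter than almost all introns.
    merged: List[Interval] = []
    for s, e in runs:
        if merged and s - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # Step 3: keep only candidates at least min_len bases long.
    return [(s, e) for s, e in merged if e - s >= min_len]


# Toy coverage track: two covered runs 20 bp apart, then an isolated 20 bp run.
toy = [0] * 10 + [5] * 30 + [0] * 20 + [5] * 40 + [0] * 200 + [5] * 20 + [0] * 5
print(call_exons(toy, cutoff=3))  # [(10, 100)]
```

In this toy track the first two runs are merged because their 20 bp gap is below the 95 bp threshold, while the isolated 20 bp run is discarded for being shorter than 55 bp.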
