2017 Jul 5;3(3):23.
doi: 10.3390/ncrna3030023.

Rare Splice Variants in Long Non-Coding RNAs

Rituparno Sen et al. Noncoding RNA.

Abstract

Long non-coding RNAs (lncRNAs) form a substantial component of the transcriptome and are involved in a wide variety of regulatory mechanisms. Compared to protein-coding genes, they are often expressed at low levels and are restricted to a narrow range of cell types or developmental stages. As a consequence, the diversity of their isoforms is still far from being fully recorded and catalogued, and debate continues about what fraction of non-coding RNAs truly conveys biological function rather than being "junk". Here, using a collection of more than 100 transcriptomes from related B cell lymphoma, we show that lncRNA loci produce a well-defined set of splice variants. Some of these variants are so rare that they become recognizable only in the superposition of dozens or hundreds of transcriptome datasets, and they not infrequently include introns or exons that are missing from available genome annotation data; nevertheless, there is only a very limited number of processing products for any given locus. The combined depth of our sequencing data is large enough to effectively exhaust the isoform diversity: the overwhelming majority of splice junctions that are observed at all are represented by multiple junction-spanning reads. We conclude that the human transcriptome produces virtually no background of RNAs that are processed at effectively random positions but is, under normal circumstances, confined to a well-defined set of splice variants.

Keywords: GENCODE; lncRNA; lncRNA isoforms; splice junctions.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Saturation curves for the number of introns as a function of the number of independent transcriptome samples. The lncRNA data refer to the 1441 annotated genes in the lymphoma dataset with at least one intron.
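
A saturation curve of this kind can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each sample is already reduced to a set of detected introns, and it averages over random sample orderings. The function name `saturation_curve`, the tuple encoding of introns, and the toy data are all hypothetical.

```python
import random


def saturation_curve(samples, n_permutations=100, seed=0):
    """Mean number of distinct introns after pooling k samples, k = 1..N.

    `samples` is a list of sets, one per transcriptome. Each set holds the
    introns detected in that sample, here encoded (as an assumption) as
    (chromosome, start, end, strand) tuples.
    """
    rng = random.Random(seed)
    n = len(samples)
    totals = [0.0] * n
    for _ in range(n_permutations):
        # Pool the samples in a random order and track the running union.
        order = list(range(n))
        rng.shuffle(order)
        seen = set()
        for k, idx in enumerate(order):
            seen |= samples[idx]
            totals[k] += len(seen)
    return [t / n_permutations for t in totals]


# Toy data (invented for illustration): three samples, three distinct introns.
toy_samples = [
    {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+")},
    {("chr1", 300, 400, "+"), ("chr1", 500, 600, "+")},
    {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"), ("chr1", 500, 600, "+")},
]
curve = saturation_curve(toy_samples)
```

A curve that flattens as more samples are pooled indicates that additional transcriptomes contribute few new introns, which is the sense in which the abstract speaks of exhausting the isoform diversity.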
Figure 2
Scatterplots for different numbers of expression bins for long intergenic non-coding RNAs (lincRNAs) and coding genes. The diagonal, where x = y, is marked by a line. Points above the line are those genes for which we calculate more introns compared to GENCODE v.19. Only genes with at least one intron supported by at least 10 reads are considered here. The right-most column displays the fraction of genes that show more (red), the same (blue), or fewer (green) distinct splice junctions in the lymphoma data compared to GENCODE v.19. For the coding genes, there is a clear dependence of these fractions on the expression level: for highly expressed mRNAs, we systematically predict more (rare) splice variants. For mRNAs that are very lowly expressed in the lymphoma data set, GENCODE v.19 has more complex gene models. Overall, there are still more introns in our data set than annotated (Wilcoxon test p < 4×10^-10). In contrast, we systematically see more introns in lincRNAs than annotated by GENCODE (Wilcoxon test p < 3×10^-16), independent of the expression level. An alternative presentation of the right-hand-side panels showing data binned in 5-percentiles can be found in the Supplementary Material. RPKM: reads per kilobase and million reads.
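
The quantities in this caption can be illustrated with a short sketch. The RPKM formula is the standard definition; everything else is an assumption made for illustration: the helper names `rpkm`, `classify` and `fractions`, the input layout, the gene labels, and the choice to count every observed junction once a gene passes the 10-read filter. The Wilcoxon test itself is not reproduced here.

```python
from collections import Counter


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript and million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def classify(observed, annotated):
    """Compare an observed intron count with the annotated one."""
    if observed > annotated:
        return "more"
    if observed < annotated:
        return "fewer"
    return "same"


def fractions(genes, min_reads=10):
    """Fraction of genes with more, the same, or fewer introns than annotated.

    `genes` maps a gene label to (junction_read_counts, annotated_introns).
    A gene is kept only if at least one junction has >= min_reads reads.
    """
    labels = Counter()
    for read_counts, annotated in genes.values():
        if not read_counts or max(read_counts) < min_reads:
            continue  # no sufficiently supported intron: gene is skipped
        labels[classify(len(read_counts), annotated)] += 1
    kept = sum(labels.values())
    return {key: labels[key] / kept for key in ("more", "same", "fewer")} if kept else {}


# Toy input (invented numbers and labels, not data from the paper).
toy_genes = {
    "geneA": ([25, 12, 3], 1),  # 3 observed vs 1 annotated -> "more"
    "geneB": ([40], 1),         # 1 observed vs 1 annotated -> "same"
    "geneC": ([15, 11], 4),     # 2 observed vs 4 annotated -> "fewer"
    "geneD": ([4, 2], 1),       # best junction has 4 reads -> filtered out
}
result = fractions(toy_genes)
```

Applying `fractions` separately to each expression bin (defined by RPKM) would give the per-bin proportions of the kind shown in the right-most column.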
Figure 3
Two examples with previously unannotated splice junctions and introns. (Top) In ENSG00000267939, we find six introns and two additional exons compared to a single intron described in GENCODE v.19. (Bottom) For ENSG00000263470, we find eight introns plus a likely false positive compared to two introns in GENCODE.
