BMC Evol Biol. 2015 Aug 20;15:166.
doi: 10.1186/s12862-015-0437-7.

Evolution of the unspliced transcriptome

Jan Engelhardt et al. BMC Evol Biol.

Abstract

Background: Despite their abundance, unspliced EST data have received little attention as a source of information on non-coding RNAs. Very little is known, therefore, about the genomic distribution of unspliced non-coding transcripts and their relationship with the much better studied regularly spliced products. In particular, their evolution has remained virtually unstudied.

Results: We systematically study the evidence on unspliced transcripts available in the EST annotation tracks for human and mouse, comprising 104,980 and 66,109 unspliced EST clusters, respectively. Roughly one third of these lie entirely within introns of known genes (TINs), and another third overlap exonic regions (PINs). Eleven percent are "intergenic", i.e., far away from any annotated gene. CAGE tag and chromatin data provide direct evidence that many PINs and TINs are transcribed independently. We predict more than 2000 3'UTR-associated RNA candidates each for human and mouse. Fifteen to twenty percent of the unspliced EST clusters are conserved between human and mouse. With the exception of TINs, the sequences of unspliced EST clusters evolve significantly more slowly than the genomic background. Furthermore, like spliced lincRNAs, they show highly tissue-specific expression patterns.

Conclusions: Unspliced long non-coding RNAs are an important, rapidly evolving component of mammalian transcriptomes. Their analysis is complicated by their preferential association with complex transcribed loci that usually also harbor a plethora of spliced transcripts. Unspliced EST data, although typically disregarded in transcriptome analysis, can be used to gain insights into this rarely investigated transcriptome component. The frequently postulated connection between lack of splicing and nuclear retention, together with the surprising overlap with chromatin-associated transcripts, suggests that this class of transcripts might be involved in chromatin organization and possibly other mechanisms of epigenetic control.

Figures

Fig. 1
Classification of unspliced EST clusters with respect to their location relative to RefSeq genes. With the exception of totally intronic RNAs (TINs) and clusters in the upstream (UT) and downstream (DT) regions within 5 kb, all other classes partially overlap RefSeq exons. The statistical analysis distinguishes 5’ and 3’ partially intronic RNAs (5’PIN, 3’PIN), EST clusters overlapping the 3’UTR and downstream region (3’R) or the 5’UTR and upstream region (5’R), respectively, and clusters covering complete introns, which indicate retained introns (rI). Furthermore, we record totally exonic clusters (TEX) and intergenic clusters (IGR) that are unrelated to RefSeq loci. The bar plots above and below the scheme summarize the numbers of unspliced EST clusters of each type in human (above) and mouse (below). The Venn diagram below lists the exact numbers. About one fifth of the unspliced EST clusters (21,022 in human and 11,179 in mouse) cannot be classified unambiguously because they are overlapped by more than one RefSeq gene and would fall into different classes with respect to these; see subsection Classification in the Methods part for details. These ambiguous clusters are not included here.
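The classification scheme in Fig. 1 can be illustrated with a minimal Python sketch for a single gene. It is not the authors' pipeline: the `Gene` model, the `classify` function, the half-open coordinates, and the rule that a 5’PIN has its intronic part on the 5’ side of the overlapped exon (in transcript orientation) are all assumptions made for illustration. Only the 5 kb flank comes from the caption. A real analysis would compare each cluster against all nearby RefSeq genes and set aside clusters whose classes disagree, as the caption describes.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the forward strand (assumption)
FLANK = 5_000  # UT/DT window: "within 5 kb" per the Fig. 1 caption


@dataclass
class Gene:
    """Hypothetical single-gene model: sorted exons separated by non-empty introns."""
    exons: List[Interval]
    strand: str  # "+" or "-"

    @property
    def start(self) -> int:
        return self.exons[0][0]

    @property
    def end(self) -> int:
        return self.exons[-1][1]

    def introns(self) -> List[Interval]:
        # Gaps between consecutive exons.
        return [(a[1], b[0]) for a, b in zip(self.exons, self.exons[1:])]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def contains(outer: Interval, inner: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def classify(cluster: Interval, gene: Gene) -> str:
    s, e = cluster
    plus = gene.strand == "+"

    # 1. No overlap with the gene span: flanking (UT/DT) or intergenic (IGR).
    if e <= gene.start:
        if gene.start - e > FLANK:
            return "IGR"
        return "UT" if plus else "DT"
    if s >= gene.end:
        if s - gene.end > FLANK:
            return "IGR"
        return "DT" if plus else "UT"

    # 2. Overlaps a terminal exon and runs into the flank: 5'R or 3'R.
    past_left, past_right = s < gene.start, e > gene.end
    if past_left and past_right:
        return "unclassified"  # covers the whole gene; outside this sketch
    if past_left:
        return "5'R" if plus else "3'R"
    if past_right:
        return "3'R" if plus else "5'R"

    # 3. Entirely inside the gene span.
    hit = [x for x in gene.exons if overlaps(cluster, x)]
    if not hit:
        return "TIN"
    if any(contains(x, cluster) for x in hit):
        return "TEX"
    if any(contains(cluster, i) for i in gene.introns()):
        return "rI"
    # Exactly one exon is hit here (two hits would imply a covered intron).
    ex = hit[0]
    left, right = s < ex[0], e > ex[1]
    if left and right:
        return "unclassified"  # intronic on both sides; not covered by this sketch
    upstream = left if plus else right  # intronic part lies 5' of the exon
    return "5'PIN" if upstream else "3'PIN"
```

For a plus-strand gene with exons at 100-200, 300-400 and 500-600, a cluster at (250, 350) is reported as 5'PIN and one at (150, 350) as rI.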
Fig. 2
Using the pairwise alignments of human and mouse, we detected 14,396 pairs of conserved unspliced EST clusters. The pairs comprise 13,278 distinct clusters from human and 13,277 from mouse. A total of 5,388 pairs link clusters assigned to the same class; see Fig. 1 for details of the classification. In all, 91,431 (87 %) of the 104,980 unspliced EST clusters in human and 52,495 (80 %) of the 66,109 in mouse cannot be associated with a homologous unspliced EST cluster in the other species.
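The counts in Fig. 2 can be derived from a list of aligned cluster pairs. The sketch below is a hypothetical helper, not the authors' code, and its identifiers and class maps are toy inputs. It shows why there are more pairs (14,396) than distinct clusters per species (about 13,280): a cluster aligned to several clusters in the other species is counted once per pair but only once among the distinct clusters.

```python
from typing import Dict, List, Tuple


def summarize_pairs(
    pairs: List[Tuple[str, str]],
    human_class: Dict[str, str],
    mouse_class: Dict[str, str],
    n_human: int,
    n_mouse: int,
) -> Dict[str, float]:
    """Summarize conserved (human, mouse) cluster pairs in the style of Fig. 2.

    pairs:          hypothetical list of aligned (human_id, mouse_id) pairs.
    human_class:    class label (TIN, TEX, ...) of each human cluster.
    mouse_class:    class label of each mouse cluster.
    n_human/n_mouse: total number of unspliced EST clusters per species.
    """
    human_ids = {h for h, _ in pairs}  # distinct human clusters with a partner
    mouse_ids = {m for _, m in pairs}  # distinct mouse clusters with a partner
    same_class = sum(1 for h, m in pairs if human_class[h] == mouse_class[m])
    return {
        "pairs": len(pairs),
        "human_paired": len(human_ids),
        "mouse_paired": len(mouse_ids),
        "same_class": same_class,
        "human_unpaired_frac": (n_human - len(human_ids)) / n_human,
        "mouse_unpaired_frac": (n_mouse - len(mouse_ids)) / n_mouse,
    }


# Toy example: h1 aligns to two mouse clusters, so pairs outnumber human clusters.
pairs = [("h1", "m1"), ("h1", "m2"), ("h2", "m3")]
human_class = {"h1": "TIN", "h2": "TEX", "h3": "IGR", "h4": "TIN"}
mouse_class = {"m1": "TIN", "m2": "3'PIN", "m3": "TEX"}
summary = summarize_pairs(pairs, human_class, mouse_class, n_human=4, n_mouse=5)
```

Here `summary` reports 3 pairs but only 2 distinct human clusters, 2 of the pairs fall into the same class, and half of the toy human clusters have no partner.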
Fig. 3
Mean phastCons scores for the different classes of unspliced EST (uEST) clusters (squares), compared with the average conservation of introns, the entire genome and (predominantly coding) exons. AVG is the average over all uEST classes. TEX uESTs are particularly heterogeneous, so they were further subdivided into three subclasses, shown as triangles (from upper right to lower left: completely within the CDS, partially overlapping the CDS, and entirely non-coding). Blue stars refer to a background set comprising only those RefSeq genes in which we detected uEST clusters.
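The per-class averages in Fig. 3 amount to averaging per-base phastCons scores (probabilities between 0 and 1) over clusters. The caption does not say whether scores are pooled over bases or averaged per cluster first, so the sketch below assumes the latter, and it also assumes that AVG is the unweighted mean of the class means. The `scores` list is a hypothetical stand-in for per-base phastCons values on one chromosome, and every cluster is assumed to be non-empty.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def class_mean_scores(
    clusters: List[Tuple[str, int, int]], scores: Sequence[float]
) -> Tuple[Dict[str, float], float]:
    """Per-class mean of per-cluster mean phastCons scores, plus their average.

    clusters: (class_label, start, end) with half-open coordinates into `scores`.
    scores:   hypothetical per-base phastCons values.
    """
    per_class: Dict[str, List[float]] = defaultdict(list)
    for label, start, end in clusters:
        per_class[label].append(mean(scores[start:end]))  # one value per cluster
    class_means = {label: mean(vals) for label, vals in per_class.items()}
    avg = mean(class_means.values())  # assumed reading of "AVG"
    return class_means, avg


# Toy chromosome: first 10 bases unconserved, last 10 fully conserved.
scores = [0.0] * 10 + [1.0] * 10
means, avg = class_mean_scores(
    [("TIN", 0, 10), ("TEX", 10, 20), ("TEX", 5, 15)], scores
)
```

Averaging per cluster first keeps long clusters from dominating a class; pooling over bases would weight them by length instead.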

