BMC Genomics. 2013 Aug 28;14:584.
doi: 10.1186/1471-2164-14-584.

Inferring the expression variability of human transposable element-derived exons by linear model analysis of deep RNA sequencing data

Wensheng Zhang et al. BMC Genomics.

Abstract

Background: The exonization of transposable elements (TEs) has proven to be a significant mechanism for the creation of novel exons. Existing knowledge of the retention patterns of TE exons in mRNAs was mainly established by the analysis of Expressed Sequence Tag (EST) data and microarray data.

Results: This study seeks to validate and extend previous studies on the expression of TE exons by an integrative statistical analysis of high throughput RNA sequencing data. We collected 26 RNA-seq datasets spanning multiple tissues and cancer types. The exon-level digital expressions (indicating retention rates in mRNAs) were quantified by a double normalized measure, called the rescaled RPKM (Reads Per Kilobase of exon model per Million mapped reads). We analyzed the distribution profiles and the variability (across samples and between tissue/disease groups) of TE exon expressions, and compared them with those of other constitutive or cassette exons. We inferred the effects of four genomic factors, including the location, length, cognate TE family and TE nucleotide proportion (RTE, see Methods section) of a TE exon, on the exons' expression level and expression variability. We also investigated the biological implications of an assembly of highly-expressed TE exons.
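The RPKM measure named above can be illustrated with a minimal sketch. It uses the standard RPKM definition (reads scaled by exon length in kilobases and by mapped reads in millions). The function name and the example numbers are hypothetical, and the second normalization step that produces the rescaled RPKM is not specified in this abstract, so it is not reproduced here.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads Per Kilobase of exon model per Million mapped reads.

    Standard definition: reads / (length in kb) / (mapped reads in millions),
    which simplifies to reads * 1e9 / (length in bp * mapped reads).
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical exon: 50 reads on a 200 bp exon in a 25-million-read library.
example_rpkm = rpkm(read_count=50, exon_length_bp=200, total_mapped_reads=25_000_000)
print(example_rpkm)
```

An exon with no mapped reads gets an RPKM of zero, which corresponds to the "un-expressed" exons counted in the figures.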

Conclusion: Our analysis confirmed prior studies in four respects. First, most TE exons in mRNAs, especially those without exact counterparts in the UCSC RefSeq (Reference Sequence) gene tables, show relatively high expression variability and low but still detectable expression levels in most tissue samples. Second, the TE exons in coding DNA sequences (CDSs) are less highly expressed than those in 3' or 5' untranslated regions (UTRs). Third, the exons derived from ancient repeat elements, such as MIRs, tend to be more highly expressed than those derived from younger TEs. Fourth, the previously observed negative relationship between exon length and inclusion level in transcripts also holds for exonized TEs. Our study also produced several novel findings: (1) for TE exons with non-zero expression, and in most of the studied biological samples, a high TE nucleotide proportion leads to lower retention rates in mRNAs; (2) the considered genomic features (i.e. a continuous variable such as exon length or a category indicator such as 3'UTR) influence the expression level and the expression variability (CV) of TE exons in opposite directions; (3) not only exons derived from Alu elements but also exons derived from TEs of other families were preferentially established in zinc finger (ZNF) genes.

Figures

Figure 1
Histograms for the digital expression levels of TE exons in sample BT20 (representing cluster G1). The black bar on the left side of each plot represents the proportion of un-expressed exons.
Figure 2
Histograms for the variability and standardized inter-class differences of the digital expression levels of TE exons. CV: the coefficient of variation of the rescaled RPKMs across the 26 samples. BR t-statistic: calculated with the difference and pooled standard deviation of the rescaled RPKMs for three ER- breast cancer cell lines and four ER+ breast cancer cell lines. PR t-statistic: calculated with the difference and pooled standard deviation of the rescaled RPKMs for three prostate adenocarcinoma samples and three normal prostate tissue samples.
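The two summary statistics in this caption can be sketched as follows. The caption says only that the t-statistics use the difference and the pooled standard deviation, so the sketch assumes the usual equal-variance (Student) two-sample form. The function names and the expression values are hypothetical and not taken from the paper.

```python
from math import sqrt
from statistics import mean, stdev, variance


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (undefined for mean 0)."""
    m = mean(values)
    if m == 0:
        return float("nan")
    return stdev(values) / m


def pooled_t_statistic(group_a, group_b):
    """Two-sample t-statistic with a pooled variance (assumed Student form)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    if pooled_var == 0:
        return float("nan")  # no within-group spread, statistic undefined
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical rescaled RPKMs for one exon: three samples versus four samples,
# mirroring the 3-versus-4 group sizes of the BR comparison.
er_negative = [1.0, 2.0, 3.0]
er_positive = [2.0, 3.0, 4.0, 5.0]

print(coefficient_of_variation(er_negative))
print(pooled_t_statistic(er_negative, er_positive))
```

A negative value here means the first group has the lower mean. With only three or four samples per group, the statistic is best read as a standardized difference rather than as strong evidence on its own.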
Figure 3
Distribution of TE exons by the cognate TE families and the inclusion (presence or absence) in the UCSC RefGene table. A number in the upper row is the proportion of the un-annotated TE exons within the corresponding family among the entire set. A number in the lower row is the proportion of the annotated TE exons within a family.
Figure 4
Visualization of the effects of genomic factors on the digital expression of TE exons. The results indicated by the top rows of color boxes are inferred from the median of the TE exons' un-expressed ratio and rescaled RPKMs across all samples. In the analysis, CDS and Alu are set as the baselines for the two categorical factors, location and TE family, respectively.
Figure 5
The inverse relationship between the expression levels and expression variability of TE exons. A: The effects of genomic factors on the median digital expression level of TE exons (the median of the rescaled RPKMs across the 26 samples). B: The effects of genomic factors on the coefficient of variation of the rescaled RPKM of TE exons across the 26 samples. C: The scatter plot of the medians against the coefficients of variation. TE exons hosted by genes that are un-expressed in more than half of the samples were excluded before the statistical analysis (using Model-2). The height of an error bar in Plots A and B represents twice the standard error of the corresponding effect coefficient. In Plot C, the correlation was calculated by the Kendall method.
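The Kendall correlation mentioned for Plot C can be illustrated with a small pure-Python sketch. The caption does not say which Kendall variant was used, so the tau-b form (which corrects for ties) is assumed here. The function name and the per-exon values are hypothetical.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b) by direct pair counting, O(n^2)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from every count
            elif dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    denom = sqrt((pairs + ties_x_only) * (pairs + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical per-exon summaries: median expression and CV across samples.
medians = [1.0, 2.0, 3.0, 4.0, 5.0]
cvs = [5.0, 4.0, 3.0, 1.0, 2.0]

tau = kendall_tau_b(medians, cvs)
print(tau)
```

A strongly negative tau, as in this toy data, is the pattern the caption calls an inverse relationship: exons with higher median expression tend to have lower variability across samples.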
