2020 Mar 30;375(1795):20190342.
doi: 10.1098/rstb.2019.0342. Epub 2020 Feb 10.

An atlas of transposable element-derived alternative splicing in cancer

Evan A Clayton et al. Philos Trans R Soc Lond B Biol Sci.

Abstract

Transposable element (TE)-derived sequences comprise more than half of the human genome, and their presence has been documented to alter gene expression in a number of different ways, including the generation of alternatively spliced transcript isoforms. Alternative splicing has been associated with tumorigenesis for a number of different cancers. The objective of this study was to broadly characterize the role of human TEs in generating alternatively spliced transcript isoforms in cancer. To do so, we screened for the presence of TE-derived sequences co-located with alternative splice sites that are differentially used in normal versus cancer tissues. We analysed a comprehensive set of alternative splice variants characterized for 614 matched normal-tumour tissue pairs across 13 cancer types, resulting in the discovery of 4820 TE-generated alternative splice events distributed among 723 cancer-associated genes. Short interspersed nuclear elements (Alu) and long interspersed nuclear elements (L1) were found to contribute the majority of TE-generated alternative splice sites in cancer genes. A number of cancer-associated genes, including MYH11, WHSC1 and CANT1, were shown to have overexpressed TE-derived isoforms across a range of cancer types. TE-derived isoforms were also linked to cancer-specific fusion transcripts, suggesting a novel mechanism for the generation of transcriptome diversity via trans-splicing mediated by dispersed TE repeats. This article is part of a discussion meeting issue 'Crossroads between transposons and gene regulation'.

Keywords: alternative splicing; cancer; gene expression; gene regulation; transposable elements; tumorigenesis.
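
The screening step described in the abstract, finding TE-derived sequences co-located with alternative splice sites, can be illustrated with a simple interval lookup. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `TE` and `SpliceSite` records, the 0-based half-open coordinates, the event labels and the toy coordinates are all hypothetical.

```python
from typing import NamedTuple


class TE(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (assumed convention)
    end: int     # 0-based, exclusive (assumed convention)
    family: str  # e.g. "Alu" or "L1"


class SpliceSite(NamedTuple):
    chrom: str
    pos: int     # 0-based position of the splice site
    event: str   # hypothetical event label


def index_tes(tes):
    """Group TE annotations by chromosome for quick lookup."""
    by_chrom = {}
    for te in tes:
        by_chrom.setdefault(te.chrom, []).append(te)
    return by_chrom


def tes_at_site(index, site):
    """Return every TE whose interval contains the splice-site position."""
    return [te for te in index.get(site.chrom, [])
            if te.start <= site.pos < te.end]


# Toy data: coordinates are invented for illustration only.
tes = [TE("chr16", 100, 400, "Alu"), TE("chr4", 1000, 7000, "L1")]
sites = [
    SpliceSite("chr16", 250, "alt_3prime"),
    SpliceSite("chr16", 500, "exon_skip"),
    SpliceSite("chr4", 1000, "exon_skip"),
]

index = index_tes(tes)
for site in sites:
    families = [te.family for te in tes_at_site(index, site)] or ["none"]
    print(site.chrom, site.pos, site.event, families)
```

A linear scan per chromosome keeps the sketch short; a genome-scale screen would more plausibly rely on a sorted index or an interval tree.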

Conflict of interest statement

We declare we have no competing interests.

Figures

Figure 1.
Bioinformatics analysis pipeline used for this study. RNA-seq datasets from 658 paired normal-tumour TCGA samples from 22 cancer types were analysed in this study. The schematic can be broadly divided into four stages: (row 1) detection of alternative splicing events and per-exon expression quantification, (row 2) identification of TE-derived alternative splicing events for cancer-associated genes, (row 3) statistical testing for differences in alternative splicing expression levels between matched normal and tumour tissues, and (row 4) evaluation of cases of interest to explore the potential functional impact of TE-derived alternative splicing on cancer.
Figure 2.
Overall landscape of TE-derived alternative splicing in cancer. (a) Dot-and-whisker plot comparing the distributions of TE and non-TE isoforms in cancer-associated genes in normal (blue and light blue) and tumour (red and light red) tissues across all samples within each cancer type. The median numbers of events are shown as dots and the outliers (defined classically as 1.5 × interquartile range) are shown as whiskers. Cancer tissue abbreviations are defined in table 1. (b) Counts of the total number of unique TE and non-TE isoforms in cancer-associated genes are shown by splicing event type and TE class. (c) The observed counts of TE isoforms in cancer-associated genes for each event type and TE class are compared to expected counts.
Figure 3.
Differential expression of TE-derived alternative splice isoforms in tumour versus normal samples. Distributions of the relative expression counts (REC) comparing TE-derived to non-TE-derived alternative splice isoforms in tumour versus normal samples. The formula for REC is described in the Methods and in the electronic supplementary material, figure S6. Data are shown for 13 cancer types and four alternative splice event types. Each dot represents an REC value derived from the average normalized expression counts of the TE- and non-TE-derived isoforms in normal and cancer samples. Higher expression (counts) of the TE-derived isoform in tumour is shown on the right side of the panels, whereas lower expression is shown on the left.
Figure 4.
Frequency of TE-derived alternative splice events for individual genes. (a) The total numbers of alternative splice counts per exon are shown for each cancer-associated gene, broken down by the four alternative splice event types. Genes with the highest counts of TE-derived alternative splice events across all cancer types are shown. (b) The location of KLK2 on the long arm of chromosome 19 is shown along with the locations of two TE-derived alternative splicing events. A LINE (L2) sequence generates an internal exon skipping event, and a LINE (L1) generates a terminal exon skipping event.
Figure 5.
TE-derived alternative splicing in the MYH11 gene. (a) The location of MYH11 on the short arm of chromosome 16 is shown along with the specific location of its TE-derived alternative splicing event. A SINE (Alu) sequence provides an alternate 3′ splice site resulting in an extended exon 41. (b) Distributions of the non-TE (blue) and TE-derived (red) isoforms are shown for matched normal (left) and lung squamous cell carcinoma samples (right). (c) Relative expression change (REC) values are plotted against the corresponding G-test p-values (see Methods and the electronic supplementary material, figure S6) for the matched normal and lung squamous cell carcinoma samples. The MYH11 TE-derived isoform values are shown as a red square.
Figure 6.
TE-derived alternative splicing in the WHSC1 gene. (a) The location of WHSC1 on the short arm of chromosome 4 is shown along with the specific location of its TE-derived alternative splicing event. A LINE (L1) sequence generates an exon skipping event. (b) Distributions of the non-TE (blue) and TE-derived (red) isoforms are shown for matched normal (left) and stomach adenocarcinoma samples (right). (c) Relative expression change (REC) values are plotted against the corresponding G-test p-values (see Methods and the electronic supplementary material, figure S6) for the matched normal and stomach adenocarcinoma samples. The WHSC1 TE-derived isoform values are shown as a red square.
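
The captions for figures 5 and 6 refer to G-test p-values. As a rough illustration of what such a test computes, the sketch below runs a generic G-test of independence on a 2 × 2 table of isoform counts (rows: normal and tumour; columns: TE-derived and non-TE isoform). This is an assumption about the table layout, not the procedure from the Methods, which is not reproduced here; the counts are invented and the function name `g_test_2x2` is hypothetical.

```python
import math


def g_test_2x2(table):
    """G-test of independence for a 2x2 table of non-negative counts.

    Returns (G, p), where G = 2 * sum(O * ln(O / E)) and p is the upper
    tail of a chi-square distribution with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)

    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # cells with zero counts contribute nothing
                expected = row_sums[i] * col_sums[j] / n
                g += 2.0 * observed * math.log(observed / expected)

    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p


# Invented counts: [TE isoform, non-TE isoform] in normal, then tumour.
g, p = g_test_2x2([[30, 70], [60, 40]])
print(f"G = {g:.2f}, p = {p:.2e}")
```

The closed-form p-value only holds for a single degree of freedom, which is why the sketch is restricted to 2 × 2 tables; larger tables would need a general chi-square survival function.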

