NAR Cancer. 2023 Jul 26;5(3):zcad040.
doi: 10.1093/narcan/zcad040. eCollection 2023 Sep.

The transcriptional landscape of endogenous retroelements delineates esophageal adenocarcinoma subtypes

Anastasiya Kazachenka et al. NAR Cancer.

Abstract

Most cancer types exhibit aberrant transcriptional activity, including derepression of retrotransposable elements (RTEs). However, the degree, specificity and potential consequences of RTE transcriptional activation may differ substantially among cancer types and subtypes. Representing one extreme of the spectrum, we characterize the transcriptional activity of RTEs in cohorts of esophageal adenocarcinoma (EAC) and its precursor Barrett's esophagus (BE) from the OCCAMS (Oesophageal Cancer Clinical and Molecular Stratification) consortium, and from TCGA (The Cancer Genome Atlas). We found exceptionally high RTE inclusion in the EAC transcriptome, driven primarily by transcription of genes incorporating intronic or adjacent RTEs, rather than by autonomous RTE transcription. Nevertheless, numerous chimeric transcripts straddling RTEs and genes, and transcripts from stand-alone RTEs, particularly KLF5- and SOX9-controlled HERVH proviruses, were overexpressed specifically in EAC. Notably, incomplete mRNA splicing and EAC-characteristic intronic RTE inclusion were mirrored by relative loss of the respective fully-spliced, functional mRNA isoforms, consistent with compromised cellular fitness. Defective RNA splicing was linked with strong transcriptional activation of a HERVH provirus on Chr Xp22.32 and defined EAC subtypes with distinct molecular features and prognosis. Our study defines distinguishable RTE transcriptional profiles of EAC, reflecting distinct underlying processes and prognosis, thus providing a framework for targeted studies.

Figures

Graphical Abstract
Figure 1.
Increased inclusion of RTEs in the ESCA and STAD transcriptomes. (A) Number of transcripts expressed (≥0.5 TPM) in the indicated cancer (n = 24 per cancer type) or normal tissue samples (n = 2–156 per tissue type). Box plots denote median value and quartiles, whiskers denote 1.5× the interquartile range, and individual points denote outliers. (B) Heatmap of expression of 4844 ESCA-overexpressed transcripts in the same samples as in (A). (C) Heatmap of expression of 4844 ESCA-overexpressed transcripts in extended TCGA EAC and ESCC cohorts and in an additional OCCAMS EAC cohort. (D, E) Overlap of the 4844 ESCA-overexpressed transcripts with RTEs or annotated genes (D) and according to the RTE group (E). (F) Enrichment of the indicated RTE subfamily in the 4844 ESCA-overexpressed transcripts, compared with all assembled transcripts (P values were calculated with Fisher's exact tests).
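The subfamily enrichment in panel (F) can be illustrated with a minimal sketch of a Fisher's exact test on a 2×2 table. This is not the authors' code: it assumes a one-sided (over-representation) test, whereas the caption does not state the sidedness, and the function name and all counts in the usage line are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact P value for over-representation.

    k: transcripts in the selected set that overlap the RTE subfamily
    n: size of the selected set (e.g. overexpressed transcripts)
    K: transcripts in the background that overlap the subfamily
    N: size of the background (e.g. all assembled transcripts)

    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities of observing k or more overlaps.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(fisher_enrichment_p(k=12, n=200, K=150, N=10000))
```

A small P value here would indicate that the subfamily appears in the selected set more often than expected from its frequency among all assembled transcripts.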
Figure 2.
Aberrant RNA splicing in the EAC transcriptome. (A) Schematic representation of the classification of assembled transcripts according to their location relative to the nearest gene body. (B) Proportion of the indicated class of transcript in ESCA-specific and in all assembled transcripts. (C) GENCODE annotated transcripts (Genes), RTEs (Repeats), assembled transcripts, RNA-seq traces of representative TCGA EAC and OCCAMS EAC samples, and ISO-seq traces of the ESCC cell lines KYSE140 and TE5 (PRJNA515570), at the CASP2, TIMM17A and CPSF6 loci.
Figure 3.
Diagnostic properties of RTE transcriptional inclusion in EAC. (A) Heatmap of expression of 29 EAC-overexpressed diagnostic transcripts in pooled TCGA and OCCAMS EAC samples, OCCAMS BE samples, TCGA ESCC samples, TCGA samples representing 30 other cancer types and pooled TCGA and GTEx normal tissue samples. (B) Heatmap of expression of 8 EAC-overexpressed transcripts that distinguish EAC and BE in TCGA and OCCAMS EAC samples, OCCAMS BE samples and TCGA ESCC samples (left) and correlation coefficients of the expression of these 8 transcripts in EAC samples (right). (C) Correlation of HERVH Xp22.32 and GNGT1-L1PB1 expression (sum TPMs of the two transcripts from each locus) in TCGA and OCCAMS EAC samples. (D) Receiver operating characteristic (ROC) curves of the performance of the sum of the z-scores of the 29 or 8 diagnostic transcripts in the indicated comparison of pooled TCGA and OCCAMS EAC samples, OCCAMS BE samples, and pooled TCGA and GTEx normal tissue samples.
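The score evaluated in panel (D), a sum of per-transcript z-scores assessed by ROC analysis, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: z-scores are assumed to be computed per transcript across all samples being compared, the area under the ROC curve is obtained from its rank-based (Mann-Whitney) formulation, and the function names and expression values are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one transcript's expression values across samples."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def summed_zscores(expr):
    """expr maps transcript name -> list of expression values, one per sample
    (same sample order for every transcript). Returns one score per sample."""
    per_transcript = [zscores(v) for v in expr.values()]
    return [sum(col) for col in zip(*per_transcript)]


def roc_auc(scores, labels):
    """Area under the ROC curve: the probability that a positive sample
    scores higher than a negative one, counting ties as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical TPM values for two transcripts in four samples.
    expr = {"transcript_a": [10.0, 12.0, 1.0, 2.0],
            "transcript_b": [5.0, 6.0, 0.0, 1.0]}
    labels = [1, 1, 0, 0]  # 1 = EAC, 0 = comparison group (hypothetical)
    print(roc_auc(summed_zscores(expr), labels))
```

Summing z-scores gives each transcript equal weight regardless of its absolute expression level, which is one plausible reason to standardize before combining.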
Figure 4.
Prognostic properties of RTE transcriptional inclusion in EAC. (A) Mean hazard ratios for the 282 EAC-specific transcripts that exhibited a significant correlation with EAC survival (P < 0.05 in both OCCAMS EAC cohorts separately; hazard ratio ≥2 or ≤0.5). (B) Proportion of the indicated class of transcript in the 282 prognostic and in all ESCA-specific transcripts. (C) Correlation between exon and intron representation in the EAC transcriptome. Symbols represent individual genes in a representative OCCAMS EAC sample. (D) Correlation of the mean hazard ratios for 204 EAC-specific genes where both exon and intron expression correlated significantly with EAC survival when considered separately (P < 0.05 in both OCCAMS EAC cohorts separately; hazard ratio ≥2 or ≤0.5). (E) Heatmaps of mean P values and mean hazard ratios for prognostic splicing-associated genes and ISGs.
Figure 5.
Pattern of HERVH expression in esophageal and colon cancers. (A) Heatmaps of mean P values and mean hazard ratios for the indicated HERVH proviruses, calculated for survival in each OCCAMS EAC cohort separately. (B) Heatmaps of expression of the indicated HERVH proviruses in hierarchically clustered samples from the two OCCAMS EAC cohorts. (C) Heatmaps of expression of the indicated HERVH proviruses in hierarchically clustered samples from TCGA EAC and TCGA ESCC. (D) Mean (±SEM) expression of HERVH Xp22.32, HERVH 1p31.3 and HERVH 13q33.3 proviruses in TCGA EAC and TCGA ESCC samples. (E) Heatmap of expression of the indicated HERVH proviruses in hierarchically clustered samples from TCGA COAD. (F) Mean (±SEM) expression of HERVH Xp22.32, HERVH 1p31.3 and HERVH 13q33.3 proviruses in TCGA COAD samples. (G) Heatmap of expression of the indicated HERVH proviruses in hierarchically clustered samples from CCLE cell lines derived from the esophagus, stomach or large intestine.
Figure 6.
Regulation of individual HERVH proviruses by KLF5 and SOX9. (A) KLF5 ChIP-seq traces (green track) and RNA-seq traces of KLF5 knocked-down (siKLF5) and control EAC cells OE19 (three samples per group) (E-MTAB-8568; E-MTAB-8446) at the HERVH Xp22.32, HERVH 1p31.3 and HERVH 13q33.3 proviruses. (B) KLF5 ChIP-seq traces (green track) and RNA-seq traces of KLF5−/− and control COAD cells HT-55 (three samples per group) (GSE147853; GSE147855) at the HERVH Xp22.32, HERVH 1p31.3 and HERVH 13q33.3 proviruses. Also indicated in (A) and (B) are KLF5 binding sites from the UCSC Genome Browser JASPAR Transcription Factor track. (C) Expression of HERVH Xp22.32 or HERVH-CALB1, relative to expression of HPRT, determined by RT-qPCR in EAC cells OE19 transfected to express SOX9 or KLF5, compared with control untransfected cells. Error bars represent the variation of two independent repeats, each with three technical replicates, and P values were calculated by one-way ANOVA with Bonferroni correction for multiple comparisons.
Figure 7.
Expression of HERVH Xp22.32 in the progression to EAC. (A) Mean (±SEM) expression of HERVH Xp22.32 in GTEx normal esophagus and OCCAMS BE and EAC samples (left) and an independent dataset of normal esophagus and BE and EAC samples (E-MTAB-4054) (middle), and HERVH Xp22.32 expression in paired OCCAMS BE and EAC samples (right). Comparisons of the three types of tissue were carried out with Kruskal-Wallis tests with Dunn's correction for multiple comparisons, and of the paired samples with the Wilcoxon matched-pairs signed rank test. (B) Heatmap of expression of 101 genes that were significantly (q < 0.05) differentially expressed between hierarchically clustered OCCAMS BE and EAC subsets according to HERVH Xp22.32 expression (using 1 TPM as the cut-off value to define high and low HERVH Xp22.32 expression).
Figure 8.
Molecular features of EAC subtypes defined by HERVH Xp22.32 activation. (A) Driver gene alterations that correlated significantly (P < 0.05, q < 0.05) with HERVH Xp22.32 expression by linear regression analyses. (B) Heatmap of expression of 839 ESCA-overexpressed assembled transcripts (top) and annotated exons and introns of 1756 annotated genes (bottom) that correlated significantly (P < 0.05, q < 0.05) with HERVH Xp22.32 expression by linear regression analyses. (C) Functional annotation by gene ontology (GO) of the 1756 genes that correlated significantly with HERVH Xp22.32 expression in (B). (D) Ratios of exon/intron expression in OCCAMS EAC samples with the highest and lowest HERVH Xp22.32 expression (n = 10 per group) for 19 genes where the expression of exons and of intronic transcripts showed an inverse correlation with HERVH Xp22.32 expression. P values were calculated using Student's t-tests. (E) RNA-seq traces of representative OCCAMS EAC samples with high or low HERVH Xp22.32 expression (two samples per group) at the HERVH Xp22.32, ZDHHC20 and PBRM1 loci.