EBioMedicine. 2025 Jun:116:105727.
doi: 10.1016/j.ebiom.2025.105727. Epub 2025 May 16.

Deciphering human endogenous retrovirus expression in colorectal cancers: exploratory analysis regarding prognostic value in liver metastases

Julien Viot et al. EBioMedicine. 2025 Jun.

Abstract

Background: Human Endogenous RetroVirus (HERV) expression in tumours reflects the epigenetic dysregulation of cancer and acts as an oncogenic factor through promoter/enhancer action on genes. While more than 50% of colorectal cancers develop liver metastases, HERV has not been studied in this context.

Methods: We collected 400 RNA-seq samples from over 200 patients with primary tumours and liver metastases, including public data and a novel set of 200 samples.

Findings: We observed global stability of HERV expression between liver metastases and primary colorectal cancers, suggesting an early oncogenic footprint. We identified a list of 17 HERV loci for liver metastatic colorectal cancer (lmCRC) characterization, with tumour specificity validated in single-cell metastatic colorectal cancer data and normal tissue bulk RNA-seq. Eleven loci produced HERV-derived peptides as per tandem mass spectrometry from primary colorectal cancer. Six loci were associated with the risk of relapse after lmCRC surgery. Four loci (HERVH_Xp22.32a, HERVH_20p11.23b, HERVH_13q33.3, HERVH_13q31.3) had adverse prognostic value (log-rank p-values 0.028, 0.0083, 9e-4, and 0.05, respectively), while two, HERVH_Xp22.2c (log-rank p-value 0.032) and HERVH_8q21.3b (in multivariable models), were associated with better prognosis. Moreover, the markers showed a cumulative effect on survival when expressed. Some were associated with decreased cytotoxic immune cells, and most of them correlated with cell cycle pathways.

Interpretation: These findings provide insights into the lmCRC transcriptome landscape by suggesting prognostic markers and potential therapeutic targets.

Funding: This work was supported by institutional grants from Inserm, EFS, University of Bourgogne Franche-Comté, national funds "Agence Nationale de la Recherche - ANR-JCJC: Projet HERIC and ANR-22-CE45-0007", and "La ligue contre le cancer".

Keywords: Colorectal cancer; Endogenous retrovirus; Immunology; Liver metastases; Transposable elements.

Conflict of interest statement

Declaration of interests: Pierre Laurent-Puig is chairman of Ile-De-France Canceropole and declares stock options in MethysDx and consulting fees from Pierre Fabre, Servier, Blocartis and BMS. Aurélien De Reynies declares consulting fees from Qlucore as member of the SAB. Thierry André reports attending advisory board meetings and receiving consulting fees from Aptitude health, Bristol Myers Squibb, Gritstone Oncology, Gilead, GlaxoSmithKline, Merck & Co. Inc., Nimbus, Nordic, Seagen, Servier, Pfizer and Takeda; honoraria for lectures, presentations, speakers bureaus, manuscript writing or educational events from Bristol Myers Squibb, Merck & Co. Inc., Merck Serono, Seagen, and Servier; support for attending meetings and/or travel from Bristol Myers Squibb, Merck & Co. Inc. and Takeda; participation on a Data Safety Monitoring Board or Advisory Board for Inspirna; and being President of the ARCAD Foundation. Dewi Vernerey reports consulting fees from OSE Immunotherapeutics, Janssen-Cilag, HalioDx, Pfizer, cellprothera, GERCOR, INCYTE, FSK, INVECTYS, AC Biotech, Veracyte, CURE51, Apmonia Therapeutics. Christophe Borg reports grants from Bayer, Boehringer, Roche, Molecular partner; payment for expert testimony from Molecular partner; support for attending meetings from Takeda and MSD; and participation on a Data Safety Monitoring Board from Sanofi. Other authors declare no competing interest related to this study.

Figures

Fig. 1
TE differentiate colon, cancer, and liver metastasis. a) Chart showing datasets included, along with their sample type and the RNA-seq technology used to sequence the samples. The number of samples is shown below the cohort name or accession number, if publicly available, and the percentage of the total cohort is shown in parentheses. Half of the global cohort is paired-end RNA-seq and half is 3' single-end RNA-seq. b) Distribution of TE family expression across sample types. RC: rolling circle. c) Distribution of subfamily expression within the HERV/LTR family. d) UMAP based on TE expression for all samples in the meta-cohort. Samples are coloured according to sample type. The left panel uses the entire TE expression matrix, while the right panel uses LTR family only. e) Volcano plot for differential expression between normal and tumour tissues. Left blue labels are under-expressed in tumours compared to normal tissues. Right red labels are considered over-expressed in tumours compared to normal tissues. f) Distribution of expression of previously identified targets in the present cohort. For both differential expression analyses, all paired-end normal and tumour or primary and metastatic colon samples were normalized and analysed by DESeq2 on raw counts with batch effect reduction in the design. TE subfamilies are considered significant if log2 fold change > 1, base mean > 10, and p-value < 0.001. For all plots, counts are normalized to counts per million (CPM) based on the gene size library. The p-values representing the significance levels of the Wilcoxon rank-sum test are displayed above each category, with normal colon serving as the reference.
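As a rough illustration of the normalization and significance thresholds named in this caption, the sketch below scales raw counts to CPM and applies the three cut-offs. The function names are hypothetical, the library size is assumed to be the total number of gene-assigned reads, and the absolute value on the fold change is an assumption made so that both under- and over-expressed subfamilies can pass. It is not the authors' pipeline, which used DESeq2.

```python
def cpm(counts, library_size):
    """Scale raw counts to counts per million for one sample.

    library_size is assumed here to be the total reads assigned to genes.
    """
    return [c * 1_000_000 / library_size for c in counts]


def is_significant(log2_fold_change, base_mean, p_value):
    """Apply the caption's thresholds; abs() is an assumption (see lead-in)."""
    return abs(log2_fold_change) > 1 and base_mean > 10 and p_value < 0.001


# Illustrative values only, not data from the study.
print(cpm([250, 5], 2_000_000))
print(is_significant(1.5, 20.0, 0.0001))
```

Passing the library size explicitly keeps the choice of denominator (gene-assigned reads versus all aligned reads) visible to the caller.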
Fig. 2
Locus specific HERV expression in colorectal carcinoma. a) Expression distribution of the 14,968 HERV loci across sample types. Counts are normalized to counts per million (CPM) based on the gene size library. Focus on the fifty most highly expressed HERVH loci. b) Distribution of HERV total expression (considered as HERV burden) per sample according to sample type. Observations outside the interquartile range are considered outliers and are marked as individual points in the plot. The p-values representing the significance levels of the Wilcoxon rank-sum test are displayed above each category. c) HERV total expression per chromosome, for each sample type. The expression range is from 0 to 300 CPM. d) Venn diagram of expressed HERV loci in samples. Only loci expressed above 1 CPM in >10% of tumours are shown. e) Confusion matrix of the combined datasets (all but the training one) used to test the Elastic Net model trained to categorize normal and tumour tissue on SRP029880. TN: True Negative, FN: False Negative, FP: False Positive, TP: True Positive. “Percent correct” represents the accuracy of the model. f) ROC curve of the previous model. g) Distributions of log-transformed total HERV counts per cell for each tumour component from a single-cell RNA-seq atlas of liver metastatic colorectal cancer. 28,266 cells were classified based on gene expression into T cells (n = 18,262), Plasma cells (n = 2454), Myeloid cells (n = 4082), B cells (n = 1615), Cancer cells (n = 902), pDCs (n = 289), Cancer-associated Fibroblast (CAF) (n = 443), and Mast cells (n = 219). h) Positional representation of differentially expressed HERVs. Each dot represents the log2 fold-change of normal versus tumour expression of a single HERV locus, with values ranging from 4 to −4. Coloured dots are for significantly differential loci (blue: under-expressed, red: overexpressed), based on DESeq2 criteria: padj < 0.001, abs(log2(Fold Change)) > 1, and mean CPM > 10. For all plots, counts are normalized to counts per million (CPM) based on the number of reads aligned on genes.
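The expression filter in panel d and the "percent correct" figure in panel e can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, locus labels, and numbers are hypothetical, and the input is assumed to be a mapping from locus name to per-tumour CPM values.

```python
def expressed_loci(cpm_by_locus, cpm_threshold=1.0, min_fraction=0.10):
    """Keep loci above cpm_threshold in more than min_fraction of tumours."""
    kept = []
    for locus, values in cpm_by_locus.items():
        if not values:
            continue  # no samples, nothing to evaluate
        n_expressed = sum(1 for v in values if v > cpm_threshold)
        if n_expressed / len(values) > min_fraction:
            kept.append(locus)
    return kept


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples in a confusion matrix."""
    return (tp + tn) / (tp + tn + fp + fn)


# Illustrative values only, not data from the study.
toy = {
    "locus_A": [0, 0, 0, 0, 0, 0, 0, 0, 2, 3],  # expressed in 20% of tumours
    "locus_B": [0, 0, 0, 0, 0, 0, 0, 0, 0, 5],  # exactly 10%, so not kept
}
print(expressed_loci(toy))
print(accuracy(tp=40, tn=50, fp=5, fn=5))
```

The strict inequality mirrors the caption's ">10% of tumours", so a locus expressed in exactly 10% of samples is excluded.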
Fig. 3
ERV-derived RNAs have prognostic value after surgery for liver metastases. a) Venn diagram for HERV loci significantly different in independent and full meta-cohort differential expression analyses of normal versus tumour tissues. A locus is considered significant if log2 (fold change) > 1, mean CPM > 10, and p-value < 0.001. b) Count distribution of proposed colorectal cancer biomarker HERV elements. Selected elements have very high differential expression (log2FC > 4 and absolute mean difference > 2) between normal and tumour tissues, or positive differential expression confirmed in at least two independent cohorts and strong absolute expression difference (>10 CPM), or positive differential expression with identified protein expression. Observations outside the interquartile range are considered outliers and are marked as individual points in the plot. c) Expression of the above HERV loci in GTEx normal colon (blue, 48 samples) and liver (green, 14 samples). HERV expression was quantified using the reference-free software Reindeer on 1137 samples from the GTEx database using HERV DNA sequences extracted from DFAM.org. d) Distribution of the relative RT-PCR quantification of 5 selected HERV loci in 59 independent samples. Normal colon (n = 16), normal liver (n = 10), primary colon (n = 18), and liver metastasis (n = 15). e) Kaplan–Meier curves for HERV loci associated with overall survival after surgical resection of liver metastasis in univariate survival analysis and multivariable lasso Cox model. f) Multivariable Cox model for overall survival after surgical resection of liver metastases incorporating previous HERV prognostic markers and available clinical variables. Multiple imputation of missing data. Right: p-value, middle: hazard ratio and 95% confidence interval. g) Kaplan–Meier curve showing the cumulative prognostic value of poor prognostic markers. The number of each category represents the sum of the poor prognostic markers presented in e expressed for each individual. h) Kaplan–Meier curve showing the cumulative prognostic value of good prognostic markers. The number of each category represents the sum of good prognostic markers shown in e expressed for each individual. In e, g and h, the shaded area refers to the 95% confidence band. For all plots, counts are normalized to counts per million (CPM) based on the gene size library. Normal colon samples are blue, normal liver samples are dark green, primary colorectal samples are brown, and liver metastasis samples are orange. For all Kaplan–Meier curves, the p-value of the log-rank is shown on the graph.
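The grouping used in panels g and h, where each patient is assigned the number of prognostic markers they express, could be computed along these lines. The locus names are the four adverse markers listed in the abstract. The function name, the example CPM values, and the use of the 1 CPM expression threshold for this grouping are assumptions made for illustration.

```python
# The four adverse-prognosis loci named in the abstract.
POOR_PROGNOSIS_LOCI = [
    "HERVH_Xp22.32a",
    "HERVH_20p11.23b",
    "HERVH_13q33.3",
    "HERVH_13q31.3",
]


def marker_count(sample_cpm, loci=POOR_PROGNOSIS_LOCI, cpm_threshold=1.0):
    """Count how many of the given loci are expressed in one sample.

    sample_cpm maps locus name to CPM; the 1 CPM threshold is an assumption.
    Loci missing from the mapping are treated as not expressed.
    """
    return sum(1 for locus in loci if sample_cpm.get(locus, 0.0) > cpm_threshold)


# Illustrative values only, not data from the study.
patient = {"HERVH_Xp22.32a": 3.2, "HERVH_13q33.3": 0.4, "HERVH_13q31.3": 12.0}
print(marker_count(patient))
```

Patients would then be stratified by this count before fitting one Kaplan–Meier curve per group.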
Fig. 4
ERV biomarkers of metastatic colorectal cancer are associated with the activation of cell cycle pathways and a decrease in the cytotoxic immune environment. a) Association of MsigDB Hallmark pathways with HERV loci considered as prognostic markers. The gene set enrichment score was calculated by ranking Spearman's correlation score between the HERVH locus and gene expression. Enrichments with p-value < 0.05 were considered as significant (coloured in heatmap). Annotations are provided for prognosis (columns) and pathway class (rows). b) T cell score from immune deconvolution with MCP-counter for target loci. Samples are categorized according to whether the locus is expressed (red) or not (blue). Locus is considered expressed if over 1 CPM in the sample. p-values are from Wilcoxon tests. c) Same as b with NK cell score.
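Panel a ranks genes by their Spearman correlation with a HERV locus before computing enrichment. A minimal pure-Python sketch of that ranking step is given below, under stated assumptions: all function and gene names are hypothetical, the expression vectors are made up, and the enrichment scoring itself is not reproduced.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_genes_by_correlation(locus_expr, gene_expr):
    """Order gene names from most positively to most negatively correlated."""
    scores = {gene: spearman(locus_expr, expr) for gene, expr in gene_expr.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Illustrative values only, not data from the study.
locus = [1.0, 2.0, 3.0, 4.0]
genes = {"gene_up": [2.0, 4.0, 6.0, 9.0], "gene_down": [9.0, 7.0, 3.0, 1.0]}
print(rank_genes_by_correlation(locus, genes))
```

The resulting ordered gene list is the kind of input a pre-ranked gene set enrichment tool expects.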
Fig. 5
The good prognostic marker HERVH_8q21.3b presents co-expression with the CALB1 gene. a) Heatmap showing the association between prognostic HERV loci and genomic annotations. Annotations (columns) represent genomic features at the locus positions. b) Correlation between expression of HERVH_20p11.23b and gene RIN2 (CPM counts). R: Pearson's correlation coefficient. c) Correlation between expression of HERVH_8q21.3b and gene CALB1 (CPM counts). R: Pearson's correlation coefficient. In b and c, the shaded area refers to the 95% confidence band. d) UCSC Genome Browser view of CALB1 and HERVH_8q21.3b. Tracks shown from top to bottom: genomic position, CALB1 transcripts, dbSNP, RepeatMasker annotation, FANTOM5 and GeneHancer (enhancers). Position of HERVH_8q21.3b is highlighted in blue.
