Nat Commun. 2025 Jul 1;16(1):5887.
doi: 10.1038/s41467-025-60836-9.

Genomic landscape of virus-associated cancers

Yoonhee Nam et al. Nat Commun. 2025.

Abstract

It has been estimated that 15%-20% of human cancers are attributable to infections, mostly by carcinogenic viruses. The incidence varies worldwide, with a majority affecting developing countries. Here, we conduct a comparative analysis of virus-positive and virus-negative tumors in nine cancers linked to five viruses. We observe a higher frequency of virus-positive tumors in males, with notable geographic differences in incidence. Our genomic analysis of 1971 tumors reveals a lower somatic burden, distinct mutation signatures, and driver gene mutations in virus-positive tumors. Compared to virus-negative cases, virus-positive cases have fewer mutations of TP53, CDKN2A, and deletions of 9p21.3/CDKN2A-CDKN1A while exhibiting more mutations in RNA helicases DDX3X and EIF4A1. Furthermore, an analysis of clinical trials of PD-(L)1 inhibitors suggests an association of virus-positivity with higher treatment response rate, particularly evident in gastric cancer and head and neck squamous cell carcinoma. Both cancer types also show evidence of increased CD8+ T cell infiltration and T cell receptor clonal selection in virus-positive tumors. These results illustrate the epidemiological, genetic, and therapeutic trends across virus-associated malignancies.

Conflict of interest statement

Competing interests: R. Rabadan is the founder of Genotwin, a member of the advisory board of Diatech Pharmacogenetics and Flahy. None of these activities are related to the results in the current manuscript. The remaining authors declare no competing interests.

Figures

Fig. 1. Epidemiological trends of virus-associated cancers.
A Incidence ratios of virus-associated and non-virus-associated cancers analyzed using two-sided Fisher’s Exact Test. Data are presented as point estimates (M/F incidence ratios) with error bars indicating 95% confidence intervals. ANKL (92/48), GC (659302/309048), HCC (16091/3744), HL (60913/40219), HNSCC (266342/154358), MCC (521/352), NKTCL (787/370), NPC (29769/12855), PBL (443/148), PCNSL (1814/1286), ALL (3239/2380), BCC (121701/51529), Bladder (487885/125906), CLL (9908/5249), Colorectum (622/348), Esophagus (13067/5857), Gallbladder (47920/74542), Glioblastoma (31071/19801), Kidney (287986/146433), Lung (2861/1416), Melanoma (21469/18141), Pancreas (105/55), Thyroid (207549/613624). B Virus-positive and virus-negative tumors in virus-associated cancers in males compared to females (M/F) reported in selected published studies, analyzed using two-sided Wilcoxon Rank Sum Test. Each point corresponds to an incidence ratio reported in a published study in Supplementary Data 2. ANKL, aggressive NK-cell leukemia; GC, gastric cancer; HCC, hepatocellular carcinoma; HL, Hodgkin lymphoma; HNSCC, head and neck squamous cell carcinoma; MCC, Merkel cell carcinoma; NKTCL, Natural killer/T-cell lymphoma; NPC, nasopharyngeal carcinoma; PBL, plasmablastic lymphoma; PCNSL, primary central nervous system lymphoma; ALL, acute lymphoblastic leukemia; BCC, basal cell carcinoma; CLL, chronic lymphocytic leukemia; BL, Burkitt lymphoma. C, D Estimated incidence rates of EBV-positive HL (C) and EBV-negative HL (D) by country. E, F Estimated incidence rates of EBV-positive NPC (E) and EBV-negative NPC (F) by country. Map data from Natural Earth (https://www.naturalearthdata.com/, public domain), produced by rnaturalearth R package. ASR, age-standardized rate. Source data are provided as a Source Data file.
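As a rough illustration of how the M/F point estimates in panel A can be read, the sketch below treats the paired numbers in the caption as male and female case counts and computes their ratio with an approximate 95% interval. The function name `mf_ratio_with_ci` and the Wald interval on the log ratio (assuming independent Poisson counts) are assumptions made here for illustration; the caption reports a two-sided Fisher's Exact Test, which this sketch does not reproduce.

```python
import math

# Paired numbers copied from the Fig. 1A caption, read here as (male, female) counts.
counts = {
    "HCC": (16091, 3744),
    "NPC": (29769, 12855),
    "Thyroid": (207549, 613624),
}


def mf_ratio_with_ci(male, female, z=1.96):
    """Return (ratio, lower, upper) for a male/female count ratio.

    Assumption: a Wald interval on the log ratio, treating both counts as
    independent Poisson counts. This is an illustrative choice, not
    necessarily the method used for the figure.
    """
    ratio = male / female
    se = math.sqrt(1 / male + 1 / female)  # standard error of log(ratio)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


for cancer, (m, f) in counts.items():
    ratio, lo, hi = mf_ratio_with_ci(m, f)
    print(f"{cancer}: M/F = {ratio:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A ratio above 1 indicates more male than female cases (for example HCC, at roughly 4.3), while a ratio below 1 indicates the reverse (for example thyroid cancer).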
Fig. 2. Mutation burden of virus-positive and virus-negative tumors in 9 cancers.
A Counts of somatic nonsynonymous mutations in virus-positive and virus-negative tumors in the same cancers. Data are presented as median values with interquartile range (25th–75th percentile). Actual median values for virus-negative and positive samples in the shown cancer types are PCNSL: 6 and 1, cHL (targeted): 10 and 4, cHL (WES): 126.5 and 65, cHL (WGS): 132 and 69, BL: 32 and 46, PBL: 9.5 and 4, GC: 117.5 and 86, CC: 279 and 68, HNSCC: 108 and 62.5, MCC: 22 and 7, HCC: 74 and 90.5. P-values are calculated by a two-sided Wilcoxon rank-sum test. B Log2(fold change) of average number of somatic nonsynonymous mutations (all genes: dark blue, driver genes: light blue) in virus-negative tumors compared to virus-positive tumors. PCNSL (n = 58), CC (n = 172), MCC (n = 71), GC (n = 436), PBL (n = 51), cHL (target, n = 293; WES, n = 69; WGS, n = 24), HNSCC (n = 487), BL (n = 91 for all genes; 68 EBV-positive eBL, 6 EBV-negative eBL, 3 EBV-positive sBL, 14 EBV-negative sBL), BL (n = 120 for driver genes) and HCC (n = 190). GC, gastric cancer; HCC, hepatocellular carcinoma; cHL, classical Hodgkin lymphoma; HNSCC, head and neck squamous cell carcinoma; MCC, Merkel cell carcinoma; PBL, plasmablastic lymphoma; PCNSL, primary central nervous system lymphoma; CC, cervical cancer; BL, Burkitt lymphoma. Data are presented as log2(fold change) with error bars indicating 95% confidence intervals. Source data are provided as a Source Data file.
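The direction of the burden difference can be checked from the medians quoted in the caption. The sketch below computes log2(virus-negative / virus-positive) per cohort; note that it uses the panel A medians as a stand-in, whereas panel B is described as using average mutation counts, so the values will not match the figure exactly. The names `medians`, `log2_fold_change`, and `higher_in_positive` are illustrative.

```python
import math

# Median nonsynonymous mutation counts as (virus-negative, virus-positive),
# copied from the Fig. 2A caption.
medians = {
    "PCNSL": (6, 1),
    "cHL (targeted)": (10, 4),
    "cHL (WES)": (126.5, 65),
    "cHL (WGS)": (132, 69),
    "BL": (32, 46),
    "PBL": (9.5, 4),
    "GC": (117.5, 86),
    "CC": (279, 68),
    "HNSCC": (108, 62.5),
    "MCC": (22, 7),
    "HCC": (74, 90.5),
}


def log2_fold_change(negative, positive):
    """log2 of the virus-negative to virus-positive ratio (> 0 means lower burden in virus-positive)."""
    return math.log2(negative / positive)


fold_changes = {name: log2_fold_change(*pair) for name, pair in medians.items()}

# Cohorts where the virus-positive median exceeds the virus-negative median.
higher_in_positive = sorted(name for name, fc in fold_changes.items() if fc < 0)

for name, fc in sorted(fold_changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: log2(neg/pos) = {fc:+.2f}")
print("Higher median in virus-positive:", higher_in_positive)
```

By this measure, only BL and HCC have a higher median in virus-positive tumors; the other cohorts listed show a lower median burden in virus-positive cases.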
Fig. 3. Mutation signatures in virus-associated cancers.
A, B MCC (n = 71), (C, D) GC (n = 436), (E, F) HNSCC (n = 487). Total mutations (top bar plot) and proportion of mutations associated with each signature (bottom stacked plot) in virus-positive compared to virus-negative cases. Signatures identified from SigProfilerExtractor are shown. Signature names with SBS96, DBS78, or ID83 followed by a capital letter are de novo mutational signatures identified from the cancer cohort, which could not be decomposed into known COSMIC (v.3.4) signatures. The rest are decomposed COSMIC signatures. Schematic representation of the effect of the absence of processes behind key mutation signatures in virus-associated cases is shown in (B) MCC (MCPyV-positive). MCC, Merkel cell carcinoma; GC, gastric cancer; HNSCC, head and neck squamous cell carcinoma. B, D, F Data are presented as median values with interquartile range (25th–75th percentile). P values are calculated by a two-sided Wilcoxon rank-sum test. Source data are provided as a Source Data file.
Fig. 4. Somatic mutations in EIF4A1 and DDX3X, both RNA helicases of the DEAD (Asp-Glu-Ala-Asp) box protein family, are recurrent genetic lesions associated with virus-positive status.
A Combined log10(odds ratio) of mutation in genes associated with virus-positive (top) and virus-negative (bottom) status (q < 0.005) from pooled data of 1971 tumors across 9 virus-associated cancers. Data are presented as log10(odds ratio) values with error bars indicating 95% confidence intervals. The heatmap on the right displays the cancer cohorts included in the pooled data for the calculation of each gene, with colors representing mutation rate trends in each cohort (red: higher in virus-positive; blue: higher in virus-negative) and shades indicating the two-sided Fisher’s exact test p-value. HNSCC, head and neck squamous cell carcinoma; CC, cervical cancer; BL, Burkitt lymphoma; GC, gastric cancer; PBL, plasmablastic lymphoma; cHL, classical Hodgkin lymphoma; PCNSL, primary central nervous system lymphoma; MCC, Merkel cell carcinoma; HCC, hepatocellular carcinoma. B Mutations in DDX3X and EIF4A1 in 2488 tumors. * p < 0.05, two-sided binomial test. C Fraction of patients that are male by DDX3X mutation status. D DDX3X expression by DDX3X mutation status and sex in Burkitt lymphoma (n = 117). * p < 0.05, two-sided MWU test. Data are presented as median values with interquartile range (25th–75th percentile). E Frequencies of mutation of DDX3X and EIF4A1 in virus-positive tumors overall and summary of key biological functions. ANKL, aggressive NK-cell leukemia; NKTCL, Natural killer/T-cell lymphoma; CAEBV, chronic active Epstein-Barr virus disease; ATL, Adult T-cell leukemia/lymphoma. Source data are provided as a Source Data file.
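For readers unfamiliar with the quantities plotted in panel A, the following is a minimal sketch of a per-gene association test on a single 2x2 table (mutated versus wild-type, by viral status). It computes a two-sided Fisher's exact p-value and a log10(odds ratio) with a Woolf (Wald) 95% interval. The counts are hypothetical, the function names are illustrative, and the pooling across cohorts and the q-value correction mentioned in the caption are not reproduced here.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def log10_odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (log10 OR, lower, upper) using a Woolf interval.

    Adds 0.5 to every cell when any cell is zero (a common continuity
    correction, used here as an assumption).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ln10 = math.log(10)
    return log_or / ln10, (log_or - z * se) / ln10, (log_or + z * se) / ln10


# Hypothetical counts, for illustration only:
# virus-positive: 30 mutated, 170 wild-type; virus-negative: 10 mutated, 290 wild-type.
table = (30, 170, 10, 290)
p_value = fisher_exact_two_sided(*table)
estimate, lower, upper = log10_odds_ratio_ci(*table)
print(f"log10(OR) = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f}), p = {p_value:.2e}")
```

A positive log10(odds ratio) corresponds to a gene mutated more often in virus-positive tumors, and a negative value to one mutated more often in virus-negative tumors.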
Fig. 5. Analysis of biomarkers for immunotherapy response in virus-associated cancers.
A Odds ratio of positive response to treatment with PD-1/PD-L1 inhibitors with virus-positive status, PD-L1 positive status, and/or high tumor mutation burden (TMB) in 32 studies representing four types of cancer, Fisher’s exact test. Data are presented as odds ratio values with error bars indicating 95% confidence intervals. B Log2(RSEM + 1) expression of PD-L1 (CD274), (C) CIBERSORT CD8+ T cell infiltration score, and (D) TCRβ clonotypes per thousand reads (CPK), versus viral status of tumors in TCGA studies of GC (TCGA-STAD), HCC (TCGA-LIHC), and HNSCC (TCGA-HNSC). GC, gastric cancer; HCC, hepatocellular carcinoma; HNSCC, head and neck squamous cell carcinoma; MCC, Merkel cell carcinoma. P values are calculated by a two-sided Wilcoxon rank-sum test. Data are presented as median values with interquartile range (25th–75th percentile). Source data are provided as a Source Data file.
Fig. 6. Models of oncogenesis for virus-associated and non-virus-associated cancers.
A Model for oncogenesis in the absence of viral infection. A normal cell accumulates driver mutations as a result of age, defective DNA repair, exogenous carcinogens, or microbiome interactions, leading under selective pressure to initiation, promotion, and progression that ends in the malignant transformation of the cell. B Model for oncogenesis in the presence of viral infection. A normal cell is infected with a virus, and a latent infection is established as a result of inadequate host immune response, potentially associated with germline MHC dysfunction or other inherited risk factors. The infected normal cell acquires somatic mutations in specific genes, such as chromatin modifiers like RNA helicases DDX3X and EIF4A1, leading to initiation, promotion, and progression that ends in the malignant transformation of the infected cell.
