[Preprint]. 2025 Feb 21:2025.02.13.25322219.
doi: 10.1101/2025.02.13.25322219.

Geographic and age-related variations in mutational processes in colorectal cancer

Marcos Díaz-Gay, Wellington Dos Santos, Sarah Moody, Mariya Kazachkova, Ammal Abbasi, Christopher D Steele, Raviteja Vangara, Sergey Senkin, Jingwei Wang, Stephen Fitzgerald, Erik N Bergstrom, Azhar Khandekar, Burçak Otlu, Behnoush Abedi-Ardekani, Ana Carolina de Carvalho, Thomas Cattiaux, Ricardo Cortez Cardoso Penha, Valérie Gaborieau, Priscilia Chopard, Christine Carreira, Saamin Cheema, Calli Latimer, Jon W Teague, Anush Mukeriya, David Zaridze, Riley Cox, Monique Albert, Larry Phouthavongsy, Steven Gallinger, Reza Malekzadeh, Ahmadreza Niavarani, Marko Miladinov, Katarina Erić, Sasa Milosavljevic, Suleeporn Sangrajrang, Maria Paula Curado, Samuel Aguiar, Rui Manuel Reis, Monise Tadin Reis, Luis Gustavo Romagnolo, Denise Peixoto Guimarães, Ivana Holcatova, Jaroslav Kalvach, Carlos Alberto Vaccaro, Tamara Alejandra Piñero, Beata Świątkowska, Jolanta Lissowska, Katarzyna Roszkowska-Purska, Antonio Huertas-Salgado, Tatsuhiro Shibata, Satoshi Shiba, Surasak Sangkhathat, Taned Chitapanarux, Gholamreza Roshandel, Patricia Ashton-Prolla, Daniel C Damin, Francine Hehn de Oliveira, Laura Humphreys, Trevor D Lawley, Sandra Perdomo, Michael R Stratton, Paul Brennan, Ludmil B Alexandrov
Marcos Díaz-Gay et al. medRxiv.

Update in

  • Geographic and age variations in mutational processes in colorectal cancer.
    Díaz-Gay M, Dos Santos W, Moody S, Kazachkova M, Abbasi A, Steele CD, Vangara R, Senkin S, Wang J, Fitzgerald S, Bergstrom EN, Khandekar A, Otlu B, Abedi-Ardekani B, de Carvalho AC, Cattiaux T, Penha RCC, Gaborieau V, Chopard P, Carreira C, Cheema S, Latimer C, Teague JW, Mukeriya A, Zaridze D, Cox R, Albert M, Phouthavongsy L, Gallinger S, Malekzadeh R, Niavarani A, Miladinov M, Erić K, Milosavljevic S, Sangrajrang S, Curado MP, Aguiar S, Reis RM, Reis MT, Romagnolo LG, Guimarães DP, Holcatova I, Kalvach J, Vaccaro CA, Piñero TA, Świątkowska B, Lissowska J, Roszkowska-Purska K, Huertas-Salgado A, Shibata T, Shiba S, Sangkhathat S, Chitapanarux T, Roshandel G, Ashton-Prolla P, Damin DC, de Oliveira FH, Humphreys L, Lawley TD, Perdomo S, Stratton MR, Brennan P, Alexandrov LB. Nature. 2025 Jul;643(8070):230-240. doi: 10.1038/s41586-025-09025-8. Epub 2025 Apr 23. PMID: 40267983. Free PMC article.

Abstract

Colorectal cancer incidence rates vary geographically and have changed over time. Notably, in the past two decades, the incidence of early-onset colorectal cancer, affecting individuals under the age of 50 years, has doubled in many countries. The reasons for this increase are unknown. Here, we investigate whether mutational processes contribute to geographic and age-related differences by examining 981 colorectal cancer genomes from 11 countries. No major differences were found in microsatellite unstable cancers, but variations in mutation burden and signatures were observed in the 802 microsatellite-stable cases. Multiple signatures, most with unknown etiologies, exhibited varying prevalence in Argentina, Brazil, Colombia, Russia, and Thailand, indicating geographically diverse levels of mutagenic exposure. Signatures SBS88 and ID18, caused by the bacteria-produced mutagen colibactin, had higher mutation loads in countries with higher colorectal cancer incidence rates. SBS88 and ID18 were also enriched in early-onset colorectal cancers, being 3.3 times more common in individuals diagnosed before age 40 than in those over 70, and were imprinted early during colorectal cancer development. Colibactin exposure was further linked to APC driver mutations, with ID18 responsible for about 25% of APC driver indels in colibactin-positive cases. This study reveals geographic and age-related variations in colorectal cancer mutational processes, and suggests that early-life mutagenic exposure to colibactin-producing bacteria may contribute to the rising incidence of early-onset colorectal cancer.

Conflict of interest statement

Competing interests: L.B.A. is a co-founder, CSO, scientific advisory member, and consultant for io9, holds equity, and receives income. The terms of this arrangement have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. L.B.A. is also a compensated member of the scientific advisory board of Inocras. L.B.A.’s spouse is an employee of Hologic, Inc. E.N.B. is a consultant for io9, holds equity, and receives income. A.A. and L.B.A. declare a U.S. provisional patent application filed with UCSD with serial number 63/366,392. E.N.B. and L.B.A. declare a U.S. provisional patent application filed with UCSD with serial number 63/269,033. L.B.A. also declares U.S. provisional applications filed with UCSD with serial numbers 63/289,601 and 63/412,835, as well as an international patent application, PCT/US2023/010679. L.B.A. is also an inventor on U.S. Patent 10,776,718 for source identification by non-negative matrix factorization. M.R.S. is a founder of, consultant to, and stockholder in Quotient Therapeutics. L.B.A., M.D.-G., P.B., S.P., M.R.S., and S. Moody declare a European patent application with application number EP25305077.7. T.D.L. is a co-founder and CSO of Microbiotica. All other authors declare that they have no competing interests.

Figures

Fig. 1. Geographic, clinical, and molecular characterization of the Mutographs colorectal cancer cohort.
a, Geographic distribution of the 981 patients across four continents and 11 countries, with an indication of the total number of cases as well as the percentage of early-onset (eo) cases below 50 years of age. Countries were colored according to their age-standardized incidence rates (ASR) per 100,000 individuals. The designations employed and the presentation of the material in this publication do not imply the expression of any opinion whatsoever on the part of the authors or their institutions concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. b, Tumor subsite distribution of the cohort across the colorectum, with an indication of the total number of cases and the percentage of early-onset cases. Different subsites were colored according to the percentage of early-onset cases. Two additional cases had unspecified subsites. c, Scatter plots indicating the distribution of molecular subgroups across the sequenced tumors according to the total number of single base substitutions (SBS) and small insertions and deletions (indels; ID), as well as the percentage of genome aberrated (PGA). Cases for which tumor purity was insufficient to determine an accurate copy number profile or without large copy number alterations (65/981) were excluded from the SBS - PGA panel. d, Box plots indicating the distribution of SBS and ID across early-onset (under 50 years; purple) and late-onset (50 years or older; green) microsatellite stable (MSS) colorectal tumors. Statistically significant differences were evaluated using multivariable linear regression models adjusted by sex, country, tumor subsite, and tumor purity. The line within the box is plotted at the median, while the upper and lower ends indicate the 25th and 75th percentiles. Whiskers show 1.5 × interquartile range, and values outside it are shown as individual data points.
e-g, Average mutational profiles of early-onset and late-onset MSS colorectal tumors for SBS (SBS-288 mutational context; e), ID (ID-83 mutational context; f), and copy number alterations (CN-68 mutational context; g).
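The box-plot convention described in panel d (median line, 25th and 75th percentile box ends, whiskers at 1.5 × interquartile range, outliers as individual points) can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function name `box_plot_summary`, the inclusive quantile method, the choice to end each whisker at the most extreme data point inside the 1.5 × IQR fence, and the toy values are all assumptions.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the numbers a box plot would draw for one group of values."""
    data = sorted(values)
    # Quartiles by inclusive linear interpolation (an assumption; plotting
    # libraries differ slightly in their quantile method).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    # Anything beyond the fences is drawn as an individual point.
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Toy mutation counts (hypothetical, not data from the study).
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

With these toy values the single extreme observation falls outside the upper fence and would be drawn as an individual data point rather than stretching the whisker.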
Fig. 2. Geographic variation of mutational signatures in microsatellite stable colorectal cancers.
a, Dot plot indicating the variation of mutational signature prevalence in specific countries compared to all others. Statistically significant enrichments were evaluated using multivariable logistic regression models adjusted by age of diagnosis, sex, tumor subsite, and tumor purity. Firth’s bias-reduced logistic regressions were used for regressions presenting complete or quasi-complete separation. Data points were colored according to the odds ratio (OR) of the enrichment, with their size representing statistical significance. P-values were adjusted for multiple comparisons based on the total number of mutational signatures considered per variant type and the total number of countries assessed, and reported as q-values. Q-values < 0.05 were considered statistically significant and marked in red. b-c, Geographic distribution of the ID_J (b) and SBS_F (c) mutational signatures. Countries were colored based on the signature prevalence. d, Volcano plots indicating the association of mutational signature activities with the age-standardized incidence rates. Statistically significant associations were evaluated using multivariable linear regression models adjusted by age of diagnosis, sex, tumor subsite, and tumor purity. P-values were adjusted for multiple comparisons based on the total number of mutational signatures considered per variant type and reported as q-values. Horizontal lines marking statistically significant thresholds were included at 0.05 (dashed orange line) and 0.01 q-values (dashed red line). e-f, Scatter plots indicating the association of the mutations attributed to the SBS88 and ID18 mutational signatures with the age-standardized incidence rates (ASR) across countries for colorectal cancer (e), and independently for colon and rectal cancers (f). Data points were colored based on signature prevalence, with their size indicating the total number of cases per country.
Statistically significant associations were evaluated using the sample-level multivariable linear regression models used in d (e), and similar multivariable linear regression models adjusted by age of diagnosis, sex, and tumor purity (f).
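The caption reports p-values adjusted for multiple comparisons as q-values but does not name the adjustment method. The sketch below assumes the Benjamini-Hochberg false discovery rate procedure, a common choice for this kind of analysis; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Convert raw p-values into Benjamini-Hochberg q-values.

    Assumption: the family of tests is the full list passed in, e.g. one
    p-value per mutational signature (times countries, where applicable).
    """
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical raw p-values for four tests.
raw_p = [0.001, 0.02, 0.04, 0.5]
q = benjamini_hochberg(raw_p)
significant = [qv < 0.05 for qv in q]
```

In this toy example the third test has a raw p-value below 0.05 but a q-value above it, which is the kind of result the 0.05 and 0.01 q-value threshold lines in the volcano plots are meant to separate out.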
Fig. 3. Variation of mutational signatures with age of onset in microsatellite stable colorectal cancers.
a, Volcano plots indicating the enrichment of mutational signature prevalence in early-onset and late-onset cases. Statistically significant enrichments were evaluated using multivariable logistic regression models for age of onset categorized in two subgroups (early-onset, <50 years of age; and late-onset, ≥50) and adjusted by sex, country, tumor subsite, and tumor purity. Firth’s bias-reduced logistic regressions were used for regressions presenting complete or quasi-complete separation. P-values were adjusted for multiple comparisons based on the total number of mutational signatures considered per variant type and reported as q-values. Horizontal lines marking statistically significant thresholds were included at 0.05 (dashed orange line) and 0.01 q-values (dashed red line). b, Line plots indicating mutational signature prevalence trend across ages of onset, using five different age groups. Signatures significantly enriched in early-onset or late-onset cases (as shown in a) were colored in purple and green, respectively, whereas signatures not varying significantly with age were colored in grey. c, Bar plots indicating mutational signature prevalence across age groups, with indication of the total number of cases where signatures were detected. Statistically significant trends were evaluated using multivariable logistic regression models for age categorized in five subgroups (0–39, 40–49, 50–59, 60–69, ≥70) and adjusted by sex, country, tumor subsite, and tumor purity. Firth’s bias-reduced logistic regressions were used for regressions presenting complete or quasi-complete separation. d-e, Box plots indicating the variation in age of onset according to the presence of colibactin mutational signatures (either SBS88, ID18, or both) in all microsatellite stable cases (d) and across tumor subsites (e).
Statistically significant differences were evaluated using multivariable linear regression models adjusted by sex, country, tumor purity, and tumor subsite (only for the analysis of all cases, d). The line within the box is plotted at the median, while the upper and lower ends indicate the 25th and 75th percentiles. Whiskers show 1.5 × interquartile range, and values outside it are shown as individual data points.
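The two age categorizations used in this figure (early-onset versus late-onset at a cutoff of 50 years, and the five groups 0–39, 40–49, 50–59, 60–69, ≥70) and the per-group signature prevalence can be sketched as below. The function names and the toy cases are hypothetical, and the sketch computes only unadjusted prevalence; it does not reproduce the multivariable regression models described in the caption.

```python
from bisect import bisect_right
from collections import defaultdict

# Lower bounds of the second through fifth age groups, from the caption.
AGE_BREAKS = [40, 50, 60, 70]
AGE_LABELS = ["0-39", "40-49", "50-59", "60-69", ">=70"]


def age_group(age):
    """Map an age at diagnosis to one of the five caption age groups."""
    return AGE_LABELS[bisect_right(AGE_BREAKS, age)]


def onset_category(age):
    """Early-onset is under 50 years of age; late-onset is 50 or older."""
    return "early-onset" if age < 50 else "late-onset"


def prevalence_by_group(cases):
    """Fraction of cases carrying a signature within each age group.

    `cases` is an iterable of (age, has_signature) pairs.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for age, has_signature in cases:
        group = age_group(age)
        totals[group] += 1
        positives[group] += int(has_signature)
    return {group: positives[group] / totals[group] for group in totals}


# Toy cases (hypothetical, not data from the study).
toy_cases = [(35, True), (38, False), (45, True), (72, False)]
toy_prevalence = prevalence_by_group(toy_cases)
```

Age groups with no cases are simply absent from the returned dictionary, so a caller plotting all five groups would need to fill those in explicitly.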
Fig. 4. Colibactin mutagenesis as an early event in microsatellite stable colorectal cancer evolution.
a, Box plots indicating the fold-change of the relative contribution per sample of each signature between early clonal and late clonal single base substitutions (SBS, left) and small insertions and deletions (ID, right). SBS signatures that generated early clonal SBSs in fewer than 50 samples and also generated late clonal SBSs in fewer than 50 samples were excluded from the analysis. Similarly, ID signatures that generated early clonal IDs in fewer than 20 samples and late clonal IDs in fewer than 20 samples were also excluded. Signatures were sorted by their median fold-change. The line within the box is plotted at the median, while the upper and lower ends indicate the 25th and 75th percentiles. Whiskers show 1.5 × interquartile range, and values outside it are shown as individual data points. b, Bar plot indicating the lack of concordance between colibactin exposure status determined by the presence of colibactin-induced mutational signatures SBS88 or ID18, and the microbiome pks status. Statistical significance was evaluated using a multivariable Firth’s bias-reduced logistic regression model (due to quasi-complete separation) adjusted by age of diagnosis, sex, country, tumor subsite, and tumor purity. c-d, Distribution of the age of onset (c) and cases across age groups (d) based on the detection of colibactin-positive samples using genomic and microbiome status. The genomic status is defined by the presence of SBS88 or ID18 signatures, while the microbiome status (pks) is determined by coverage of at least half of the pks island, and suggests ongoing or active pks+ bacterial infection. Statistical significance was evaluated using a multivariable linear regression model adjusted by sex, country, tumor subsite, and tumor purity.
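The microbiome status rule in panels c-d (pks-positive when at least half of the pks island is covered) can be illustrated with a small interval calculation. This is a sketch under stated assumptions: the caption does not say how coverage was computed, so representing covered regions as half-open coordinate intervals, the function names, and the toy island length of 1,000 bases are all hypothetical.

```python
def covered_fraction(intervals, region_length):
    """Fraction of [0, region_length) covered by half-open intervals."""
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        # Clip to the region and skip any part already counted.
        start = max(start, current_end, 0)
        end = min(end, region_length)
        if end > start:
            covered += end - start
            current_end = end
    return covered / region_length


def is_pks_positive(intervals, region_length, threshold=0.5):
    """Apply the 'at least half of the island covered' rule."""
    return covered_fraction(intervals, region_length) >= threshold


# Toy example: overlapping covered stretches on a 1,000-base island.
toy_intervals = [(0, 300), (200, 500), (900, 1000)]
toy_fraction = covered_fraction(toy_intervals, 1000)
toy_status = is_pks_positive(toy_intervals, 1000)
```

Overlapping intervals are counted once, so the toy example covers 600 of 1,000 bases and passes the threshold, while a sample covering only the first 400 bases would not.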
Fig. 5. Variation of driver mutations with age of onset and association with colibactin mutagenesis in microsatellite stable colorectal cancers.
a, Bar plot indicating the prevalence of driver mutations affecting the 48 bioinformatically detected driver genes in microsatellite stable colorectal cancers. Genes were colored according to their status as known cancer driver genes for colorectal cancer, known cancer driver genes for other cancer types, or newly detected cancer driver genes. b, Box plots indicating the distribution of total driver mutations across early-onset (under 50 years of age; purple) and late-onset (50 or over; green) tumors. Statistical significance was evaluated using a multivariable linear regression model adjusted by sex, country, tumor subsite, and tumor purity. The line within the box is plotted at the median, while the upper and lower ends indicate the 25th and 75th percentiles. Whiskers show 1.5 × interquartile range, and values outside it are shown as individual data points. c, Volcano plot indicating the enrichment of driver mutations in cancer driver genes in early-onset and late-onset cases. Statistically significant enrichments were evaluated using multivariable logistic regression models adjusted by sex, country, tumor subsite, and tumor purity. Firth’s bias-reduced logistic regressions were used for regressions presenting complete or quasi-complete separation. P-values were adjusted for multiple comparisons based on the total number of cancer driver genes considered and reported as q-values. Horizontal lines marking statistically significant thresholds were included at 0.05 (dashed orange line) and 0.01 q-values (dashed red line). d, Line plot indicating the prevalence of driver mutations in cancer driver genes across ages of onset, using five different age groups. Cancer driver genes significantly enriched in late-onset cases (as shown in c) were colored in green, whereas genes not varying significantly with age of onset were colored in grey. 
e-f, Bar plots indicating the proportion and number of driver mutations probabilistically assigned to colibactin-induced and other mutational signatures, including single base substitutions (e) and small insertions and deletions (indels; f). Driver mutations were divided into different groups, including the APC c.835–8A>G splicing-associated driver mutation, other APC driver mutations, TP53 driver mutations, and driver mutations affecting other cancer driver genes.
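One common way to assign a mutation probabilistically to mutational signatures is to weight each signature's activity in the sample by the probability that the signature produces that mutation type, then normalize. The caption does not describe the exact procedure used, so the sketch below is an assumption about the general approach, and the function name, activities, and profile values are hypothetical toy numbers rather than real signature profiles.

```python
def signature_probabilities(activities, profiles, mutation_type):
    """Probability that each signature caused a mutation of a given type.

    activities: mutations attributed to each signature in one sample.
    profiles: per-signature probabilities of each mutation type.
    """
    weights = {
        sig: activities[sig] * profiles[sig].get(mutation_type, 0.0)
        for sig in activities
    }
    total = sum(weights.values())
    if total == 0:
        # No active signature can generate this mutation type.
        return {sig: 0.0 for sig in weights}
    return {sig: weight / total for sig, weight in weights.items()}


# Toy sample: two active signatures and one mutation context.
toy_activities = {"SBS1": 1000, "SBS88": 500}
toy_profiles = {
    "SBS1": {"T[T>C]T": 0.01},   # hypothetical value
    "SBS88": {"T[T>C]T": 0.08},  # hypothetical value
}
toy_probs = signature_probabilities(toy_activities, toy_profiles, "T[T>C]T")
```

In the toy sample the signature with the lower activity still receives the higher probability, because its profile assigns much more weight to the mutation context in question.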

