Nature. 2023 Jul;619(7971):793-800.
doi: 10.1038/s41586-023-06266-3. Epub 2023 Jun 28.

Cancer aneuploidies are shaped primarily by effects on tumour fitness

Juliann Shih et al. Nature. 2023 Jul.

Abstract

Aneuploidies (whole-chromosome or whole-arm imbalances) are the most prevalent alteration in cancer genomes [1,2]. However, it is still debated whether their prevalence is due to selection or ease of generation as passenger events [1,2]. Here we developed a method, BISCUT, that identifies loci subject to fitness advantages or disadvantages by interrogating length distributions of telomere- or centromere-bounded copy-number events. These loci were significantly enriched for known cancer driver genes, including genes not detected through analysis of focal copy-number events, and were often lineage specific. BISCUT identified the helicase-encoding gene WRN as a haploinsufficient tumour-suppressor gene on chromosome 8p, which is supported by several lines of evidence. We also formally quantified the role of selection and mechanical biases in driving aneuploidy, finding that rates of arm-level copy-number alterations are most highly correlated with their effects on cellular fitness [1,2]. These results provide insight into the driving forces behind aneuploidy and its contribution to tumorigenesis.

Conflict of interest statement

G.F.G., A.C.B., A.D.C., and M.M. receive or received research support from Bayer AG. M.M. and A.M.T. received research support from Ono Pharmaceutical. M.M. is an equity holder, consultant for, and Scientific Advisory Board chair for OrigiMed. M.M. additionally receives research support from Novo Nordisk and Janssen Pharmaceuticals, consults for Interline Therapeutics, and is an inventor of a patent for EGFR mutation diagnosis in lung cancer, licensed to Labcorp. R.B. consults for and owns equity in Scorpion Therapeutics and receives research support from Novartis. J.S., S.S., S.Z., N.Z., Y.G., S.H.H., M.S.C., L.F.S., G.H., V.R., and H.S. declare no competing interests.

Figures

Extended Data Figure 1: Additional information on different types of SCNA and the BISCUT method.
(a) Empirical examples of low centromeric mechanical bias (1q telomere-bounded deletions, for which the ratio of breakpoints occurring in the centromere over those occurring in the arm is less than 1), and high centromeric mechanical bias (5p telomere-bounded amplifications, for which the centromere/arm breakpoint ratio is much greater than 1). Within the chromosome arm, bins are 1 Mb large. (b) Mean amplification and deletion breakpoint density within chromosome arms, aggregated across all tumors and all chromosome arms (n = 67; binned by Mb), versus breakpoint density within all centromeres (values in breakpoints per megabase). Error bars represent the 95% confidence interval for the mean. C/A Ratio represents centromeric breaks over arm breaks. (c) Comparison of length distributions of telomere-bounded, centromere-bounded, and interstitial amplifications and deletions, aggregated across all chromosome arms. (d) Example depicting BISCUT’s recursion steps. From top to bottom: BISCUT detects peaks iteratively, walking both left and right if a significant peak is detected, with the new boundaries including the detected peak. If a peak is not detected, overlaps with a previous peak, or there are fewer than 4 samples, the analysis is stopped. See Figure 2c and Methods for details.
Extended Data Figure 2: Pan-cancer BISCUT analysis.
(a) Summary statistics of the four types of BISCUT peaks in pan-cancer. (b) Sizes of peaks (in bases) from the pan-cancer BISCUT analysis. From left to right, peaks are categorized by direction of selection (n = 90 and 103 for positive and negative selection respectively), direction of copy number imbalance (n = 80 and 113 for amplifications and deletions respectively), and origin of partial-SCNA (n = 163 and 30 for telomere-bounded and centromere-bounded respectively). Two-tailed p-value was calculated using a Mann-Whitney U test. (c) Overlap between genes in BISCUT peaks and Tier 1 COSMIC cancer genes. The numbers of peaks containing these genes are depicted in green. A one-tailed p-value was calculated using a permutation test as outlined in the Methods. (d) Negative selection peaks from the pan-cancer BISCUT analysis, sorted from highest to lowest by fraction of samples subject to these fitness effects that also possessed overlapping focal SCNAs in the opposite direction. Peaks that overlap with GISTIC 2.0 peaks are denoted in dark red and dark blue. (e) BISCUT analysis detecting two positive selection peaks (top: 9p telomere-bounded deletions, overlapping with CDKN2A focal deletions; bottom: 8q telomere-bounded amplifications, overlapping with MYC focal amplifications) with focal SCNAs removed (left) and with focal SCNAs included (right). (f) BISCUT analysis detecting two negative selection peaks (top: 8q telomere-bounded deletions, overlapping with MYC focal amplifications; bottom: 11q telomere-bounded deletions, overlapping with YAP1/BIRC3 focal amplifications) with focal SCNAs removed (left) and with focal SCNAs included (right).
Extended Data Figure 3: Lineage-specific divergence of breakpoint distributions from the background distribution.
Heatmaps of lineage divergence scores for each tumor type (x-axis) and chromosome arm (y-axis). Amplifications are on top (in red) and deletions are on the bottom (in blue). Darker color represents a higher divergence score.
Extended Data Figure 4: Patterns of chromosome 3p deletions are highly lineage-specific.
(a) Lineage-specificity scores across chromosomes. The left chromatid is shaded in red and represents amplifications, whereas the right chromatid is shaded in blue and represents deletions. Darker colors indicate greater lineage-specificity. (b) BISCUT analysis of telomere-bounded deletions on chromosome 3p in three different cohorts. The top panels display telomere-bounded deletions, sorted by length. The bottom panels show the vertical distance of each tel-SCNA from the background distribution; the maximum deviation is denoted by the solid vertical line. The dashed lines represent the peak regions determined to be under significant positive selection (i.e. conferring survival advantage in this cohort). (c) Genomic locations and corresponding significance score of positive selection deletion BISCUT peaks on chromosome 3p across lineages. See Supplementary Table 2a for tumor type abbreviations.
Extended Data Figure 5: Hierarchical clustering of BISCUT peaks across lineages.
Matrix of significantly recurring BISCUT peaks across 33 independent tumor types. Peaks are sorted by genomic location (vertical axis), with four distinct classes of peaks in dark red (positive selection in amplifications), light red (negative selection in deletions), light blue (negative selection in amplifications), and dark blue (positive selection in deletions). Tumor types are sorted and color-coded (k = 5) according to hierarchical clustering by Ward’s method (horizontal axis).
Extended Data Figure 6: Cells engineered with chr8p deletion for validation of genes in BISCUT selection peaks.
(a) Schematic for 8p deletion approach. Cells were transfected with a CRISPR targeting 8p just outside the centromere and with a linearized plasmid containing an artificial telomere, puromycin selection cassette, and 1 kilobase of sequence homologous to the 8p pericentromeric sequence. Puromycin selection was used to isolate cells with 8p replaced by the artificial telomere. (b) ichorCNA output of ultra-low-pass whole genome sequencing data from five AALE cell clones with 8p disomy or 8p monosomy. Horizontal axis is chromosome number, vertical axis is log copy number ratio. Green denotes copy number loss, red denotes copy number gain. (c) Caspase-glo for cells with 8p deletion compared to cells with 8p disomy (n = 3 for both). Each point represents one biological replicate from a representative experiment. One-tailed p-values from two different experiments were combined using Fisher’s method. (d) Flow cytometry analysis of cells with 8p deletion compared to cells with 8p disomy. Bar graphs represent the percentage of apoptotic cells dually stained for Annexin V and PI. One representative experiment is shown. One-tailed p-values from three independent experiments were combined using Fisher’s method. (e) Vertical axis represents normalized read counts from RNA sequencing of cells with 8p disomy or 8p deletion. Each point is an individual clone (n = 8 for all columns). Two-tailed p-values are reported. (f) Relative COSMIC SBS39 mutational signature activity (vertical axis) of engineered cells with 8p disomy versus 8p monosomy. Two-tailed p-values are calculated from a Mann-Whitney U test. (g) WRN qPCR for cell clones with 8p disomy after siRNA treatment. Cells were treated with either control siRNA or siRNA against WRN for 3 days prior to qPCR (n = 2 for each condition). Each point represents the average value across technical replicates in an individual biological replicate. 
(h) Percentages of apoptotic cells detected by flow cytometry for Annexin V and propidium iodide (PI) across three 8p wild-type cell lines, on day three after transfection with WRN versus control siRNAs. This is representative data from one of four experiments. A ratio paired t-test was used to calculate one-sided p-values for all four experiments, which were combined using Fisher’s method. (i) Log-fold changes in apoptotic cells detected by trypan blue across these three 8p wild-type cell lines (n = 3 for all cell lines), on day three after transfection with WRN vs control siRNAs. Each point represents a different experiment. One-tailed p-values from all experiments were combined using Fisher’s method. (j) WRN qPCR in 8p disomic cell clones with overexpression of WRN or GFP (n = 3 for both). A two-tailed p-value is reported. (k) Cell viability is significantly lower when genes in del-neg peaks are knocked down by RNAi (left, DEMETER2 score) or knocked out by CRISPR (right, Chronos score) in Dependency Map screens, compared to all other genes. The reported p-value is two-tailed. Box plots center on median values and extend to the first and third quartiles; the whiskers extend to 1.5 times the interquartile range. (l) KAT6A qPCR for cell clones three days after siRNA-mediated knockdown (n = 3 for both conditions). A two-tailed p-value is reported. (m) EPN2 qPCR for cell clones three days after siRNA-mediated knockdown (n = 3 for both conditions). The reported p-value is two-tailed. All p-values in this figure were calculated using Student’s t-test except as otherwise noted; no adjustments were made for multiple comparisons.
Extended Data Figure 7: Quantitative assessment of selective and mechanical pressures driving aneuploidy.
(a) Calculation of peak-specific relative fitness (RF), arm-level RF, telomeric mechanical coefficients, and chromosome-level centromeric mechanical coefficients. See Methods for further details. (b) Centromeric mechanical coefficients (log) plotted against centromere length (in bases). (c) Centromeric mechanical coefficients (log) plotted against total frequency of arm-SCNAs affecting a specific chromosome (i.e. amplifications and deletions of the p and q arms in aggregate). Acrocentric chromosomes are excluded from analysis. (d) From the original BISCUT analysis: telomeric mechanical coefficients (log) plotted against telomere length, in RTLU, for amplifications (left; in red) and deletions (right; in blue). (e) From the original BISCUT analysis: telomeric mechanical coefficients (log) plotted against frequency of arm-level amplifications (left; in red) and deletions (right; in blue). For all panels, two-tailed p-values and rho correlation coefficients were calculated using Spearman’s rank correlation. No adjustments were made for multiple comparisons.
Extended Data Figure 8: Telomeric mechanical pressures are better reflected when using baseline ploidies of 2 or 4.
(a) Relative fitness (log) plotted against frequency of arm-SCNAs. From left to right: positive selection in amplifications (dark red), negative selection in amplifications (light blue), positive selection in deletions (dark blue), and negative selection in deletions (light red). (b) Missegregation probability in percentage (determined by single-cell sequencing of RPE1-hTERT non-transformed cells) plotted against frequency of all arm-SCNAs affecting each chromosome, averaged across arms. The horizontal black line at 4.3% reflects the expected random chance of missegregation of each chromosome. (c) Relative fitness (log) plotted against arm length. (d) Chromosome arm length plotted against frequency of arm-SCNAs. (e) Strength of correlation (β; vertical axis) between various coefficients (horizontal axis) and arm-SCNA rates from a multivariate Generalized Linear Model (GLM), with p-values above each predictor (significant values in bold). Amplifications are in red, and deletions are in blue. All p-values in this figure were calculated using Spearman’s correlation except as otherwise noted; no adjustments were made for multiple comparisons.
Figure 1: Prevalence and characteristics of different types of SCNAs.
(a) Fraction of TCGA tumors exhibiting frequent somatic genetic alterations. Asterisks indicate arm-SCNAs without known drivers. (b) Cumulative fraction of cancer genomes affected by SCNAs (y-axis), plotted inversely by size of SCNAs (x-axis). The green region represents the fraction of genome covered by arm-SCNAs, and the yellow region represents the fraction of genome additionally covered by focal SCNAs. (c) Classes of SCNAs referenced throughout this manuscript. DNA that has undergone copy-number change is colored green. (d) Schematic representation of centromeric mechanical bias. The line underneath the chromosome arm (x±z) represents the number of breakpoints per Megabase (Mb) within the chromosome arm (dashed lines are the 95% confidence interval for the mean), and the line under the centromere (y) represents the breakpoints per Mb within the centromere. The quotient of y / x represents the centromere to arm breakpoint ratio (C/A Ratio). (e) Mean breakpoint density within chromosome arms, aggregated across all tumors and all chromosome arms (n = 67; binned by Mb), versus breakpoint density within all centromeres (values in breakpoints per megabase). Error bars represent the 95% confidence interval for the mean. C/A Ratio represents centromeric breaks over arm breaks. (f) Total number of breakpoints occurring in the centromere that cause SCNAs plotted against centromere length, which includes pericentromeric regions that lack coverage in the SNP arrays. Two-tailed p-value was calculated using Spearman’s correlation. (g) Amplitude distributions and mean log2 copy number of arm-level, partial, and interstitial SCNAs. Amplitudes are calculated as the absolute value of a weighted average of the amplitudes of segments included in the SCNA (see Methods for details). Curves are scaled according to the total number of SCNAs within each category, to a maximum of 1.
(h) Comparison of length distributions of telomere-bounded, centromere-bounded, and interstitial SCNAs, aggregated across all chromosome arms.
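The C/A Ratio in panels (d) and (e) is a quotient of two breakpoint densities. A minimal sketch of that arithmetic follows; the function names and the example counts are illustrative assumptions, not values or code from the paper.

```python
def breakpoint_density(n_breakpoints, length_bp):
    """Breakpoints per megabase for a region of the given length in bases."""
    return n_breakpoints / (length_bp / 1_000_000)


def ca_ratio(cen_breakpoints, cen_length_bp, arm_breakpoints, arm_length_bp):
    """Centromere-to-arm breakpoint ratio: y / x in the caption's notation.

    y = breakpoints per Mb within the centromere
    x = breakpoints per Mb within the chromosome arm
    """
    y = breakpoint_density(cen_breakpoints, cen_length_bp)
    x = breakpoint_density(arm_breakpoints, arm_length_bp)
    return y / x


# Hypothetical counts: 30 breakpoints in a 3 Mb centromere,
# 50 breakpoints spread over a 100 Mb arm.
print(ca_ratio(30, 3_000_000, 50, 100_000_000))  # 20.0
```

A ratio above 1 corresponds to what the captions call high centromeric mechanical bias; a ratio below 1 corresponds to low bias.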
Figure 2: BISCUT identifies known and novel cancer driver genes through analysis of SCNA length distributions.
(a) Different patterns of SCNA-mediated selection. (b) Empirical examples of SCNA-mediated selection from the pan-cancer dataset. (c) BISCUT’s peak-finding function. Tumors (dark green) are ranked along the y-axis by partial-SCNA length. The location at which the empirical data deviates maximally from the background distribution is determined (purple). A peak region encompassing this location (denoted by dashed lines) is calculated; see Methods. (d) Statistically significant peaks conferring selection as determined by BISCUT are plotted along the genome. The vertical axis indicates the Significance Score, representing KS-statistic * -log10(q-value). Positive selection peaks are in dark red (amplifications) and blue (deletions), and negative selection peaks are in light red (deletions) and blue (amplifications). Genes found in Tier 1 of the COSMIC Cancer Gene Census are in bold.
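Panels (c) and (d) rest on two quantities: the point of maximal deviation between the empirical length distribution and a background distribution, and a Significance Score equal to the KS statistic times -log10(q-value). The sketch below illustrates both under loud assumptions. The background CDF is passed in by the caller (a uniform CDF is used only as a placeholder), lengths are expressed as fractions of the arm, and none of the names come from the BISCUT code. The actual background model and peak-region calculation are described in the paper's Methods and are not reproduced here.

```python
import math


def max_deviation(lengths, background_cdf):
    """Locate the largest gap between the empirical CDF and a background CDF.

    lengths: partial-SCNA lengths (assumed here to be fractions of the arm).
    background_cdf: callable returning the background CDF at a length
        (hypothetical placeholder for the paper's background distribution).
    Returns (length, signed deviation), deviation = empirical - background.
    """
    xs = sorted(lengths)
    n = len(xs)
    best_x, best_d = None, 0.0
    for i, x in enumerate(xs, start=1):
        # The empirical CDF jumps at x, so check both sides of the step.
        for ecdf in (i / n, (i - 1) / n):
            d = ecdf - background_cdf(x)
            if abs(d) > abs(best_d):
                best_x, best_d = x, d
    return best_x, best_d


def significance_score(ks_statistic, q_value):
    """Significance Score as stated in the caption: KS * -log10(q)."""
    return ks_statistic * -math.log10(q_value)


# Toy data with an excess of events ending near 0.35 of the arm.
toy_lengths = [0.1, 0.2, 0.3, 0.35, 0.9]
location, deviation = max_deviation(toy_lengths, lambda x: x)
print(location, round(deviation, 3))
print(significance_score(abs(deviation), 0.01))
```

The absolute value of the returned deviation plays the role of the KS statistic in this toy setting; its sign indicates whether events ending at that location are over- or under-represented relative to the assumed background.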
Figure 3: Validation of genes identified by BISCUT for negative and positive selection.
(a) Ratios of dN/dS scores (ratio of non-synonymous to synonymous mutations) between genes in restricted BISCUT del-neg (light red) or del-pos peaks (dark blue), compared to 1000 randomly selected sets of other genes. Comparisons between different types of single nucleotide variants (SNV) are indicated on the horizontal axis. Two-tailed p-values are derived from comparisons between observed and permuted data. Box plots center on median values and extend to the first and third quartiles; the whiskers extend to 1.5 times the interquartile range. (b) Fraction of TCGA tumors with microsatellite instability (MSI) for different WRN or 8p copy-number status. Two-tailed p-values were calculated using Fisher’s exact test. (c) Relative COSMIC mutational signature activity of WRN copy-neutral versus WRN deleted TCGA tumors. Four statistically significant comparisons are shown, as determined by Mann-Whitney U test and a false discovery rate (q-value) cut-off of 0.2. (d) Log-fold changes in apoptotic cells detected by flow cytometry for Annexin V and propidium iodide (PI) across three 8p wild-type cell lines (n = 4), on day three after transfection with WRN versus control siRNAs. Each point represents a biological replicate. (e) Cell viability measured using Cell-Titer Glo on day 5 of overexpression of GFP or WRN. (f) Cell viability (Cell-Titer Glo) measured three days after siRNA knockdown of the indicated genes. In d-f, Student’s t-tests were used to calculate one-tailed p-values for each independent experiment, and Fisher’s method was used to combine values across the three experiments. For e and f, one representative experiment is shown from three total.
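Several captions combine p-values from independent experiments using Fisher’s method. As a hedged illustration of the standard technique (not the authors' analysis code), the sketch below computes the statistic X = -2 Σ ln p_i and its upper-tail probability under a chi-square distribution with 2k degrees of freedom, using the closed form that holds for even degrees of freedom. The function name and example inputs are assumptions.

```python
import math


def fisher_combined_p(p_values):
    """Combine k independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null. For even degrees of freedom the survival
    function is exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2
    term, total = 1.0, 1.0  # j = 0 term of the series
    for j in range(1, len(p_values)):
        term *= half_stat / j  # builds (X/2)^j / j! incrementally
        total += term
    return math.exp(-half_stat) * total


# Hypothetical one-tailed p-values from three independent experiments.
print(fisher_combined_p([0.04, 0.10, 0.07]))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the series.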
Figure 4: Pan-cancer mechanical coefficients and relative fitness (RF).
(a) Mechanical coefficients for each centromere and relative fitness values for amplifications and deletions of each chromosome arm, both reported as log2 values. Black bars represent centromeric mechanical coefficients. Red and blue horizontal lines represent net relative fitness for amplifications and deletions respectively, and are the sum of the amplitude of positive selection (green arrows, pointing up) and negative selection (purple arrows, pointing down). Relative fitness values for the p and q arms are depicted to the left and right of the centromeric mechanical coefficient respectively. (b) Log net relative fitness of deletions is significantly lower in diploid samples (mean = −0.49) than in WGD samples (mean = 0.04). Each dot represents a chromosome arm (n = 39 in each column), and the two-tailed p-value is calculated using a paired t-test. (c) Log telomeric mechanical coefficients (averaged between amplifications and deletions) versus telomere length, in RTLU (Relative Telomere Length Units; a ratio of telomere signal to a reference signal within one genome). (d) Log net relative fitness versus frequency of arm-level amplifications (left; in red) and deletions (right; in blue). Values above the dashed line represent net positive selection and values below the dashed line represent net negative selection. (e) Spearman’s correlation coefficients for net relative fitness and arm-SCNA rate across pan-cancer (in green) and unique TCGA tumor types (in black; arranged from largest to smallest). Tumor types in italics have p-values < 0.1. Fisher’s method p-value is calculated from unique TCGA types only. For c-e, p-values were calculated using two-tailed Spearman’s correlation except as otherwise noted; no adjustments were made for multiple comparisons.
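Panels (c) to (e) report Spearman’s rank correlation, for example between net relative fitness and arm-SCNA frequency. The following is a minimal sketch of how the rho coefficient is obtained: values are converted to ranks (ties receive the average rank) and Pearson's correlation is computed on the ranks. The function names and toy numbers are illustrative assumptions, and the p-value calculation is omitted.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-arm values: log net relative fitness vs. arm-SCNA frequency.
toy_fitness = [-0.6, -0.2, 0.1, 0.3, 0.8]
toy_frequency = [0.05, 0.04, 0.15, 0.12, 0.30]
print(spearman_rho(toy_fitness, toy_frequency))
```

Because only ranks enter the calculation, the coefficient is unchanged by monotonic transformations such as taking logs of either variable.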

References

    1. Weaver BA & Cleveland DW Does aneuploidy cause cancer? Curr. Opin. Cell Biol 18, 658–667 (2006).
    2. Taylor AM et al. Genomic and Functional Approaches to Understanding Cancer Aneuploidy. Cancer Cell 33, 676–689 e3 (2018).
    3. Boveri T Concerning the origin of malignant tumours by Theodor Boveri. Translated and annotated by Henry Harris. J. Cell Sci 121 Suppl 1, 1–84 (2008).
    4. Holland AJ & Cleveland DW Boveri revisited: chromosomal instability, aneuploidy and tumorigenesis. Nat. Rev. Mol. Cell Biol 10, 478–487 (2009).
    5. Sheltzer JM et al. Single-chromosome Gains Commonly Function as Tumor Suppressors. Cancer Cell 31, 240–255 (2017).
