Nature. 2020 Feb;578(7793):129-136.
doi: 10.1038/s41586-020-1970-0. Epub 2020 Feb 5.

Genomic basis for RNA alterations in cancer

PCAWG Transcriptome Core Group et al. Nature. 2020 Feb.

Erratum in

  • Author Correction: Genomic basis for RNA alterations in cancer.
    PCAWG Transcriptome Core Group; Calabrese C, Davidson NR, Demircioğlu D, Fonseca NA, He Y, Kahles A, Lehmann KV, Liu F, Shiraishi Y, Soulette CM, Urban L, Greger L, Li S, Liu D, Perry MD, Xiang Q, Zhang F, Zhang J, Bailey P, Erkek S, Hoadley KA, Hou Y, Huska MR, Kilpinen H, Korbel JO, Marin MG, Markowski J, Nandi T, Pan-Hammarström Q, Pedamallu CS, Siebert R, Stark SG, Su H, Tan P, Waszak SM, Yung C, Zhu S, Awadalla P, Creighton CJ, Meyerson M, Ouellette BFF, Wu K, Yang H; PCAWG Transcriptome Working Group; Brazma A, Brooks AN, Göke J, Rätsch G, Schwarz RF, Stegle O, Zhang Z; PCAWG Consortium. PCAWG Transcriptome Core Group, et al. Nature. 2023 Feb;614(7948):E37. doi: 10.1038/s41586-022-05596-y. Nature. 2023. PMID: 36697831 Free PMC article. No abstract available.

Abstract

Transcript alterations often result from somatic changes in cancer genomes [1]. Various forms of RNA alterations have been described in cancer, including overexpression [2], altered splicing [3] and gene fusions [4]; however, it is difficult to attribute these to underlying genomic changes owing to heterogeneity among patients and tumour types, and the relatively small cohorts of patients for whom samples have been analysed by both transcriptome and whole-genome sequencing. Here we present, to our knowledge, the most comprehensive catalogue of cancer-associated gene alterations to date, obtained by characterizing tumour transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) [5]. Using matched whole-genome sequencing data, we associated several categories of RNA alterations with germline and somatic DNA alterations, and identified probable genetic mechanisms. Somatic copy-number alterations were the major drivers of variations in total gene and allele-specific expression. We identified 649 associations of somatic single-nucleotide variants with gene expression in cis, of which 68.4% involved associations with flanking non-coding regions of the gene. We found 1,900 splicing alterations associated with somatic mutations, including the formation of exons within introns in proximity to Alu elements. In addition, 82% of gene fusions were associated with structural variants, including 75 of a new class, termed 'bridged' fusions, in which a third genomic location bridges two genes. We observed transcriptomic alteration signatures that differ between cancer types and have associations with variations in DNA mutational signatures. This compendium of RNA alterations in the genomic context provides a rich resource for identifying genes and mechanisms that are functionally implicated in cancer.

Conflict of interest statement

M.M. is a scientific advisory board chair of, and consultant for, OrigiMed, receives research funding from Bayer and Ono Pharma, and has patent royalties from LabCorp. G.R. is on the scientific advisory board of Computomics GmbH and receives research funding from Roche Diagnostics and Google. R.S. received honorariums for speaking at meetings organized by Roche and AstraZeneca. All the other authors have no competing interests.

Figures

Fig. 1. Germline and somatic SNVs associated with expression.
a, Epigenetics Roadmap enrichment analysis, showing the average fold change in Roadmap factors across cell lines in PCAWG-specific eQTLs of the pan-analysis as well as eQTLs that replicate in GTEx tissues. *P < 0.05/25, one-sided Wilcoxon rank-sum test in PCAWG-specific eQTLs corrected for the number of Roadmap factors used (that is, 25). Data are mean and s.d. b, Variance component analysis for gene expression levels, showing the average proportion of variance explained by different germline and somatic factors for different sets of genes including the mean effect across all factors: (1) all genetic factors (germline and somatic); (2) SCNAs; (3) somatic variants in flanking regions; (4) population structure; (5) cis-germline effects; and (6) somatic intron and exon mutation effects. c, Manhattan plot showing nominal P values of association for TEKT5 (highlighted in grey), considering flanking, intronic and exonic intervals. The leading somatic burden is associated with increased TEKT5 expression (P = 1.61 × 10^−6) and overlaps an upstream bivalent promoter (red dots; annotated in 81 Roadmap cell lines, including 8 embryonic stem cells, 9 embryonic-stem-cell-derived and 5 induced pluripotent stem-cell lines). d, Summary of significant associations between mutational signatures (Sig) and gene expression. Top, the total number of associated genes per signature (FDR ≤ 10%). Bottom, enriched GO categories or Reactome pathways for genes associated with each signature (FDR ≤ 10%, significance level encoded in colour, −log10-transformed adjusted P value). e, Standardized effect sizes on the presence of AEI, taking only SCNAs, germline eQTLs, coding and non-coding mutations into account. Data are the estimate and standard error of the estimate of the effect size.
Fig. 2. Position-specific effect of somatic mutations on alternative splicing.
a, Top, proportion of mutations near exon–intron junctions and at branch sites that are associated with exon-skipping events. Mutations with associated splicing changes are those in which the percentage spliced in-derived |z-score| is ≥ 3 (dark blue). Asterisks denote intron positions significantly enriched for splicing changes relative to background based on a permutation test. *P < 0.05, **P < 0.01, ***P < 0.001. Bottom, sequence motifs of regions. b, Example of an exonization event in the tumour-suppressor gene STK11. The RNA-seq read coverage for a part of the gene is shown in red for a donor carrying the alternative allele, and in grey for a random donor with reference allele. The cassette exon event is shown as a schematic below. c, Enrichment of SINE elements in SAVs compared to sequence background (BG). Shown for SINE elements overlapping in sense (middle) and antisense (right) directions.
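The |z-score| ≥ 3 criterion in panel a can be illustrated with a minimal sketch. It assumes the z-score compares the percentage spliced in (PSI) of a mutated sample against the mean and standard deviation of a background set of samples; the function names, the choice of background and the example values are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def psi_z_score(psi_mutated, psi_background):
    """Z-score of one sample's PSI relative to a background set of PSI values."""
    mu = mean(psi_background)
    sd = stdev(psi_background)  # sample standard deviation
    if sd == 0:
        raise ValueError("background PSI values have zero variance")
    return (psi_mutated - mu) / sd


def has_splicing_change(psi_mutated, psi_background, threshold=3.0):
    """Flag a mutation as splicing-associated when |z| meets the threshold."""
    return abs(psi_z_score(psi_mutated, psi_background)) >= threshold


# Hypothetical PSI values for an exon in samples without the mutation.
background = [0.90, 0.92, 0.88, 0.91, 0.89]
print(has_splicing_change(0.40, background))  # strong exon skipping
print(has_splicing_change(0.91, background))  # within normal variation
```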
Fig. 3. Structural rearrangements associated with RNA fusions.
a, The number of all detected and new fusions and their overlap with the cancer census genes. b, Schematic of an example of bridged fusions. Bridged fusions are those composite fusions formed by a third genomic segment that bridges two genes. Only one of the possible orders of genomic arrangement is depicted in each case, with break points highlighted as thunderbolts.
Fig. 4. Global view of DNA and RNA alterations that affect tumours.
a, The median numbers of different alterations across histotypes. Histotypes are ordered by hierarchical clustering based on the pattern of different types of alteration. Only histotypes with more than 10 donors are shown. Alt., alternative; non-syn, non-synonymous. Cancer-type abbreviations are listed in Supplementary Table 23. b, c, Circular representations of selected genes whose alterations significantly co-occur with those of B2M (b) and PCBP2 (c). Connecting lines indicate the specific types of co-occurrence of alteration pairs. The inner histograms indicate the frequencies of incidences of different alteration types shown in different colours. d, All 74 Catalogue of Somatic Mutations in Cancer (COSMIC) cancer census genes or PCAWG driver genes that are both frequently and heterogeneously altered across both RNA- and DNA-level alterations. Yellow bars indicate the proportion of samples that had DNA-level alterations, and green bars indicate the proportion of samples with RNA-level alterations. Middle column is the proportion of each alteration type observed for that gene. e, The enrichment of cancer genes within our list of significantly recurrent genes.
Extended Data Fig. 1. Pan-cancer expression profiling of 1,188 PCAWG donors.
a, Tumour and normal RNA-seq data from 27 histotypes. The total number of samples is shown to the right of the bars. Grey bars denote matched healthy samples. b, Number of female versus male donors. c, Total number of tumour and matched healthy samples from the PCAWG study. A subset of tumours (dark violet) was metastatic.
Extended Data Fig. 2. Overview of the different sources of genetic variation considered in the analysis.
a, For analyses of cis regulation, mono-allelic single-nucleotide germline variants (single nucleotide polymorphisms (SNPs), blue) were individually tested for association with total gene expression using standard eQTL approaches. Owing to their low recurrence in the cohort, somatic SNVs were aggregated in burden categories depending on their position relative to the gene tested (for example, promoter, 5′ UTR or intron). Local SNV burdens were then tested for association with ASE globally across all genes, as well as with total expression on a per-gene level using eQTL approaches. Trans effects were estimated by testing total gene expression for association with mutational and epigenetic signatures. Window sizes were 1 Mb for all somatic cis-eQTL analyses, and 100 kb for ASE and germline cis-eQTL. b, Overview of the different datasets and their contributions to the analyses described in a. Germline genotypes were derived from the matched healthy whole-genome sequencing (WGS) samples. Allele-specific SCNAs, mutational signatures and local SNV burdens were derived from the tumour WGS in comparison to the unaffected WGS samples. ASE and total expression (FPKM) were derived from the tumour and normal RNA-seq data. Arrows indicate dependencies between individual analyses carried out.
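The aggregation of low-recurrence somatic SNVs into position-based burden categories can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it uses only three categories (exonic, intronic, flanking), treats the whole 1-Mb window around the gene as a single flanking category, and all function names and coordinates are hypothetical.

```python
from collections import defaultdict


def classify_position(pos, gene_start, gene_end, exons, flank=1_000_000):
    """Return 'exonic', 'intronic', 'flanking' or None for a genomic position."""
    if gene_start <= pos <= gene_end:
        for exon_start, exon_end in exons:
            if exon_start <= pos <= exon_end:
                return "exonic"
        return "intronic"
    if gene_start - flank <= pos <= gene_end + flank:
        return "flanking"
    return None  # outside the cis window


def burden_per_patient(snvs, gene_start, gene_end, exons):
    """Count SNVs per patient and category; snvs is an iterable of (patient, position)."""
    burden = defaultdict(lambda: defaultdict(int))
    for patient, pos in snvs:
        category = classify_position(pos, gene_start, gene_end, exons)
        if category is not None:
            burden[patient][category] += 1
    return {patient: dict(counts) for patient, counts in burden.items()}


# Hypothetical gene spanning 100-1000 with two exons.
snvs = [("P1", 150), ("P1", 450), ("P2", 5000), ("P2", 5_000_000)]
print(burden_per_patient(snvs, 100, 1000, [(100, 200), (800, 1000)]))
```

The per-patient counts produced this way could then serve as the predictor in a per-gene association test against expression, which is the role the caption assigns to local SNV burdens.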
Extended Data Fig. 3. Germline eQTL lead variants.
Left, quantile–quantile (Q–Q) plot of P values of germline eQTL lead variants in the pan-cancer and histotype-specific analysis (FDR ≤ 5%, blue) and P values of the same analysis after permutation (random permutation of patients, red). Middle and right, distributions of distance to the respective TSS of all germline eQTL lead variants in the pan-cancer and histotype-specific analysis.
Extended Data Fig. 4. PCAWG-specific eGenes.
a, Number of PCAWG-specific eGenes in relation to eQTL replication in various numbers of GTEx tissues. b, Number of eGenes of the PCAWG pan-analysis replicating in corresponding GTEx tissues.
Extended Data Fig. 5. Cis-mutational somatic burden.
a, Total somatic mutational load per cancer type. Median numbers of SNVs range from 1,139 in thyroid adenocarcinoma to 72,804 in skin melanoma. b, Number of recurrent somatic SNVs shared by increasing numbers of patients. Only 86 SNVs are detected in more than 1% of the cohort (12 patients).
Extended Data Fig. 6. Somatic mutation rate and burden frequency by type of region tested.
a, Number of mutated regions tested per gene with somatic burden frequency ≥ 1%. b, Mutation rate per kilobase. c, Burden frequency, stratified by the type of interval tested (flanking, exonic or intronic). d, Distribution of distances (bp) of the leading intervals (FDR ≤ 5%) to the closest (left and right) interval such that the association P value decreases by at least one order of magnitude (99% of the distribution is shown). e, Breakdown of all genomic regions tested (n = 1,049,102 with burden frequency ≥ 1%) and of the 567 genomic regions that underlie the observed somatic cis-eQTL at a FDR of 5% (intronic denotes eGene intron; exonic denotes eGene exon; flank. denotes 2-kb flanking region within 1 Mb distance to the eGene start and end; flank.intergenic denotes flanking region in a genomic location without gene annotations; flank.intronic denotes flanking region overlapping an intron of a nearby gene; and flank.others denotes flanking region partially overlapping several annotations of a nearby gene).
Extended Data Fig. 7. Manhattan plots of seven somatic eGenes associated with genic lead burden.
Altogether, 11 genic somatic eQTLs showed significant changes in gene expression associated with somatic burdens within the gene boundaries (intronic or exonic). The seven genes shown here are known to be important in the pathogenesis of specific cancers. a, CDK12. b, PI4KA. c, IRF4. d, AICDA. e, C11orf73 (also known as HIKESHI). f, BCL2. g, SGK1.
Extended Data Fig. 8. Scatter plots of eight somatic eGenes.
Plots show the effect of the lead weighted burden on the gene expression residuals (obtained as described in the Methods) of these genes. a, CDK12. b, PI4KA. c, IRF4. d, AICDA. e, C11orf73. f, BCL2. g, SGK1. h, TEKT5.
Extended Data Fig. 9. Roadmap epigenome marks overlapping flanking intervals with somatic burden.
a, Maximum fold enrichment of epigenetic marks from the Roadmap Epigenomics Project across 127 cell lines. The number of cell lines with significant enrichments is indicated in parentheses (FDR ≤ 10%); asterisks denote significant enrichments in at least one cell line. b, Mean percentages (over the 127 cell lines) of regions overlapping (by at least 10% of their length) Roadmap epigenome marks, calculated using all genomic flanking regions (n = 1,637,638) and the subset of 556 flanking intervals associated with somatic eQTL (FDR ≤ 5%). c, Mutation rate per kilobase. d, Burden frequency (across the 127 cell lines) of the 556 flanking intervals in somatic eQTLs (FDR ≤ 5%), overlapping 25 Roadmap epigenome marks. DNase, DNase only; EnhA, active enhancer; EnhAc, enhancer acetylation only; EnhAF, active enhancer flank; EnhW, weak enhancer; Het, heterochromatin; PromBiv, bivalent promoters; PromD, promoter downstream; PromP, poised promoters; PromU, promoter upstream; Quies, quiescent/low; ReprPC, repressed PolyComb; TssA, active TSS; TxReg, transcription regulatory; ZNF/Rpts, ZNF genes and repeats; Tx, transcription; Tx3, transcription 3′, Tx5, transcription 5′; TxEnh3, transcription 3′ enhancer; TxEnh5, transcription 5′ enhancer; TxEnhW, transcription weak enhancer; TxWk, weak transcription.
Extended Data Fig. 10. Quality control of the association studies between gene expression and mutational signatures.
a–c, Q–Q plots of the P values of the linear model to associate expression of 18,831 genes with 28 mutational signatures across all 1,159 patients (a), 877 patients with carcinoma (b), or 891 European patients (c). d, Number of significant associations (log10-transformed) at different FDR thresholds (across all patients, patients with carcinoma and European patients). e, Volcano plot of directionality of effects in the analysis of all patients. f, g, Comparison of analyses between all patients and patients with carcinoma (f) and between all patients and European patients (g). The −log10(P values) per signature–gene pair are correlated (r = 0.763 (f) and r = 0.789 (g), Pearson correlation coefficient), especially above an FDR threshold of 10%.
Extended Data Fig. 11. Relationship between mutational signatures and gene expression patterns.
a, b, Principal component analysis (PCA) of signatures across 1,159 patients (PCA on signature-specific SNVs per patient) (a) and signature–gene expression associations across 18,831 genes (PCA on adjusted P values of signature–gene expression associations) (b). The PCA on the SNVs recapitulates known interdependencies, for example, between signatures 7, whereas the PCA on the signature–gene association studies also emphasizes functional relatedness, for example, between signatures 2 and 13. c, Hierarchical clustering of signatures. The numbers at the nodes indicate the number of genes commonly associated with two to four respective signatures. The dendrogram shows genes that are associated with more than one signature mostly owing to similar SNV patterns of these signatures across patients. d, Frequency of number of significantly associated genes per signature (FDR ≤ 10%). Although many signatures are significantly associated with a few genes, 18 signatures are associated with more than 20 genes. Signature 9 is associated with more than 350 genes. Vice versa, 1,009 genes are associated with only one signature, 129 with two, 32 with three, 5 with four and 1 with five signatures. e, f, Mutational signature–gene associations, depicting positive associations between the expression of the canonical APOBEC pathway genes APOBEC3B (e) and APOBEC3A (f) and signature 2. The associations within the three cancer types with the strongest correlation between signature and gene expression (hepatocellular carcinoma (Liver–HCC), bone leiomyosarcoma (Bone–Leiomyo) and prostate adenocarcinoma (Prost–AdenoCA)) are shown.
Extended Data Fig. 12. ASE analysis.
a, All types of cancer are ordered by the average AEI frequency. The numbers of genes per patient for which ASE could be quantified are shown, stratified according to cancer type, resulting in between 588 and 7,728 genes per patient. b, Distribution of the fraction of genes with AEI (red) and SCNAs (blue) over the number of measurable genes for each patient across the cohort. Cancer types with high chromosomal instability also exhibit the highest amounts of AEI.
Extended Data Fig. 13. SCNAs as major driver for allelic dysregulation in cancer.
a, Absolute allelic expression imbalance closely follows allelic imbalance at the genomic level. Values of 0.5 (blue) denote equal number of reads from both alleles. Values of 1 (yellow) reflect mono-allelic expression or regions with loss of heterozygosity. b, Comparison between B-allele frequency (BAF) and ASE ratios from a single patient with lung cancer (LUAD-US) with profound chromosomal instability shows strong correlation between allelic imbalance on expression and genomic levels.
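The 0.5-to-1 scale described in panel a is consistent with an imbalance measure defined as the fraction of reads carried by the more abundant allele. The sketch below assumes that definition, which is not stated explicitly in the caption; the function name and read counts are hypothetical.

```python
def allelic_imbalance(ref_reads, alt_reads):
    """Fraction of reads from the major allele: 0.5 is balanced, 1.0 is mono-allelic."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return max(ref_reads, alt_reads) / total


print(allelic_imbalance(50, 50))  # balanced expression
print(allelic_imbalance(0, 40))   # mono-allelic expression or loss of heterozygosity
print(allelic_imbalance(30, 10))  # intermediate imbalance
```

Under this assumption the same function applies to RNA-seq read counts (ASE) and to WGS read counts (B-allele frequency folded around 0.5), which is what makes the two levels directly comparable in panel b.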
Extended Data Fig. 14. Determinants of AEI.
a, Standardized effect sizes on the presence of AEI, taking only SCNAs, germline eQTLs, coding and non-coding mutations into account. In summary, SCNAs accounted for 86.1% of the total effect size, followed by germline eQTLs (9.0%) and somatic SNVs (4.8%). b, Relevance of individual somatic mutation types (‘copy-number ht1’ and ‘copy-number ht2’ as local allele-specific SCNAs of haplotypes 1 and 2, respectively), germline eQTLs and other covariates for the ASE ratio. Significant covariates (FDR ≤ 5%) are highlighted in bold. c, Comparison of the effect of protein-truncating variants (stop-gained) and synonymous variants on the ASE ratio.
Extended Data Fig. 15. Overview of estimations of promoter activity and non-coding promoter mutations associations and patterns.
a, b, The technical variation of the promoter activity estimates across varying library depth (a) and positional bias (b). c, The number of outlier promoters per tumour type according to promoter activity variance (variance larger than 1.5 × the interquartile range). d, Distribution of promoter mutations around promoters across the PCAWG cohort for major, minor and inactive promoters. Red lines indicate the window 200-bp upstream of a TSS, in which major promoters show an enrichment of mutations whereas minor and inactive promoters do not. e, Distribution of promoter mutations around promoters for the top two most mutated types of cancer (skin melanoma and colorectal adenocarcinoma (ColoRect–AdenoCA)). Colorectal adenocarcinoma displays a very different mutational pattern from other types of cancer. f, Distribution of promoter mutations around major, minor and inactive promoters across several types of cancer. Red lines indicate the window 200-bp upstream of a TSS, in which major promoters show an enrichment of mutations whereas minor and inactive promoters do not. g, Schematic of the calculation of non-coding promoter mutational burden. h, Overview of non-coding promoter mutations per sample and the number of mutated promoters per tumour type for promoters with at least three mutated samples. i, j, Association of absolute (i) and relative (j) promoter activity with promoter mutations across all samples. k, l, Overview of promoter mutations for skin melanoma tumours. k, Most promoter mutations are C>T, which indicates UV-induced DNA damage. l, Distribution of promoter mutations for each mutation class reveals the enrichment of C>T mutations around the 200-bp window upstream. m, n, Overview of promoter mutations for colorectal adenocarcinoma tumours. m, Most promoter mutations are C>A and C>T. 
n, Distribution of promoter mutations for each mutation class does not display an enrichment of mutations around the 200-bp window upstream, differing from the mutation pattern of skin melanoma tumours.
Extended Data Fig. 16. TERT promoter mutations.
a, Promoters ranked by the number of mutated samples across all types of cancer in a 200-bp window. Asterisk indicates cancer census genes. b, The TERT locus and number of mutations observed at each position. The first promoter shows a highly recurrent non-coding mutation reported previously. c, Comparison of TERT promoter activity for mutated and non-mutated samples per tumour type.
Extended Data Fig. 17. Alternative splicing and association with somatic mutations.
a, Number of exon-skipping events confirmed at different ΔPSI thresholds in tumour (red), matched healthy (green) and GTEx (blue) samples for liver tissue. Dashed lines show the subset of exon-skipping events that only contain annotated introns. b, Number of exon-skipping events confirmed at a ΔPSI level of greater than 0.3 for the individual histotypes. Transparent section of bars represents the fraction of novel events, containing at least one unannotated intron. c, Splicing landscape for exon-skipping events. t-SNE analysis based on exon-skipping PSI values for all ICGC tumour and healthy samples together with tissue-matched GTEx samples. d, Position-specific effect of somatic mutations on alternative splicing. Magnitude and direction of mutation-associated splicing alterations. e, Permutation-based FDR values for SAV detection based on the different types of cancer. f, Cancer gene set enrichment for SAV sets, shown for cancer census gene set (middle) and sets determined in ref. (left) and ref. (right). g, Positional distributions (logarithms of distance from the nearest exons) of somatic variant creating novel splicing donors and acceptors. h, Sequence motif logos around somatic mutation creating novel splicing motifs. i, Example splicing effect of a branch-point mutation. UCSC genome browser RNA-seq coverage plots of cassette exon event in RBM28 between mutant and wild type. Mutant (bottom track) contains an A>G mutation 29 nucleotides upstream from the acceptor site of an affected exon. j, Distribution of new cassette exon events detected only within the PCAWG cohort. Top, number of events per histology type. Middle, events normalized to the total number of cassette exons detected in the histology types. Bottom, the number of exonization events per histotype for the subset with the novel cassette exons colocated to a somatic alteration near the acceptor or donor of the exon. k, Example of an exonization event in the tumour-suppressor gene STK11. 
RNA-seq read coverage for a part of the gene is shown in red for a donor carrying the alternate allele and in grey for a random donor with reference allele. The cassette exon event is shown as a schematic below, with blue (red) boxes denoting constitutive (alternative) exons and blue solid lines denoting introns. Magnified panels at the bottom show details from Integrative Genomics Viewer visualization, highlighting a somatic mutation at the 3′ end of the cassette exon. The associated sequencing change is illustrated on the bottom right corner, in which the vertical bar denotes the exon–intron boundary. l, Alu-based exonization mechanism. Top, the presence of an Alu element in an intron in antisense alone will still result in normal splicing. Bottom, specific mutations of the Alu sequence create new splice sites and result in exonization.
Extended Data Fig. 18. Recurrent and promiscuous RNA fusions.
a, Features of the 27 most recurrent in-frame or open-reading-frame-retaining fusions. Kinase column indicates whether one of the gene partners is a kinase gene. b, Network with connected clusters of at least 10 genes. Genes are represented as nodes, and the size of a node is proportional to the number of gene-fusion partners. Two nodes are connected if one fusion was detected involving the two genes: an edge is coloured blue if the fusion has evidence for matched structural rearrangements and is coloured red otherwise. Nodes and connections are shown only between promiscuous genes. The colour intensity indicates whether a gene is involved more often in a fusion as a 3′ (purple) or 5′ (green) gene or both (white).
Extended Data Fig. 19. Structural rearrangements associated with RNA fusions.
a, Systematic classification scheme of all gene fusions based on underlying structural variants (SVs). Numbers of fusion events of different classes are shown to the right. b, Schematic of examples of different types of structural-variant-supported fusions: (1) direct fusions; (2) intercomposite fusions; and (3) intracomposite fusions. Bridged fusions are shown in Fig. 3b. Only one of the possible orders of genomic arrangement is depicted in each case, with break points highlighted by thunderbolts. c, Supported rearrangements for composite fusions bring the fused segments of two genes significantly closer. Natural distance indicates the native distance between two related structural variant break points. Effective distance indicates the distance between the final two break points of the intra- and intercomposite fusions. d, The break points of structural-variant-independent fusions are typically closer than those for other interchromosomal fusions, which indicates that at least some of the structural-variant-independent fusions may occur directly at the RNA level, mediated either by trans-splicing or read-through events.
Extended Data Fig. 20. Correlation of the number of somatic genomic alterations with RNA alterations.
Scatter plots of log10-transformed frequency of DNA alterations versus log10-transformed frequency of RNA alterations, in which each row is a DNA alteration in the following order: structural variants, copy-number aberrations and non-synonymous variants. Each row is an RNA alteration in the following order: expression outliers, RNA editing, ASE, fusions and splicing. Each point is a sample coloured by histotype, and its position is the log-transformed number of aberrations found in each sample. The Benjamini–Hochberg-adjusted P values are calculated from a likelihood ratio test assuming negative binomial distribution; histotype is used as a confounder.
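The Benjamini–Hochberg adjustment named in this caption, which also underlies the FDR thresholds used throughout the figures, can be sketched in a few lines. This is the standard step-up procedure written from its textbook definition; the input P values are hypothetical, and the likelihood ratio test that would produce them is not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values from four tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

An association would then be reported at, for example, FDR ≤ 10% when its adjusted value is at most 0.10.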
Extended Data Fig. 21. Global view of DNA and RNA alterations affecting cancer pathways.
Composite pie charts showing the percentages of RNA alterations, DNA alterations or both, affecting sets of genes in well-characterized cancer pathways and known to be functionally altered in cancer. The sizes of circles represent the percentages of patients affected based on the given gene set. The columns indicate different types of cancer. The numbers in parentheses indicate the number of genes analysed for the specific pathway.
Extended Data Fig. 22. Breakdown of DNA and RNA alterations of cancer genes.
a, Composite pie charts showing percentages of DNA and RNA alterations for top cancer-driver genes. The 20 most significant cancer-driver genes identified by the PCAWG group at the pan-cancer level are depicted, with the sizes of the pie charts indicating the percentages of patients carrying alterations in the given driver gene. The areas represent the relative percentages of patients exhibiting different alterations depicted by corresponding colours. When several types of alteration in one pathway affect the same patient, only a fraction is counted towards each type of alteration. b, Proportional bar plots showing the distribution of gene alterations for genes in the TP53 and TGFB pathways.
Extended Data Fig. 23. Trans-associations found by co-occurrence analyses.
a, Scatter plot for association of gene expression outliers with cancer gene variants. Each dot represents an alteration pair. The x axis shows all COSMIC genes ordered alphabetically and the y axis represents the FDR-adjusted P values (q values) based on Fisher’s exact tests. COSMIC genes with more than five significant associations (FDR < 5%) are coloured in red and labelled. b, Heat map showing the extent of associations between COSMIC gene somatic mutations and expression outliers of all genes. Each row indicates one gene, and the colour intensity shows the significance of trans-association. COSMIC genes labelled to the right are ordered by the number of significant associations. Only the top 10 genes are shown. c, Enrichment map showing the significant (FDR ≤ 0.01) pathways based on the top 100 significant genes associated with B2M alterations. Colour intensity represents enrichment significance, node sizes the number of analysed genes belonging to the given pathway and edge sizes the degree of overlap between two gene sets. Only the top 10 enriched terms are shown.
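The per-pair co-occurrence test in panel a can be illustrated with a small two-sided Fisher's exact test on a 2×2 table of samples (alteration A present or absent against alteration B present or absent). The sketch below is a generic implementation from the hypergeometric distribution, not the authors' code, and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: samples with both alterations, only A, only B, neither.
print(fisher_exact_two_sided(3, 0, 0, 3))
```

The resulting P values for all alteration pairs would then be FDR-adjusted to obtain the q values plotted on the y axis.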
Extended Data Fig. 24. Genes can be altered in cis by several mechanisms.
a, Genes with at least one type of RNA alteration that also has an associated change at the DNA-level in cis. Genes are either classified as a PCAWG driver gene or not classified as a driver gene or a cancer gene from the cancer gene census. b, c, Examples of a known cancer gene, NF1 (b), and an unclassified gene, PTGFRN (c), having heterogeneous mechanisms of alterations.
Extended Data Fig. 25. Proportion of genes with DNA or RNA alterations.
a, Full list of 731 genes that are both frequently and heterogeneously altered across both RNA- and DNA-level alterations. Yellow bars to the left indicate the proportion of samples that had DNA-level alterations, whereas green bars to the right indicate the proportion of samples with RNA-level alterations. Middle column is a heat map corresponding to the −log10(P value). Asterisks indicate a COSMIC Cancer Gene Census (CGC) gene or PCAWG driver genes. b, Distribution of alteration types among all significant genes or just CGC or PCAWG driver genes.
Extended Data Fig. 26. Outlier events in CDK12.
a, Fusion, splicing and alternative promoter outlier events of the RNA alterations that lead to either partial or full removal of the kinase domain in CDK12. b, All outlier events in CDK12, including those not contained directly within the kinase domain, across all 1,188 samples. Each column is a sample and each row is the alteration type. Although not directly searching for mutually exclusive events across all genes, we find that CDK12 is marginally mutually exclusive in RNA editing, splicing outliers, alternative promoters, non-synonymous variants and fusions (4.8 × 10^−3, unweighted WExT). c, All alteration events that occur within CDK12 across all 1,188 samples, which are not mutually exclusive.

References

    1. Weinhold, N., Jacobsen, A., Schultz, N., Sander, C. & Lee, W. Genome-wide analysis of noncoding regulatory mutations in cancer. Nat. Genet. 46, 1160–1165 (2014).
    2. Owens, M. A., Horten, B. C. & Da Silva, M. M. HER2 amplification ratios by fluorescence in situ hybridization and correlation with immunohistochemistry in a cohort of 6556 breast cancer tissues. Clin. Breast Cancer 5, 63–69 (2004).
    3. Climente-González, H., Porta-Pardo, E., Godzik, A. & Eyras, E. The functional impact of alternative splicing in cancer. Cell Reports 20, 2215–2226 (2017).
    4. Faderl, S. et al. The biology of chronic myeloid leukemia. N. Engl. J. Med. 341, 164–172 (1999).
    5. The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Network. Pan-cancer analysis of whole genomes. Nature doi: 10.1038/s41586-020-1969-6 (2020).
