Nature. 2022 Sep;609(7927):552-559.
doi: 10.1038/s41586-022-05154-6. Epub 2022 Aug 31.

African-specific molecular taxonomy of prostate cancer

Weerachai Jaratlerdsiri et al. Nature. 2022 Sep.

Abstract

Prostate cancer is characterized by considerable geo-ethnic disparity. African ancestry is a significant risk factor, with mortality rates across sub-Saharan Africa 2.7-fold higher than global averages (ref. 1). The contributing genetic and non-genetic factors, and associated mutational processes, are unknown (refs. 2,3). Here, through whole-genome sequencing of treatment-naive prostate cancer samples from 183 ancestrally (African versus European) and globally distinct patients, we generate a large cancer genomics resource for sub-Saharan Africa, identifying around 2 million somatic variants. Significant African-ancestry-specific findings include an elevated tumour mutational burden, increased percentage of genome alteration, a greater number of predicted damaging mutations and a higher total of mutational signatures, and the driver genes NCOA2, STK19, DDX11L1, PCAT1 and SETBP1. Examining all somatic mutational types, we describe a molecular taxonomy for prostate cancer differentiated by ancestry and defined as global mutational subtypes (GMS). By further including Chinese Asian data, we confirm that GMS-B (copy-number gain) and GMS-D (mutationally noisy) are specific to African populations, GMS-A (mutationally quiet) is universal (all ethnicities) and the African-European-restricted subtype GMS-C (copy-number losses) predicts poor clinical outcomes. In addition to the clinical benefit of including individuals of African ancestry, our GMS subtypes reveal different evolutionary trajectories and mutational processes, suggesting that both common genetic and environmental factors contribute to the disparity between ethnicities. Analogous to gene-environment interaction (defined here as a different effect of an environmental surrounding in people with different ancestries or vice versa), we anticipate that GMS subtypes act as a proxy for intrinsic and extrinsic mutational processes in cancers, promoting global inclusion in landmark studies.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Mutational density in prostate tumours of individuals with different ancestries.
a, The distribution of somatic aberrations (event number or number of base pairs) for 7 mutational types across 183 tumour–blood WGS pairs representing n = 61 European, n = 113 African and n = 9 admixed individuals. The box plots show the median (centre line), the 25th and 75th percentiles (box limits), and ±1.5× the interquartile range (whiskers). b, The different types of mutational burden observed in this cohort. The samples were percentile-ranked and then ordered on the basis of the sum of percentiles across the mutational types observed in each ancestral group (left). Right, Spearman correlation is shown between mutational types, with the dot size representing the magnitude of correlation and the background colour giving the statistical significance of FDR values.
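The percentile ranking and ordering described for panel b can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function names, the toy counts and the tie rule (tied values share the mean rank) are all hypothetical.

```python
def percentile_ranks(values):
    """Percentile rank (0-100) of each value; tied values share the mean rank.

    The tie rule is an assumption made for this sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based mean rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return [100.0 * r / n for r in ranks]


def order_samples_by_burden(burden_by_type):
    """Order sample indices by the sum of percentiles across mutational types.

    burden_by_type maps a mutational type to a list of per-sample counts.
    Samples with the highest summed percentile come first.
    """
    n = len(next(iter(burden_by_type.values())))
    totals = [0.0] * n
    for counts in burden_by_type.values():
        for idx, pct in enumerate(percentile_ranks(counts)):
            totals[idx] += pct
    return sorted(range(n), key=lambda i: totals[i], reverse=True)


# Hypothetical counts for four samples and two mutational types.
toy = {"SNV": [1200, 3400, 800, 3400], "SV": [10, 55, 5, 20]}
ranking = order_samples_by_burden(toy)
```

In the figure the ordering is applied within each ancestral group, so the function would be called once per group.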
Fig. 2. Taxonomy and differences in driver mutations in prostate cancer by ancestry.
a, The selected 35 driver genes are classified as (1) the most altered in this study (>10 patients), irrespective of ancestry (green); (2) DNA-damage repair (DDR) genes that are known to be associated with African ancestry (orange); (3) other ancestry-associated genes studied in prostate cancer (assoc., purple). The OR, 95% confidence interval and two-sided P value (<0.05) were calculated using Fisher exact tests for count data and including 10 African-specific (OR = 0) and 3 European-specific (OR = infinity) genes. Significance was observed for TMPRSS2 (P = 0.0006), ERG (P = 0.003), SETBP1 (P = 0.012), DDX11L1 (P = 0.0001), STK19 (P = 0.004), NCOA2 (P = 3.14 × 10^−6), PCAT1 (P = 0.012), PAPSS2 (P = 0.042) and MTCH2 (P = 0.014). b, The mutational frequency of the altered driver genes between Africans and Europeans by mutational type (CDS, non-coding, SV and CNA). c, An integrative clustering analysis reveals four distinct molecular subtypes of prostate cancer. The molecular subtypes are illustrated by small somatic mutations (coding regions and non-coding elements), somatic CNAs and somatic SVs. The proportion and association between the iCluster membership and patient ancestry are illustrated in d. Additional unsupervised consensus clustering on each data type was performed and mostly recapitulated the subtypes by integrative analysis. d, Total somatic mutations across four molecular subtypes in this study. The dashed lines indicate the median values of mutational densities across the four subtypes. For each subtype, patients are ordered on the basis of their ancestry.
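As a rough illustration of how an odds ratio and a two-sided Fisher exact P value arise from a 2 × 2 count table, and why ancestry-specific genes give OR = 0 or OR = infinity, consider the sketch below. It is not the authors' pipeline. The table layout (European counts in the first row, African counts in the second), the function name and the example counts are assumptions, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention.

```python
from math import comb, inf


def fisher_exact_two_sided(a, b, c, d):
    """Return (sample odds ratio, two-sided P) for the table [[a, b], [c, d]].

    Assumed layout for this sketch: row 1 = European (altered, unaltered),
    row 2 = African (altered, unaltered).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x altered samples in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Two-sided P: sum over all tables at most as probable as the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)
    )
    if b * c == 0:
        odds_ratio = inf if a * d > 0 else float("nan")
    else:
        odds_ratio = (a * d) / (b * c)
    return odds_ratio, min(1.0, p_value)


# Hypothetical gene altered in 10 of 113 African and 0 of 61 European tumours.
or_african_specific, _ = fisher_exact_two_sided(0, 61, 10, 103)
# Hypothetical gene altered in 10 of 61 European and 0 of 113 African tumours.
or_european_specific, _ = fisher_exact_two_sided(10, 51, 0, 113)
```

Under this assumed layout, a gene altered only in African patients yields an odds ratio of 0 and a gene altered only in European patients yields infinity, matching the convention stated in the caption.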
Fig. 3. Significance of somatic aberrations across four diverse subtypes.
a, Analysis of the long tail of driver genes using different combinations of mutational types (CDS, coding driver data; NC, non-coding driver data; SV, significantly recurrent breakpoint data; and CN, gene-level CN data), resulting in the identification of 124 preferentially mutated genes among the subtypes. Ordered by mutational frequency, 100 (80.6%) have been reported as significantly recurrent mutations/SV breakpoints in the PCAWG Consortium, and 24 (19.4%) are significantly mutated in this study (marked by asterisks). Using iClusterplus, unsupervised hierarchical clustering of all mutational types identified four prostate cancer subtypes (A–D; Fig. 2c), presented for 183 patients (rows) and 124 mutated genes (columns), with each subgroup ordered by ancestry. Ancestrally diverse subtypes A and C are mutationally quiet and are marked by CN loss, respectively. African-specific/predominant subtypes B and D are marked by CN gains and are mutationally noisy, respectively. Three genes on chromosome X, KDM6A, ATRX and ZMYM3, are considered to be significant due to the abundance of homozygous (homo.) loss present in subtype C. Chr., chromosome; hemi., hemizygous; ISUP, International Society of Urologic Pathologists; NA, not applicable. b, Kaplan–Meier plot of biochemical relapse (BCR)-free survival proportion of European patients for subtype A (n = 161) versus C (n = 19). c, Kaplan–Meier plot of the cancer survival probability of European patients for subtype A (n = 82) versus C (n = 17). For b and c, the probability estimates, 95% confidence intervals and two-sided P values (log-rank test) are indicated.
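The Kaplan–Meier curves in panels b and c rest on the product-limit estimator, which can be sketched as below. This is a minimal illustration with hypothetical follow-up data; the function name and inputs are assumptions, and confidence intervals and the log-rank test are not shown.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up durations; events: 1 if the event (for example,
    biochemical relapse) was observed, 0 if the patient was censored.
    Returns [(time, survival probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        leaving = sum(1 for tt, _ in data if tt == t)
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
        i += leaving
    return curve


# Hypothetical follow-up times (months) and event indicators.
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
```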
Fig. 4. Estimates of genomic aberrations contributed by each mutational signature.
a, Correlation plots of total mutational signatures along with clinical and genomic characteristics. The size of each dot represents the FDR values of Spearman correlation P values (two-sided) using Benjamini–Hochberg correction. The colours of each dot represent the correlation coefficient. GMS subtypes are assigned as 1–4 for subtypes A–D, respectively; African, admixed and European are recorded as 1–3, respectively. The correlation of 32 recurrent genes in prostate cancer is shown on the x axis. Many small- or large-sized mutational signatures agree with the GMS. HR, homologous recombination; PSA, prostate-specific antigen. b, Sankey diagram depicting a proportion of duplication signatures observed across cancer subtypes. Duplication features, including amplification (Amp), translocation (trans) plus, local n-jump, templated insertion (ins), amplification loss of heterozygosity (LOH), gain, tandem duplication and gain LOH (Extended Data Fig. 8a,b) are summed per subtype and equally weighted to 20. Links connecting between nodes (GMS, signatures and features) have widths proportional to the total number of CN or SV features across all patients within each GMS subtype to which they belong. Note that we believe that GMS-B is the identity of the African-specific genomic subtype.
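The Benjamini–Hochberg correction applied to the correlation P values in panel a can be sketched as follows. The function name and the example P values are hypothetical; this shows only the standard step-up adjustment, not the authors' implementation.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (FDR) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values from four correlation tests.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```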
Fig. 5. Evolutionary history of globally mutated subtypes.
a, The cancer timeline of the universal subtype (A) begins from the fertilized egg to the age of the patients in a cohort. b, The cancer timeline of GMS-C. Estimates for major events, such as whole-genome duplication (WGD) and the emergence of the most-recent common ancestor (MRCA) are used to define the early, variable, late and subclonal stages of tumour evolution approximately in chronological time. When the early and late clonal stages are uncertain, the variable stage is assigned. Driver genes and CNAs are shown in each stage if present in previous studies, and defined by the MutationTime.R program. Mutational signatures (Sigs) that, on average, change over the course of tumour evolution, or are substantially active, are shown as described in the Supplementary Information. The dagger symbols denote alterations that are found to have different timing. Significant pairwise interaction events between the mutations and CNAs were computed to support cancer timelines. The OR and two-sided P value were calculated using Fisher exact tests. A co-occurrence or mutually exclusive event is considered when OR > 2 or OR < 0.5, respectively. The interaction significance between pairs in GMS-A and GMS-C has P values ranging from 2.04 × 10^−30 to 0.047 and from 1.64 × 10^−27 to 0.045, respectively. Median mutation rates of CpG-to-TpG burden per Gb are calculated using the age-adjusted branch length of cancer clones and maximally branching subclones. The mutation rate plots in a and b show the median ± 2 s.e. of fitted data as dashed lines and error bands, respectively. c, Schematic of a world map with the distribution of GMS-A–D among ancestrally/globally diverse populations. The gene–environment interaction of GMS is shown on the right. The contingency table shows the number of patients with different ancestries (germline variants) stratified by subtypes and associated with certain geography or environmental exposure (two-sided P = 0.0005, Fisher exact test with 2,000 bootstraps).
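The odds-ratio thresholds stated in this caption for pairwise interactions can be written as a small decision rule. The sketch below is illustrative only: the function name, the labels and the significance cut-off of 0.05 are assumptions and are not taken from the caption.

```python
def classify_interaction(odds_ratio, p_value, alpha=0.05):
    """Label a pairwise somatic interaction using the caption's OR thresholds.

    alpha is an assumed significance cut-off for this sketch.
    """
    if p_value >= alpha:
        return "not significant"
    if odds_ratio > 2:
        return "co-occurrence"
    if odds_ratio < 0.5:
        return "mutually exclusive"
    return "neither"


# Hypothetical interaction results.
labels = [
    classify_interaction(6.3, 0.001),
    classify_interaction(0.2, 0.01),
    classify_interaction(1.4, 0.03),
    classify_interaction(8.0, 0.20),
]
```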
Extended Data Fig. 1. Clinical cohorts and statistical metrics.
a, Clinical and pathological patient characterization. Pairwise comparisons using contingency tables and Fisher’s Exact test between African ancestry and Admixed/European ancestry are highlighted in bold with two-sided P-value <0.05 (*), <0.01 (**), or <0.001 (***). Summary statistics, including the median and the first and third quartiles (Q1-Q3), are also presented. b, STRUCTURE analysis of bi-allelic germline variants with the logistic prior model. The number of model components used to explain structure in the plot is K = 5. The full spectrum of African contributions is summed and assigned as African ancestry. c, Saturation curve for all driver types across 183 patients. Recurrent copy number gains and losses were measured using GISTIC v2 (Supplementary Methods). CDS, coding sequence; SV, structural variation. d, Spearman’s correlation between different variables measured in this cohort. Dot sizes represent the magnitude of correlation, with significant P-values (two-sided) <0.01.
Extended Data Fig. 2. Somatic driver mutations in 183 prostate cancer patients of different ancestries.
The covariates on the right show the total number of altered samples for different mutational types. a, Search of the top 300 driver genes altered in primary prostate tumours among 183 specimens. Only driver genes discovered in PCAWG and this study, present in more than six patients or significantly different between Africans and Europeans are chosen for plotting. The top barplot shows the distribution of the number of prostate cancer drivers and/or that of PCAWG. The heatmap shows drivers found in this study (rows) for each patient (columns). Heatmaps are coloured by mutational type. The dual barplot on the left depicts gene-level comparisons of mutational recurrence directly between Africans and Europeans. Bottom covariates show the clinical features of patients. The percentage of transition/transversion mutations across 183 patients shows 1,364,210 small somatic mutations across chromosomes 1-Y. b, The bottom heatmap shows the top 22 previously reported coding driver genes in prostate cancer observed in this study. The left barplot shows statistical support of recurrence analysis for our study.
Extended Data Fig. 3. Discovery of prostate cancer drivers.
a, The number and types of PCAWG driver genes and elements studied in our cohort. b, Recurrent copy number alterations among 183 prostate tumours identified with a 99% confidence level using GISTIC v2 (Supplementary Methods). The figure shows GISTIC peaks of significant regions of recurrent amplification (red) or deletion (blue) supported by FDR < 0.01. c, Genome-wide scan for significantly recurrent breakpoints in our study. The quantile-quantile plot shows two-sided P-values for mutational densities across 183 prostate cancer patients. Multiple hypothesis corrections using the false discovery rate (FDR; Benjamini–Hochberg method) are shown in Supplementary Table 4. Generalized linear modelling (GLM) of somatic mutation densities along the genome with significant background mutational processes adjusted in the model is also shown. d, Bionano Genomics optical genome mapping at the HLA complex. Examples of HLA translocations from a European patient (ID 12543) and an African patient (ID UP2360) studied in this cohort are characterized by pairs of optical maps, each carrying a fusion junction with flanking fragments aligning to one side of the two reference breakpoints. Using the recurrent HLA breakpoints identified in this study, the genome map of the African specimen is found to have a low-end fusion function matched with chromosome 6 through a manual inspection of unfiltered consensus maps using Bionano Access v1.5.2. Note that the HLA alternate contig fused in the European tumour is different from one suggested by short-read sequencing (chr6_GL000252v2_alt). The reference genome map is an in silico digest of the human reference hg38 with the DLE-1 enzyme. Genome map sizes are indicated on the horizontal axis, in megabase (Mb) units. Matching fluorescent labels between sample and reference genome map are connected by grey lines.
Extended Data Fig. 4. TCGA molecular taxonomy.
a, Seven important oncogenic drivers identified by TCGA within our African and European patients. b, Coding mutations observed within SPOP and FOXA1 genes. Rarely, a mutation at the BTB domain of SPOP gene is shown (R221C in an African patient, KAL0072). FH, forkhead. c, ETV1 fusions within positive patients caused by copy number (CN) losses and/or structural variants (DEL, deletion; ICX, interchromosomal translocation; and INV, unbalanced or balanced inversion). CN changes in chromosome 7 show the ETV1 loss with log2 CN ratio less than −0.2. d, ERG fusions caused by CN losses and/or structural variants.
Extended Data Fig. 5. Prostate cancer genes and pathways.
The search of our 124 preferentially mutated genes across tumour subtypes is carried out using the TCGA and ICGC cancer databases. The top affected genes for each pathway are present with lollipop plots to show their hotspots of simple coding mutations if they existed. Mutational frequencies of each altered gene in a pathway are separately measured between Africans (n = 113) and Europeans (n = 61) and shown on the right as a percentage in order (AFR, EUR).
Extended Data Fig. 6. Major biological pathways and networks of prostate cancer.
a, Networks of functional interactions between driver genes are shown for each cancer pathway. Nodes represent Gene Ontology biological processes and Reactome pathways and edges show functional interactions. b, Pathway alteration frequencies between African and European. A sample was considered altered in a given pathway if at least a single gene in the pathway had a genomic alteration (see Extended Data Fig. 5). P-values indicate the level of significance (two-sided Fisher’s exact test).
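The rule in panel b, under which a sample counts as altered in a pathway if at least one gene in that pathway carries a genomic alteration, can be sketched as below. The function name, sample identifiers and the toy pathway membership are hypothetical and serve only to illustrate the counting.

```python
def pathway_alteration_frequency(altered_genes_by_sample, pathway_genes):
    """Fraction of samples with at least one altered gene in the pathway."""
    pathway = set(pathway_genes)
    altered = sum(
        1 for genes in altered_genes_by_sample.values() if pathway & set(genes)
    )
    return altered / len(altered_genes_by_sample)


# Hypothetical samples and an illustrative (not curated) pathway gene list.
samples = {
    "S1": {"TP53"},
    "S2": {"ERG", "PTEN"},
    "S3": set(),
    "S4": {"SPOP"},
}
frequency = pathway_alteration_frequency(samples, ["TP53", "PTEN"])
```

Computing this frequency separately for African and European patients gives the two proportions that a two-sided Fisher exact test would then compare.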
Extended Data Fig. 7. Molecular subtypes in prostate cancer and pan-cancers.
a, Unsupervised hierarchical clustering of primary prostate tumours across three major ancestral groups was performed using total somatic mutations present within WGS normalized data. Admixed individuals were also tested in prostate cancer subtypes to which they belonged. b, Molecular subtyping of total somatic mutations within pan-cancer studies, namely pancreatic, ovarian, breast and liver cancers. Raw data of small somatic mutations, structural variants and copy number alterations acquired per cancer were retrieved from the PCAWG. For each subtype, patients are ordered based on their ancestry. Ancestral groups are assigned using a cut-off of ancestral contribution greater than 70%; otherwise, considered as Admixed.
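The ancestry assignment rule in this caption (an ancestral contribution greater than 70%, otherwise Admixed) can be sketched as follows. The function name and the example ancestry fractions are hypothetical.

```python
def assign_ancestry(fractions, cutoff=0.70):
    """Assign the dominant ancestry if its contribution exceeds the cut-off.

    fractions maps an ancestry label to its estimated contribution (0-1).
    Contributions at or below the cut-off are labelled "Admixed".
    """
    label, share = max(fractions.items(), key=lambda kv: kv[1])
    return label if share > cutoff else "Admixed"


# Hypothetical ancestry fractions for three patients.
groups = [
    assign_ancestry({"African": 0.92, "European": 0.08}),
    assign_ancestry({"African": 0.55, "European": 0.45}),
    assign_ancestry({"African": 0.30, "European": 0.70}),
]
```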
Extended Data Fig. 8. Known and novel mutational signatures in prostate cancer.
a, Copy number signatures in prostate cancer across 45 CN features ranked by mutational processes observed. The six most distinctive signatures and their important components were extracted by running the NMF algorithm on 183 genomes. Bar charts represent the estimated proportion of each event feature assigned to each signature (rows sum to one). b, Structural variation signatures in prostate cancer ranked by mutational processes observed from small deletion to reciprocal rearrangement. The eight most distinctive signatures and their important components were extracted from 44 features by running the NMF algorithm on 183 genomes. Bar charts represent the estimated proportion of each event feature assigned to each signature (rows sum to one). c, Frequency of SBS, DBS, ID, CN and SV features across 183 tumours. Colours at the bottom panel show the following ancestral groups: i) African, red; ii) Admixed, green; and iii) European, blue. d, Stacked barplots of multiple signature exposures for each mutational type enriched per patient and ranked by ancestral group. In many cases, certain mutational signatures occur more frequently in a tumour than others. The top enrichment of small- to large-size mutational signatures mentioned is shown for each patient in Supplementary Table 9 (see Enrichment). Copy number and structural variation signatures (CN1-6 and SV1-8, respectively) are identified for the first time in prostate cancer in this study, and their top enrichment of signature mixture/exposure per patient appears to be significantly associated with our GMS (one-way ANOVA or Fisher’s exact test, two-sided P-values = 5.1e-07–0.017), considering either de novo or global mutational signatures discovered in the Catalogue of Somatic Mutations in Cancer (COSMIC). This supports a role of GMS in explaining intrinsic and extrinsic mutational processes in cancer.
Extended Data Fig. 9. Total profiles of SBS, DBS, ID, CN and SV signatures.
The classification of each signature type (SBS, 96 classes; DBS, 78 classes; ID, 83 classes; CN, 45 classes; and SV, 44 classes) is described in Supplementary Methods. The plotted data are available in digital form (Supplementary Table 9).
Extended Data Fig. 10. Stages of prostate tumour development.
a, Clonal architecture and its frequency in prostate cancer between Africans and Europeans. Tumours are divided into three groups: monoclonal, linear and branching polyclonal. The number of small somatic mutations (SSM) and CNAs as percentage of genome alteration (PGA) is provided as median and range in brackets. Cancer cell fraction (CCF) in each clone and/or subclone is shown in a circular node. Tumours that show characteristics consistent with being polytumours or with multiple independent primary tumours are excluded to remain conservative. b, Unbiased hierarchical clustering of CNAs between clonal (trunk) and subclonal (branch) mutations. Trunk mutations encompass those that occur between the root node (normal) and its only child node, while all others are classified as having occurred in a branch. Red indicates gain; blue indicates loss; and rows indicate patients. Unidentified regions in trunk and branch are assumed to have neutral copy number. ConsensusClusterPlus showed seven CNA clusters among our patients to be optimal. The figure shows that a trunk alteration from one patient is mutationally similar to a branch alteration from another, rather than to other trunk ones from different patients in a cohort. c, Cancer timelines of GMS-B and -D identified in this study. Detailed explanation is provided in Fig. 5. Significant somatic interactions based on Fisher’s Exact test are indicated by odds ratio (OR) estimates and two-sided P-values on the top left panels. Interaction significance between somatic events in GMS-B and -D has P-values ranging from 3.16e-22–0.041 and 9.11e-25, respectively. Mutation rate plots show the median ±2× standard error of fitted data as dashed lines and error bands, respectively. d, Relative ordering model (PhylogicNDT LeagueModel) results for a cohort of 66 samples. The samples can be analysed if they have somatic events of interest with a prevalence greater than 5% of the sample size and have informative clonal status available for each event (16 events). Probability distributions show the uncertainty of timing for specific events in the cohort. e, Molecular timing distribution of copy number gains and loss of heterozygosity (LOH) between Africans and Europeans. Pie charts depict the distribution of the inferred mutation time for a given copy number alteration. Orange denotes early clonal gains/LOH, with a gradient to green for late gains/LOH. The size of each chart is proportional to the recurrence of this event across different patients. Most of the gains and LOH are considered early clonal based on MutationTimeR results. Whole-genome duplication is more frequent in Africans (63%) than in Europeans (57%).

References

    1. Sung H, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2021;71:209–249. doi: 10.3322/caac.21660.
    2. Alexandrov L, et al. Signatures of mutational processes in human cancer. Nature. 2013;500:415–421. doi: 10.1038/nature12477.
    3. Alexandrov LB, et al. The repertoire of mutational signatures in human cancer. Nature. 2020;578:94–101. doi: 10.1038/s41586-020-1943-3.
    4. Sandhu S, et al. Prostate cancer. Lancet. 2021;398:1075–1090. doi: 10.1016/S0140-6736(21)00950-8.
    5. Boutros PC, et al. Spatial genomic heterogeneity within localized, multifocal prostate cancer. Nat. Genet. 2015;47:736–745. doi: 10.1038/ng.3315.