Cancer Cell. 2020 May 11;37(5):639-654.e6.
doi: 10.1016/j.ccell.2020.04.012.

Comprehensive Analysis of Genetic Ancestry and Its Molecular Correlates in Cancer

Jian Carrot-Zhang et al. Cancer Cell. 2020.

Abstract

We evaluated ancestry effects on mutation rates, DNA methylation, and mRNA and miRNA expression among 10,678 patients across 33 cancer types from The Cancer Genome Atlas. We demonstrated that cancer subtypes and ancestry-related technical artifacts are important confounders that have been insufficiently accounted for. Once accounted for, ancestry-associated differences spanned all molecular features and hundreds of genes. Biologically significant differences were usually tissue specific but not specific to cancer. However, admixture and pathway analyses suggested some of these differences are causally related to cancer. Specific findings included increased FBXW7 mutations in patients of African origin, decreased VHL and PBRM1 mutations in renal cancer patients of African origin, and decreased immune activity in bladder cancer patients of East Asian origin.

Keywords: TCGA; admixture; ancestry; cancer; eQTL; genomics; mRNA; methylation; miRNA; mutation.

Conflict of interest statement

Declaration of Interests: J.Y.N. and G.M.F. are employees of Foundation Medicine and shareholders of Roche. X.L. receives consultant/advisory fees from Eli Lilly, AstraZeneca, and EMD Serono and research funds from Eli Lilly and Boehringer Ingelheim. P.W.L. is on the Scientific Advisory Boards of AnchorDx and Progenity. A.D.C. receives research funding from Bayer. R.B. owns equity in Ampressa Therapeutics and receives research funding from Novartis.

Figures

Figure 1. TCGA donor ancestries.
(A) Ancestries were called as the consensus between five independent methods based upon SNP array and/or whole exome sequencing. (B) Ancestry representation in each disease type (upper plot), aggregate fractions of each ancestry among admixed individuals (middle panel), and cancer types with at least 10 individuals of the indicated ancestries (black dots; lower panel). (C) Ancestry representation across tumor subtypes with non-random ancestry distributions. (D) Example local ancestry calls (top) and summary enrichment scores for AFR or EUR ancestry (vertical axis), plotted against genomic location (horizontal axis). See also Figure S1 and Table S1.
Figure 2. Ancestry-associated somatic genetic alterations.
(A) QQ plot showing genes whose pan-cancer mutation rates were significantly associated with AFR vs EUR ancestry after controlling for cancer type but not subtype. Red and blue respectively indicate higher and lower frequencies in AFR (FDR q<0.1). Cancer subtype adjustment removed TP53 and PIK3CA associations, shown in gray; FBXW7 retained significance. (B) Cancer-specific mutation frequencies in EUR and either AFR (VHL and PBRM1; KIRC) or EAS (HRAS and NFE2L2; BLCA and ESCA respectively) TCGA cohorts. p values represent analyses controlled for cancer subtype. Stars represent genes that validated in external cohorts. (C-D) VHL mutation frequency (vertical axis) plotted against level of admixture (horizontal axis) of (C) AFR and (D) EUR ancestry in KIRC FMI patients. Individual patient admixture levels are indicated by the blue dots at the top of each panel. Yellow dots represent frequencies at each decile of admixture; dot sizes correspond to the patient numbers in each decile. Blue profiles and shadows represent binomial logistic regression (p < 0.001 for VHL in AFR and EUR) and confidence intervals, respectively. (E) Arm-level SCNA frequencies in EUR (vertical axis) and AFR (horizontal axis) cohorts, across all diseases and chromosome arms. Chromosomes 3p and 4q had significantly different rates of loss among KIRC and COAD patients respectively. See also Figure S2 and Table S2.
Figure 3. Ancestry-differential DNA methylation.
(A) Number of positive control 65 rs probes (‘Explicit SNPs’), probes with measurements directly influenced by SNPs (‘SNP masked’, excluded from later analyses), and all other probes (‘Not Masked’), among probes found to be significant or non-significant in ancestry testing. (B) Left: Regression coefficients between AFR and EUR samples, pan-cancer and in six cancer types. Right: the statistical significance of these differences. (C) Concordant ancestry bias across probes (dots) for the same genes. Genes with at least four ancestry-differential probes are colored. (D) Ancestry bias (vertical axis), computed as the slope (beta) in the regression model, across ancestries in example genes. (E) Methylation at the SPATC1L promoter (cg12016809 beta value, horizontal axis) is associated with reduced gene expression (vertical axis). Beta value distributions are shown as smoothed density plots above scatter plots. (F) Ancestry-associated differentially methylated regions (A-DMRs) detected in 149 whole-genome bisulfite sequenced samples. The PM20D1 promoter and HOOK2 gene body enhancer loci in panels C and D are shown. (G) Isolated probes can also be part of A-DMRs. Top: Probe cg08477332, between S100A14 and S100A13, displays preferential lack of methylation in AFR samples. Bottom: At least six contiguous CpGs neighboring cg08477332 display concordant methylation, a potential A-DMR. See also Figures S3, S4 and Table S3.
Figure 4. Ancestry-associated mRNAs.
Genes associated with ancestry (FDR q<0.001) after correcting for either batch alone or batch and cancer subtype. Expression ratios and significance levels were plotted for AFR (A) and EAS (B) associated genes. Genes that were significant in 33% of tumor types are highlighted (bottom). (C) Overlap of mRNAs associated with AFR or EUR ancestry, identified by either TCGA or GTEx. (D) Effect sizes (as regression coefficients) from TCGA (horizontal axis) and GTEx (vertical axis) analyses, for ancestry-associated mRNAs identified in both analyses. (E-F) Median levels per tumor type of the ancestry associated genes (E) GSTM1 and CRYBB2 (F) PPIL3 and FBLL1. Dot sizes indicate sample sizes and colors indicate ancestry. See also Figure S5 and Table S4.
Figure 5. Ancestry-associated miRNA mature strands.
(A) Number of ancestry-associated miRs (FDR q<0.001), pan-cancer and in six cancer types. (B) Distributions of log2 fold changes for the associations in (A). (C) Ancestry-differential expression of miR-628-5p in basal BRCA, and miR-4326 in MSI STAD. Violin plot widths reflect kernel density estimates; solid and dashed lines reflect median and interquartile range. (D) Genomic neighborhood of hsa-mir-628, modified from the UCSC browser. The miRNA is within an intron of and on the same strand as host gene CCPG1. Red boxes are TSS loci (Marsico et al. 2013). The pale blue box at the bottom is a miRBase v22.1 read pileup on the miRNA’s stem-loop sequence. (E) Expression of miR-628-5p and CCPG1 in BRCA (left) and KIRC (right) samples, with Spearman rho values. (F) Distribution of rho values between hosted mature strands and host genes in BLCA, BRCA, CESC and ESCA. See also Figure S6 and Table S5.
Figure 6. Ancestry-associated eQTLs.
Ancestry-associated germline variation cis-effects on expression in (A) AFR-EUR and (B) EAS-EUR comparisons. Dots, representing cancer type and colored by the number of samples in the minority population, are plotted against the number of PancanQTL eGenes with at least one ancestry-associated SNP (horizontal axis), and the proportion of ancestry-associated eSNPs (vertical axis). (C) Representative cis-eQTL rs2058665-PPIL3 in BRCA. (i-ii) PPIL3 expression by (i) ancestry and (ii) SNP genotype. (iii) Proportions of samples with each genotype, by ancestry (Wald’s association test, FDR q<0.01). (iv) PPIL3 expression by genotype between EUR and AFR samples (p values: Wilcoxon test). Violin plot widths reflect kernel density estimates. Lines show median and interquartile ranges. See also Figure S7 and Table S6.
Figure 7. Ancestry-differential pathway features.
(A) Mean differences (red: higher in EUR; blue: lower in EUR) in PARADIGM-inferred integrated pathway levels (IPLs) of regulatory nodes with ≥10 ancestry-differential downstream targets, by tumor type. Gray denotes regulatory nodes that are not differential or have <10 differential downstream targets. (B) ATM IPLs of AFR and EUR Luminal A BRCA samples. (C) MYC/Max complex IPLs of EAS and EUR BLCA subtype 5 samples. In B and C, the violin plot widths reflect kernel density estimates and internal boxplots show median, interquartile range, and 1.5 times the interquartile range. (D) Cancer-associated genes and pathways enriched among differential pathway features between ancestry groups, from subtype-adjusted analyses. (E) Association of EAS ancestry with immune infiltration score. Coefficients from a multivariate logistic regression are shown on the horizontal axis. Red and green dots indicate correlations with FDR q < 0.05 and < 0.25, respectively. (F) Expression of CD274, which encodes PD-L1, in AFR, EAS, and EUR ancestries across all cancers with at least 10 samples from the minority cohort. Boxplots show median, interquartile range and 1.5 times the interquartile range. See also Table S7.
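
The captions above report significance as FDR q-values (for example q<0.1 or q<0.001) but do not name the procedure used. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure; the function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values, in the input order.

    Assumption: this is a generic FDR adjustment for illustration,
    not necessarily the exact method used by the study's authors.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that each
    # q-value is the minimum of p * m / rank over all equal or larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical per-gene p-values, purely for demonstration.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
significant = [q < 0.1 for q in example_q]
```

A gene would then be called ancestry-associated at a given threshold when its q-value falls below that threshold, which is how cutoffs such as q<0.1 in the captions are usually read.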
