Cancer Discov. 2022 Nov 2;12(11):2530-2551.
doi: 10.1158/2159-8290.CD-22-0138.

African Ancestry-Associated Gene Expression Profiles in Triple-Negative Breast Cancer Underlie Altered Tumor Biology and Clinical Outcome in Women of African Descent

Rachel Martini et al. Cancer Discov. 2022.

Abstract

Women of sub-Saharan African descent have disproportionately higher incidence of triple-negative breast cancer (TNBC) and TNBC-specific mortality across all populations. Population studies show racial differences in TNBC biology, including higher prevalence of basal-like and quadruple-negative subtypes in African Americans (AA). However, previous investigations relied on self-reported race (SRR) of primarily U.S. populations. Due to heterogeneous genetic admixture and biological consequences of social determinants, the true association of African ancestry with TNBC biology is unclear. To address this, we conducted RNA sequencing on an international cohort of AAs, as well as West and East Africans with TNBC. Using comprehensive genetic ancestry estimation in this African-enriched cohort, we found expression of 613 genes associated with African ancestry and 2,000+ associated with regional African ancestry. A subset of African-associated genes also showed differences in normal breast tissue. Pathway enrichment and deconvolution of tumor cellular composition revealed that tumor-associated immunologic profiles are distinct in patients of African descent.

Significance: Our comprehensive ancestry quantification process revealed that ancestry-associated gene expression profiles in TNBC include population-level distinctions in immunologic landscapes. These differences may explain some differences in race-group clinical outcomes. This study shows the first definitive link between African ancestry and the TNBC immunologic landscape, from an African-enriched international multiethnic cohort. See related commentary by Hamilton et al., p. 2496. This article is highlighted in the In This Issue feature, p. 2483.

Figures

Figure 1. Estimated genetic ancestry distribution in an African-enriched TNBC RNA-seq cohort. Genetic ancestry was estimated from genotypes of the ancestry-informed markers obtained from our RNA-seq alignments, yielding superpopulation ancestry estimations relative to the 1000 Genomes superpopulations (A) and subpopulation ancestry estimations for each individual in our cohort (B). In both A and B, each column represents an individual in the cohort, the estimated ancestry from a given superpopulation or subpopulation is shown on the y-axis, and the x-axis is annotated by SRR and location. Superpopulations in A are East Asian (EAS, red), South Asian (SAS, blue), European (EUR, green), American (AMR, purple), and African (AFR, orange). Subpopulations in B are shown in variations of their corresponding superpopulation color (e.g., AFR populations are in varying shades of orange), and population codes are reported in Supplementary Table S1. Samples are ordered by decreasing AFR ancestry [x-axis left to right: African/Ghanaian (Ghana), AA (Alabama, Detroit, New York), African/Ethiopian (Ethiopia), EA (Alabama, Detroit, New York), other/declined (New York), and Asian (New York)]. C, Constellation plot showing phylogeny of samples based on ancestry estimations. SRR of samples is indicated by the colored dots (Ghanaian = light blue, AA = light green, Ethiopian = dark blue, EA = dark green, Asian = light pink, and other/declined = dark pink). Site location of samples is annotated next to the colored dots (A = Alabama, USA; D = Detroit, MI, USA; E = Ethiopia; G = Ghana; and N = New York, NY, USA). D, Scatter plot showing inverse correlation of AFR and EUR ancestry in our gene expression cohort.
Figure 2. AFR ancestry–associated genes show enrichment in the immune response. A, Venn diagram of ancestry-associated genes identified from the AFR and EUR genetic ancestry linear regression model, in which ancestry was used as a continuous variable. DEG, differentially expressed gene. B, Scatter plot showing log2 fold change of 293 overlapping genes from AFR- and EUR-associated gene signatures. The top left quadrant represents those genes upregulated with increasing AFR ancestry (positive log2 fold change on the y-axis) and correspondingly downregulated with increasing EUR ancestry (negative log2 fold change on the x-axis). C, Unsupervised hierarchical clustering of 613 AFR ancestry–associated genes. Columns represent individuals, where SRR, ancestry estimates, and TNBC subtypes are indicated in the color map at the top of the heat map. Rows represent genes, where lighter yellow indicates minimum row expression and darker purple indicates maximum row expression. AMR, American; TNHF, Triple-Negative Hetero Fluid. D, Constellation plot representing the nodal structure of individuals from C, where points are colored by SRR (Ghanaian = light blue, Ethiopian = dark blue, AA = light green). The node highlighted by the red box is the increased admixture node, which is highlighted in C by the red star. E, Volcano plot of AFR-associated genes, showing 613 significant genes in red. Network (F) and treemap (G) diagrams of Ingenuity Pathway Analysis (IPA) immune cell trafficking disease and function terms, in which the 613 AFR gene signature was enriched (P value range of terms = 0.0119–0.000502). Genes in red or green are upregulated or downregulated among high AFR individuals, respectively. Genes and treemap boxes in orange represent a positive z-score (predicted activation), and those in blue represent a negative z-score (predicted inhibition). H, Parallel plot depicting positive or negative correlation (correl) of 17 overlapping AFR ancestry–associated genes in normal (GTEx) or TNBC tumor (ICSBCS) tissue. I, Scatter plots of representative genes by ancestry in both normal (GTEx) and tumor (ICSBCS), and Kaplan–Meier survival plots in TCGA BRCA data among AA and EA patients with breast cancer. In the GTEx normal tissue cohort, AFR ancestry increases with increasing principal component 1 (PC1). The red P value highlights P < 0.05. J, Top de novo network from IPA of 17 overlapping AFR ancestry–associated genes. Color coding of expression is relative to the TNBC tumor (ICSBCS) tissue. Genes circled in red show opposite association with AFR ancestry in normal (GTEx) tissue (e.g., CALHM4/FAM26D is upregulated in TNBC tumor tissue, but negatively correlated with AFR ancestry in nondiseased breast tissue).
Figure 3. African subpopulation–associated genes are also enriched in the immune response. A, Venn diagram of unique and overlapping gene signatures associated with LWK, ESN, MSL, YRI, and GWD ancestry, respectively. Bolded dots are genes that overlap with the 613 AFR-associated gene signature. Ingenuity Pathway Analysis (IPA) of LWK-associated (B) and MSL-associated (C) genes. B, Colors in blue indicate inhibition of regulators, disease/function terms, and canonical pathways among individuals with increasing LWK ancestry. C, Colors in orange indicate activation of regulators, disease/function terms, and canonical pathways among individuals with increasing MSL ancestry.
Figure 4. Immune deconvolution of bulk tumors shows enrichment of immune cells among high AFR ancestry tumors. A, Box plot of the TAL absolute score among high AFR and low AFR samples (Student t test P = 0.0076). B, Stacked bar chart of TAL populations significantly different between AFR-high and AFR-low samples. NK, natural killer. C, Correlation of AFR ancestry and CIBERSORTx TAL populations. Significant correlations are highlighted in shades of red. TAL populations with a star represent immunosuppressive cell populations (41). ns, not significant. D, Box plot of gene expression of CD3D (light blue), CD3E (dark blue), CD3G (light green), CD274 (dark green), CTLA4 (light pink), FOXP3 (dark red), and PDCD1 (light orange) across SRR groups (G = Ghanaian, AA = African American, E = Ethiopian) or AFR cluster groups (AFR high, AFR low). Significant ANOVA P values and paired Student t tests (***, P < 0.001; **, P < 0.01; *, P < 0.05) are reported. E, Correlation of immune marker gene expression (bottom) and CIBERSORTx TAL populations (left). Positive correlation is shown in red, and negative correlation is shown in green. Size of the dot represents the significance of the correlations. F, Representative IHC images of CD3 (blue) and FOXP3 (black) staining in AA (top left), Ghanaian (top right), Ethiopian (bottom left), and EA (bottom right) TNBC cases. G, Box plots of percent positive CD3 (blue) and FOXP3 (black) stained cells from IHC images by SRR groups. ANOVA P values and paired Student t tests (**, P < 0.01; *, P < 0.05) are reported on the plot. Stacked bar charts of stroma (H) and tumor (I) segmented regions representing immune cell abundance from GeoMx staining data. Student t tests were conducted to determine immune cell populations with differential abundance between SRR groups and are reported on the plots (**, P < 0.01; *, P < 0.05). Monocytes C, monocytes classical; Monocytes NC I, monocytes nonclassical I.
Figure 5. TNBC subtyping reveals the heterogeneity of tumors. A, Pie charts showing the distribution of TNBC subtypes across SRR groups for the TNBCtype initial call [Vanderbilt call (Vandy call), top row], TNBCtype call after removing/reassigning IM and MSL calls (middle row), and the calls using our median ranks method (bottom row). B, Heat map of correlations from the Vanderbilt TNBC subtyping tool and our median ranks calling for TNBC subtypes. Color map at the top indicates SRR/ethnicity, high or low AFR cluster sample, Vanderbilt TNBC subtyping call, Vandy call after removal of IM/MSL, and our median ranks call. Samples are clustered into five groups, which are color coded and labeled 1 to 5 on the dendrogram at the bottom. C, Line plot depicting positive and negative correlations with the Vanderbilt tool and the median-ranking subtype calls in each of the TNHF clusters. D, Pie charts showing the distribution of TNHF clusters across SRR groups. E, Sankey plot showing the distribution of calls from initial Vanderbilt TNBCtype results to Vanderbilt call after removal of IM/MSL, to our median ranks method, and to the final TNHF clusters from B. Color coding is based on the initial Vandy call (left). Bar chart to the right shows the number of tumors from AFR high or AFR low in a given cluster. F, Stacked bar chart of CIBERSORTx TAL populations in each of the TNHF clusters.
Figure 6. SRR-unique gene signature enriched in comorbid canonical pathways. A, Venn diagram depicting the overlap of AFR-, EUR-, and SRR-associated genes. DEG, differentially expressed gene. B, Unsupervised hierarchical clustering of the 1,071 SRR-associated genes. AMR, American. C, Unsupervised clustering of 751 genes unique to SRR. In both B and C, columns represent individuals, where SRR and ancestry are shown in the color map at the top, and rows represent genes. Node structure of individuals is shown at the bottom of the heat maps; the node structure of individuals changed significantly between B and C. D, Comparing gene expression values from the node structure in C, we determined enrichment of genes in known canonical pathways that would be associated with environmental exposures and/or potential patient comorbidities. Z-scores indicate predicted activation (positive z-score, orange) or inhibition (negative z-score, blue) of the pathway based on the expression of the genes in the pathway in the directionality of AAs. Black striped bars indicate pathways where no z-score/prediction was available due to insufficient evidence in the Ingenuity Pathway Analysis knowledge base. The red line indicates a P value cutoff of 0.05 [−log10(0.05) ≈ 1.3].
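
The position of the red cutoff line in Figure 6D can be checked with a short calculation. The following is a minimal Python sketch, assuming the plotted axis is the negative base-10 logarithm of the P value (the caption's value of about 1.3 is consistent with base 10); the function name `neg_log10` is illustrative and does not come from the paper.

```python
import math


def neg_log10(p_value: float) -> float:
    """Return -log10(p) for a P value in the interval (0, 1]."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log10(p_value)


# A P value of 0.05 maps to roughly 1.3 on a -log10 axis.
cutoff = neg_log10(0.05)
print(f"{cutoff:.2f}")  # prints 1.30
```

On such an axis, bars extending beyond the cutoff correspond to P values below 0.05.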

