Front Immunol. 2021 Sep 17;12:749459.
doi: 10.3389/fimmu.2021.749459. eCollection 2021.

A Machine Learning Model to Predict the Triple Negative Breast Cancer Immune Subtype

Zihao Chen et al. Front Immunol.

Abstract

Background: Immune checkpoint blockade (ICB) has been approved for the treatment of triple-negative breast cancer (TNBC) because it significantly improves progression-free survival (PFS). However, the response rate is low: only about 10% of TNBC patients achieve a complete response (CR) to ICB, and treatment carries a risk of adverse reactions.

Methods: Open datasets from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) were downloaded, and unsupervised clustering of their expression profiles was performed to identify immune subtypes. Prognosis, enriched pathways, and ICB indicators were compared between the immune subtypes. Samples from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset were then used to validate the association of immune subtype with prognosis, and data from patients who received ICB were used to validate its association with ICB response. Finally, machine learning models were used to build a visual web server that predicts the immune subtype of TNBC patients who are candidates for ICB.

Results: A total of eight open datasets comprising 931 TNBC samples were used for unsupervised clustering, and two novel immune subtypes (S1 and S2) were identified among TNBC patients. Compared with S2, S1 was associated with higher immune scores, higher levels of immune cells, and a better prognosis with immunotherapy. In the validation dataset, S1 samples had a better prognosis than S2 samples for both overall survival (OS) (p = 0.00036) and relapse-free survival (RFS) (p = 0.0022). Bioinformatics analysis identified 11 hub genes related to the immune subtype (LCK, IL2RG, CD3G, STAT1, CD247, IL2RB, CD3D, IRF1, OAS2, IRF4, and IFNG). A robust random forest model built on these 11 hub genes performed reasonably well, with an area under the receiver operating characteristic curve (AUC) of 0.76. An open, free web server based on the random forest model, named triple-negative breast cancer immune subtype (TNBCIS), was developed and is available at https://immunotypes.shinyapps.io/TNBCIS/.
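The reported AUC of 0.76 can be read through the rank-based (Mann–Whitney) form of the ROC AUC: the probability that a randomly chosen sample of one class receives a higher predicted score than a randomly chosen sample of the other, with ties counted as one half. The minimal sketch below computes that quantity in plain Python. Which subtype is treated as the positive class and what the scores are (for example, the random forest's predicted probability of S1) are assumptions for illustration; the function name roc_auc is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via its rank (Mann-Whitney) form.

    scores: predicted score for the positive class (assumed here to be S1);
    labels: 1 = positive, 0 = negative. Returns the fraction of
    (positive, negative) pairs in which the positive sample scores higher,
    counting tied pairs as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: one of the four positive/negative pairs is mis-ordered.
print(roc_auc([0.9, 0.2, 0.6, 0.1], [1, 1, 0, 0]))
```

On this reading, an AUC of 0.76 means the model ranks a randomly chosen pair of samples from the two subtypes in the correct order about 76% of the time; 0.5 corresponds to random guessing. The pairwise loop is quadratic in sample count, which is fine for illustration but not for large cohorts.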

Conclusion: Using open TNBC datasets, we stratified samples into distinct immunotherapy-response subgroups according to their gene expression profiles. Based on the two novel subtypes, ICB candidates with a higher response rate and better prognosis can be selected using the free, visual online web server we designed.

Keywords: TCGA; TNBC (triple negative breast cancer); immune checkpoint blockade; immune subtype; web server.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
The flowchart of this study.
Figure 2
Consensus clustering of TNBC samples combined from eight datasets: GSE18864, GSE58812, GSE76124, GSE76250, GSE83937, GSE95700, GSE106977, and TCGA. (A) PCA of the expression matrices of the eight datasets. (B) PCA of immune cell levels in the eight datasets. (C) Consensus matrix heatmap for k = 2. (D) Five-year Kaplan–Meier curves for OS of TNBC patients stratified by immune subtype; the p-value was calculated with the log-rank test. TNBC, triple-negative breast cancer; PCA, principal component analysis; TCGA, The Cancer Genome Atlas; OS, overall survival.
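Panel C shows a consensus matrix for k = 2. The abstract does not state the base clustering algorithm or resampling settings, so the following is only a minimal sketch of what such a matrix records, assuming a resampling scheme like that of common consensus-clustering tools: repeatedly subsample 80% of the samples, cluster each subsample, and store for every pair the fraction of shared resamples in which the two samples fell in the same cluster. The k-means base clusterer, the 80% fraction, the resample count, and the function names kmeans and consensus_matrix are illustrative assumptions, not the authors' settings.

```python
import random


def kmeans(points, k, rng, n_iter=100):
    """Lloyd's k-means on equal-length float vectors; returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assign each point to its nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k, n_resamples=100, p_item=0.8, seed=0):
    """M[i][j] = fraction of resamples containing both i and j that clustered them together."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # times i and j shared a cluster
    sampled = [[0] * n for _ in range(n)]   # times i and j were both subsampled
    m = max(k, int(p_item * n))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)
        labels = kmeans([points[i] for i in idx], k, rng)
        for a in range(m):
            for b in range(m):
                i, j = idx[a], idx[b]
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
```

A stable k yields a matrix whose entries sit close to 1 (pairs always co-clustered) or 0 (pairs never co-clustered), which appears as clean blocks in a heatmap such as panel C.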
Figure 3
The distribution of immune cell enrichment scores and immune-related markers in the two immune subtypes. (A) Heatmap of immune cell enrichment scores in the two subtypes. (B) Boxplots of immune-related markers in the two subtypes. Within each dataset, the expression of each marker was dichotomized into “high” or “low” at the marker's median value. The association between immune subtype (subtype 1 or subtype 2) and expression group (high or low) was then tested with Fisher's exact test.
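The median split and Fisher's exact test described in panel B can be sketched in plain Python as follows. This is a minimal illustration, assuming that values strictly above the median count as “high” (the caption does not say how ties at the median were handled) and that a two-sided p-value is wanted; the helper names high_low, fisher_exact_2x2, and subtype_marker_test are hypothetical. In practice a library routine such as SciPy's scipy.stats.fisher_exact would normally be used.

```python
import math
from statistics import median


def high_low(values):
    """Dichotomize one marker at its median: True = 'high', False = 'low' (ties go low)."""
    m = median(values)
    return [v > m for v in values]


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum every table at least as extreme (no more probable) than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


def subtype_marker_test(subtypes, values):
    """Rows: subtype 1, subtype 2; columns: high, low expression of one marker."""
    high = high_low(values)
    a = sum(1 for s, h in zip(subtypes, high) if s == 1 and h)
    b = sum(1 for s, h in zip(subtypes, high) if s == 1 and not h)
    c = sum(1 for s, h in zip(subtypes, high) if s == 2 and h)
    d = sum(1 for s, h in zip(subtypes, high) if s == 2 and not h)
    return fisher_exact_2x2(a, b, c, d)
```

The small tolerance in the comparison keeps tables with mathematically equal probabilities from being dropped because of floating-point rounding.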
Figure 4
Identification of hub genes by RRA analysis and WGCNA. (A) Heatmap of the top 100 upregulated and downregulated genes ranked by log2 fold change. Each row represents one gene and each column one dataset. Gold indicates genes upregulated in subtype 2 and blue indicates genes downregulated in subtype 2. (B) Scale-free fit index for various soft-thresholding powers (β); a power of 5 gave the best fit. (C) Cluster dendrogram of genes in the TCGA-TNBC cohort. Each branch represents one gene, and each color below represents one coexpression module. (D) PCC matrix between gene modules and clinical characteristics. PCC values range from −1 to 1 according to the strength and direction of the relationship; a positive value indicates that expression of the genes within a module increases as the clinical trait increases. DEG, differentially expressed gene; RRA, robust rank aggregation; WGCNA, weighted gene coexpression network analysis; PCC, Pearson correlation coefficient.
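Robust rank aggregation (RRA), used for panel A, scores how consistently a gene ranks near the top across several ranked lists. The minimal sketch below follows the rho-score idea behind RRA (as in the R package RobustRankAggreg): normalize each rank by its list length, sort the normalized ranks, compare each order statistic with its null distribution under uniformly random ranks, take the smallest of those probabilities, and apply a Bonferroni-style correction. It assumes each dataset's genes were ranked by log2 fold change and that up- and downregulated genes are aggregated separately; the function names are hypothetical and this is not presented as the authors' exact pipeline.

```python
import math


def beta_order_cdf(r, k, m):
    """P(U_(k) <= r): chance the k-th smallest of m Uniform(0, 1) draws is at most r."""
    return sum(math.comb(m, j) * r ** j * (1 - r) ** (m - j) for j in range(k, m + 1))


def rra_score(normalized_ranks):
    """RRA score for one gene from its normalized ranks (rank / list length) in m datasets.

    Smaller scores mean the gene sits near the top of the lists more consistently
    than expected under random ranking.
    """
    r = sorted(normalized_ranks)
    m = len(r)
    rho = min(beta_order_cdf(r[k - 1], k, m) for k in range(1, m + 1))
    # Bonferroni-style correction for taking the minimum over m order statistics.
    return min(1.0, rho * m)


# A gene near the top in all eight datasets versus one consistently mid-ranked.
print(rra_score([0.01] * 8), rra_score([0.5] * 8))
```

Ranking genes by this score, separately for the upregulated and downregulated lists, is one way to obtain a "top 100" set like the one shown in the heatmap.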
Figure 5
Selection of the best parameters for the machine learning model. (A) Protein–protein interaction network of genes in the brown module; node color intensity and node size are positively correlated with the degree score. (B) The “mtry” value with the highest AUC was selected as the optimal value for the random forest algorithm. (C) The “ntree” value with the highest AUC was selected as the optimal value for the random forest algorithm. (D) Validation of the model in the testing dataset. CR, complete response; PR, partial response; SD, stable disease; PD, progressive disease.
Figure 6
The optimal decision tree in the random forest model. A sample is assigned to subtype 1 or subtype 2 according to its gene expression.
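A random forest classifier routes a sample down each of its decision trees and, in the classic formulation, takes the majority vote of the trees' predicted classes. The sketch below illustrates that mechanism with a toy forest over three of the hub genes. The tree structures, gene thresholds, and number of trees are entirely hypothetical values chosen for illustration; they are not the fitted TNBCIS model or the tree shown in this figure.

```python
from collections import Counter


def predict_tree(node, expression):
    """Walk one decision tree.

    A node is either a leaf label (1 or 2) or a tuple (gene, threshold, left, right):
    go left when expression[gene] <= threshold, otherwise go right.
    """
    while isinstance(node, tuple):
        gene, threshold, left, right = node
        node = left if expression[gene] <= threshold else right
    return node


def predict_forest(trees, expression):
    """Majority vote over all trees (an odd number of trees avoids two-class ties)."""
    votes = Counter(predict_tree(tree, expression) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical toy forest; thresholds are made up for illustration only.
FOREST = [
    ("CD3D", 5.0, 2, ("IFNG", 3.0, 2, 1)),
    ("STAT1", 6.0, 2, 1),
    ("LCK", 4.5, 2, 1),
]

sample = {"CD3D": 7.2, "IFNG": 4.1, "STAT1": 6.8, "LCK": 3.9}
print(predict_forest(FOREST, sample))
```

For this hypothetical sample, two of the three trees vote for subtype 1 and one votes for subtype 2, so the forest predicts subtype 1.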
Figure 7
The correlation of predicted immune subtype with the prognosis in the independent dataset (METABRIC dataset). (A) Five-year Kaplan–Meier curves for OS of TNBC patients stratified by the immune subtypes. (B) Five-year Kaplan–Meier curves for RFS of TNBC patients stratified by the immune subtypes. The p-values were calculated by the log-rank test among subtypes. METABRIC, Molecular Taxonomy of Breast Cancer International Consortium; OS, overall survival; RFS, relapse-free survival.
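The OS and RFS comparisons in this figure (and in Figure 2D) use the log-rank test. The minimal sketch below implements the standard two-group log-rank statistic in plain Python: at each distinct event time it compares the observed number of events in one group with the number expected if both groups shared the same hazard, accumulates the hypergeometric variance, and refers the resulting statistic to a chi-square distribution with one degree of freedom. The input encoding (event = 1 for an observed death or relapse, 0 for censored; group = 0 or 1 for the two subtypes) and the function name are assumptions; the sketch does not truncate follow-up at five years, and an established routine such as survdiff in R's survival package would normally be used instead.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi2, p).

    times: follow-up time per patient; events: 1 = event observed, 0 = censored;
    groups: 0 or 1 (for example, immune subtype).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0  # observed minus expected events in group 1
    var = 0.0        # hypergeometric variance of that difference
    for t in event_times:
        # Patients still at risk at time t, flagged if their event happens at t.
        at_risk = [(g, e == 1 and tt == t) for tt, e, g in zip(times, events, groups) if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _ in at_risk if g == 1)
        d = sum(1 for _, died in at_risk if died)
        d1 = sum(1 for g, died in at_risk if died and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi2 = o_minus_e ** 2 / var
    # Upper tail of a chi-square with 1 degree of freedom equals P(|Z| > sqrt(chi2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Toy data: every patient in group 0 has an event before any patient in group 1.
print(logrank_test([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1]))
```

The p-value uses the identity that a chi-square variable with one degree of freedom is the square of a standard normal, so its upper tail can be computed with math.erfc without any statistics library.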
Figure 8
The correlation of predicted immune subtype with the immunotherapy efficacy in the independent datasets. (A–D) The correlation of predicted immune subtype with the response rate to immunotherapy in the independent datasets: (A) GSE35640, (B) GSE78220, (C) GSE91061, and (D) IMvigor210. (E) The correlation of predicted immune subtype with the survival analysis in the IMvigor210 dataset.
Figure 9
The flowchart of the shiny application.
