Oncotarget. 2017 Jun 17;8(35):58809-58822.
doi: 10.18632/oncotarget.18544. eCollection 2017 Aug 29.

A pathways-based prediction model for classifying breast cancer subtypes

Tong Wu et al. Oncotarget.

Abstract

Breast cancer is highly heterogeneous and is classified into four subtypes characterized by specific biological traits, treatment responses, and clinical prognoses. We performed a systematic analysis of 698 breast cancer patient samples from The Cancer Genome Atlas project database. We identified 136 breast cancer genes differentially expressed among the four subtypes. Based on unsupervised clustering analysis, these 136 core genes efficiently categorized breast cancer patients into the appropriate subtypes. Functional enrichment analysis based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) identified six functional pathways regulated by these genes: JAK-STAT signaling, basal cell carcinoma, inflammatory mediator regulation of TRP channels, non-small cell lung cancer, glutamatergic synapse, and amyotrophic lateral sclerosis. Three support vector machine (SVM) classification models based on the identified pathways effectively classified different breast cancer subtypes, suggesting that breast cancer subtype-specific risk assessment based on disease pathways could be a potentially valuable approach. Our analysis not only provides insight into breast cancer subtype-specific mechanisms, but also may improve the accuracy of SVM classification models.

Keywords: breast cancer; classification prediction model; co-expression network; pathway enrichment; subtype-specific gene.
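
The 136 core genes are described as the genes shared by the four subtype-specific gene sets (see the Figure 1 legend). A minimal sketch of that overlap step follows, assuming each subtype's gene set has already been derived. The gene symbols and the `overlapping_genes` helper are hypothetical placeholders, not the study's data or code.

```python
# Hypothetical subtype-specific gene sets; the symbols are placeholders,
# not genes reported in the study.
subtype_genes = {
    "LA": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "LB": {"GENE_B", "GENE_C", "GENE_D", "GENE_E"},
    "HER2+": {"GENE_C", "GENE_D", "GENE_F"},
    "TN": {"GENE_C", "GENE_D", "GENE_G"},
}


def overlapping_genes(gene_sets):
    """Return the genes present in every subtype-specific set."""
    return set.intersection(*gene_sets.values())


core = overlapping_genes(subtype_genes)
print(sorted(core))  # ['GENE_C', 'GENE_D']
```

With real data, each set would hold the genes found to be specific to one subtype, and the intersection would correspond to the central region of the Venn diagram.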

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1. Venn diagram showing overlapping and unique subtype-specific genes
Green: LA breast cancer subtype; blue: LB; orange: HER2+; red: TN. 136 genes overlapped between the four specific gene sets. LA, luminal A; LB, luminal B; HER2+, human epidermal growth factor receptor 2 positive; TN, triple negative.
Figure 2. Heat map matrices showing co-expression correlations between 136 overlapping genes for samples of all four subtypes
Red: positive correlation; blue: negative correlation. Results indicated that correlations among these 136 genes in each subtype were not identical. HER2+, human epidermal growth factor receptor 2 positive; TN, triple negative.
Figure 3. Correlation pairs according to the correlation coefficient
Horizontal axis represents the correlation coefficient; vertical axis represents the number of correlated gene pairs after logarithmic conversion. Blue: LA breast cancer subtype; red: LB; green: HER2+; purple: TN. Overall, the number of gene pairs gradually decreased with an increasing R value. LA, luminal A; LB, luminal B; HER2+, human epidermal growth factor receptor 2 positive; TN, triple negative.
Figure 4. Comparison between subtypes when R ≥ 0.5
Green: LA breast cancer subtype; blue: LB; orange: HER2+; red: TN. Horizontal axis represents the correlation coefficient; vertical axis represents the density distribution. Differences were observed between the density distributions of the correlation coefficients for the four subtypes. LA, luminal A; LB, luminal B; HER2+, human epidermal growth factor receptor 2 positive; TN, triple negative.
Figure 5. Topological characteristics in the four subtype co-expression networks
Degree distribution (A), average shortest path length (B), closeness centrality (C), and topological coefficient (D) of the four subtypes. Green: LA breast cancer subtype; blue: LB; orange: HER2+; red: TN. LA, luminal A; LB, luminal B; HER2+, human epidermal growth factor receptor 2 positive; TN, triple negative.
Figure 6. Unsupervised clustering analysis using overlapping and unifying genes
Results of unsupervised clustering for samples of all four subtypes using the 136 overlapping genes (A) and the unifying genes (B). Red: high expression; blue: low expression.
Figure 7. Alteration score distributions for six functional pathways in the four breast cancer subtypes
Diffused points show the distribution of alteration scores of the functional pathways. The six functional pathways included glutamatergic synapse, basal cell carcinoma, non-small cell lung cancer, JAK-STAT signaling pathway, inflammatory mediator regulation of TRP channels, and amyotrophic lateral sclerosis. Red: LA breast cancer subtype; green: LB; blue: HER2+; purple: TN. LA, luminal A; LB, luminal B; HER2+, human epidermal growth factor receptor 2 positive; TN, triple negative.
Figure 8. Boxplots showing alteration score distributions for the six functional pathways in the four breast cancer subtypes
Horizontal axis represents the subtype samples; vertical axis represents the alteration score; black horizontal line represents the median. Red: LA breast cancer subtype; green: LB; blue: HER2+; purple: TN. LA, luminal A; LB, luminal B; HER2+, human epidermal growth factor receptor 2 positive; TN, triple negative.
Figure 9. Functional variation trends in the six pathways, obtained using the sliding window approach and the LOESS fitting algorithm
Red: LA breast cancer subtype; green: LB; blue: HER2+; purple: TN; dark blue: fitting line. LA, luminal A; LB, luminal B; HER2+, human epidermal growth factor receptor 2 positive; TN, triple negative.
Figure 10. Confusion matrix for the four subtypes
Horizontal axis represents the predicted result; vertical axis represents the actual result. Darker color: higher precision; lighter color: lower precision. HER2+, human epidermal growth factor receptor 2 positive; TN, triple negative.
Figure 11. ROC curve showing the performance of the luminal A/B and TN subtype prediction models
Horizontal axis represents the ROC curve specificity; vertical axis represents the sensitivity. The average area under the curve (AUC) for distinguishing LA from LB was 0.78, whereas the classification efficiencies for TN versus LA and for TN versus LB were higher than 91%. Red: AUC for LA and LB; blue: AUC for LA and TN; green: AUC for LB and TN. LA, luminal A; LB, luminal B; TN, triple negative.
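
The evaluation summaries in the Figure 10 and Figure 11 legends (a confusion matrix and pairwise AUC values) can be illustrated with a small sketch. This is not the authors' code: the labels, scores, and the helper names `confusion_matrix` and `auc` are made up for illustration. The AUC is computed here with the standard rank-based definition, the probability that a sample from the positive class scores higher than one from the negative class.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual subtypes, columns are predicted subtypes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def auc(scores_pos, scores_neg):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive sample scores higher; ties count as half a win."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy labels (hypothetical, not study data).
labels = ["LA", "LB", "HER2+", "TN"]
actual = ["LA", "LA", "LB", "LB", "HER2+", "TN", "TN"]
predicted = ["LA", "LB", "LB", "LB", "HER2+", "TN", "LA"]
matrix = confusion_matrix(actual, predicted, labels)

# Toy classifier scores for one pairwise comparison (hypothetical).
pairwise_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Each pairwise ROC curve in the legend (LA vs. LB, LA vs. TN, LB vs. TN) would correspond to one such `auc` call on the scores of the two subtypes being compared.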
