Comput Struct Biotechnol J. 2023 May 4;21:2940-2949.
doi: 10.1016/j.csbj.2023.05.002. eCollection 2023.

Development and validation of a prognostic 15-gene signature for stratifying HER2+/ER+ breast cancer

Qian Liu et al. Comput Struct Biotechnol J.

Abstract

Background: Human epidermal growth factor receptor 2-positive (HER2+) breast cancer (BC) is a heterogeneous subgroup. Estrogen receptor (ER) status is emerging as a predictive marker within HER2+ BCs: HER2+/ER+ cases usually have better survival in the first 5 years after diagnosis but a higher recurrence risk after 5 years than HER2+/ER- cases. This is possibly because sustained ER signaling in HER2+ BCs helps tumors escape HER2 blockade. Currently, HER2+/ER+ BC is understudied and lacks biomarkers. Thus, a better understanding of the underlying molecular diversity is important for finding new therapeutic targets for HER2+/ER+ BCs.

Methods: In this study, we performed unsupervised consensus clustering together with genome-wide Cox regression analyses on the gene expression data of 123 HER2+/ER+ BCs from The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) cohort to identify distinct HER2+/ER+ subgroups. A supervised eXtreme Gradient Boosting (XGBoost) classifier was then built in TCGA using the identified subgroups and validated in two other independent datasets: the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and a Gene Expression Omnibus (GEO) dataset (accession number GSE149283). Computational characterization analyses were also performed on the predicted subgroups in different HER2+/ER+ BC cohorts.
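The idea behind consensus clustering can be illustrated with a minimal sketch: cluster many random subsamples and record how often each pair of samples lands in the same cluster. This is not the authors' pipeline. The 1-D two-means routine, the function names, the parameters, and the toy values below are all assumptions chosen to keep the example self-contained.

```python
import random


def two_means_1d(values, n_iter=20):
    """Tiny 1-D k-means with k=2 (a stand-in clusterer); returns a 0/1 label per value."""
    c0, c1 = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return labels


def consensus_matrix(values, n_runs=50, frac=0.8, seed=0):
    """Fraction of co-sampled runs in which each pair of samples shares a cluster."""
    rng = random.Random(seed)
    n = len(values)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    k = max(2, int(frac * n))
    for _ in range(n_runs):
        idx = rng.sample(range(n), k)  # subsample without replacement
        labels = two_means_1d([values[i] for i in idx])
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 1-D "expression" values forming two well-separated groups.
values = [0.1, 0.2, 0.3, 5.0, 5.1, 5.2]
M = consensus_matrix(values)
```

Entries of `M` near 1 mark pairs that are consistently clustered together, and entries near 0 mark pairs that are consistently separated, which is the quantity a consensus heatmap displays.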

Results: We identified two distinct HER2+/ER+ subgroups with different survival outcomes using the expression profiles of 549 survival-associated genes from the Cox regression analyses. Genome-wide differential expression analyses found 197 differentially expressed genes between the two identified subgroups, 15 of which overlapped the 549 survival-associated genes. An XGBoost classifier using the expression values of these 15 genes achieved strong cross-validated performance (area under the curve (AUC) = 0.85, sensitivity = 0.76, specificity = 0.77) in predicting the subgroup labels. Further investigation partially confirmed the differences in survival, drug response, tumor-infiltrating lymphocytes, published gene signatures, and CRISPR-Cas9 knockout screened gene dependency scores between the two identified subgroups.
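For readers unfamiliar with the reported metrics, the following sketch shows how sensitivity, specificity, and AUC are defined for a binary classifier. The labels, scores, and the 0.5 decision threshold are hypothetical toy values, not data from the study, and the helper names are illustrative.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
roc_auc = auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds.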

Conclusion: This is the first study to stratify HER2+/ER+ tumors. Overall, the initial results from different cohorts showed that two distinct subgroups exist among HER2+/ER+ tumors, which can be distinguished by a 15-gene signature. Our findings could potentially guide the development of future precision therapies targeting HER2+/ER+ BC.

Keywords: Breast cancer; Consensus clustering; Gene signature; Machine learning; Subtyping.

Conflict of interest statement

There is no conflict of interest.

Figures

Graphical abstract
Fig. 1
Overall workflow of this study. 15,850 genes are common to the TCGA-BRCA, METABRIC, and GSE149283 HER2+/ER+ cohorts. Of these, the 12,236 genes with at least one count in at least one sample are kept and input into a Cox regression-based feature selection step, which yields 549 significant genes at a p-value < 0.01. Consensus clustering is then performed to stratify TCGA-BRCA HER2+/ER+ patients based on the expression profiles of these 549 significant genes. Differential expression analysis is performed between the identified subtypes to find the most differentially expressed genes. Genes that are significant in both the Cox regression analysis and the differential expression analysis are selected to form the proposed gene signature. Validation of this gene signature is performed on the METABRIC and GSE149283 HER2+/ER+ cohorts. An XGBoost classifier is trained using the proposed gene signature on TCGA-BRCA data and then applied to assign METABRIC and GSE149283 BCs to two subgroups. For METABRIC, the survival difference between the predicted subgroups is tested. For GSE149283, the drug response difference between the predicted subgroups is tested.
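The signature-selection step described in this caption amounts to intersecting two gene sets. A minimal sketch follows, assuming Cox p-values and a differentially expressed gene list are already available. The gene names and p-values are placeholders, not results from the study. Only the 0.01 threshold comes from the caption.

```python
COX_P_THRESHOLD = 0.01  # threshold stated in the caption

# Hypothetical per-gene Cox regression p-values (placeholder gene names).
cox_p_values = {
    "GENE_A": 0.002,
    "GENE_B": 0.03,
    "GENE_C": 0.009,
    "GENE_D": 0.0005,
}

# Hypothetical set of differentially expressed genes between the subgroups.
de_genes = {"GENE_A", "GENE_B", "GENE_D", "GENE_E"}

# Keep genes passing the Cox filter, then intersect with the DE genes.
survival_genes = {g for g, p in cox_p_values.items() if p < COX_P_THRESHOLD}
signature = sorted(survival_genes & de_genes)
```

In this toy example, `GENE_C` passes the Cox filter but is not differentially expressed, and `GENE_B` is differentially expressed but fails the Cox filter, so neither enters the signature.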
Fig. 2
Results of consensus clustering on TCGA-BRCA data. A: Symmetric consensus matrix hierarchical clustering heatmap for TCGA-BRCA data. Columns and rows are patients. The color represents the probability that two patients were clustered together. B: Silhouette plot for the TCGA-BRCA data. Each horizontal line represents a sample, and the length of the line is the silhouette value for that sample. The color indicates the subgroup: red samples are in Subgroup 1, and green samples are in Subgroup 2. A high value indicates that the sample is well matched to its own cluster and poorly matched to other clusters. If most samples have a high positive value, the clustering configuration is appropriate. The overall silhouette value is 0.91, which indicates that the clustering is appropriate. C: Kaplan-Meier (KM) plot of the two subgroups identified by consensus clustering (CC).
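The silhouette value mentioned in panel B can be computed per sample as (b - a) / max(a, b), where a is the mean distance to the other members of the sample's own cluster and b is the smallest mean distance to any other cluster. The sketch below uses Euclidean distance on hypothetical toy points. The distance measure and data used in the study may differ.

```python
import math


def silhouette_values(points, labels):
    """Per-sample silhouette values using Euclidean distance."""
    values = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Collect distances from sample i to every other sample, grouped by cluster.
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if j != i:
                by_cluster.setdefault(other, []).append(math.dist(p, q))
        own = by_cluster.pop(lab, [])
        if not own or not by_cluster:
            values.append(0.0)  # convention for singleton clusters or a single cluster
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Hypothetical toy data: two tight, well-separated clusters.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [1, 1, 2, 2]
sil = silhouette_values(points, labels)
overall = sum(sil) / len(sil)
```

Values close to 1 indicate samples much nearer to their own cluster than to any other, which is why a high overall value supports the chosen clustering.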
Fig. 3
The SHAP importance score of each gene in the XGBoost classifier.
Fig. 4
Predicted subgroups of the external validation HER2+/ER+ BC cohorts. A: The expression profile of the proposed 15-gene signature in the METABRIC HER2+/ER+ BC cohort. Columns are 104 patients, and rows are 15 genes. The XGBoost-predicted subgroup labels are shown in the top side bar. B: The expression profile of the proposed 15-gene signature in the GSE149283 HER2+/ER+ BC cohort. Columns are 14 patients, and rows are 15 genes. The XGBoost-predicted subgroup labels are shown in the top side bar. C: KM plot of the two subgroups of the METABRIC cohort predicted by XGBoost. D: Stacked histogram of the trastuzumab therapy response for the XGBoost-predicted subgroups. PCR, pathological complete response; PPR, pathological partial response; OR, odds ratio.
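An odds ratio such as the one reported in panel D is derived from a 2x2 table of response category by subgroup. The sketch below shows the calculation with hypothetical counts that are not taken from the study. The function name and table layout are illustrative assumptions.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows are subgroups; columns are response categories
    (complete response, partial response).
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts, not taken from the study: (complete, partial) responders.
table = {"Subgroup 1": (8, 2), "Subgroup 2": (3, 7)}
a, b = table["Subgroup 1"]
c, d = table["Subgroup 2"]
or_value = odds_ratio(a, b, c, d)
```

An odds ratio above 1 means the odds of a complete response are higher in the first row's subgroup. With small cohorts, the estimate has a wide confidence interval and should be read cautiously.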
Fig. 5
Computational characterization of the HER2+/ER+ subgroups for both the TCGA-BRCA and METABRIC cohorts. A, top panel: The common genes that are mutated in both TCGA-BRCA Subgroup 1 and METABRIC Subgroup 1. A, bottom panel: The common genes that are altered in both TCGA-BRCA Subgroup 2 and METABRIC Subgroup 2. B: The TIMER-quantified abundances of tumor-infiltrating lymphocytes for both the TCGA-BRCA and METABRIC cohorts. T-tests were used to test the significance of the differences. C: Histograms of the PAM50 intrinsic subtype distributions for the two subgroups. D: Density plots of the published gene signatures (rorS, GENIUS, GENE70, GGI) for the different subgroups in both the TCGA-BRCA and METABRIC cohorts.
