NPJ Breast Cancer. 2024 Jul 19;10(1):60.
doi: 10.1038/s41523-024-00671-1.

Gene expression signature for predicting homologous recombination deficiency in triple-negative breast cancer

Jia-Wern Pan et al. NPJ Breast Cancer. 2024.

Abstract

Triple-negative breast cancers (TNBCs) are a subset of breast cancers that have remained difficult to treat. A proportion of TNBCs arising in non-carriers of BRCA pathogenic variants have genomic features that are similar to BRCA carriers and may also benefit from PARP inhibitor treatment. Using genomic data from 129 TNBC samples from the Malaysian Breast Cancer (MyBrCa) cohort, we developed a gene expression-based machine learning classifier for homologous recombination deficiency (HRD) in TNBCs. The classifier identified samples with HRD mutational signature at an AUROC of 0.93 in MyBrCa validation datasets and 0.84 in TCGA TNBCs. Additionally, the classifier strongly segregated HRD-associated genomic features in TNBCs from TCGA, METABRIC, and ICGC. Thus, our gene expression classifier may identify triple-negative breast cancer patients with homologous recombination deficiency, suggesting an alternative method to identify individuals who may benefit from treatment with PARP inhibitors or platinum chemotherapy.

Conflict of interest statement

The authors declare that this research was funded by Cancer Research Malaysia, which also holds a patent pending related to the gene expression classifier described in this study. J.W.P., Z.C.T., P.S.N., M.M.A.Z., P.N.F., J.Y.T., S.N.H., J.L., and S.H.T. are current or former employees of Cancer Research Malaysia.

Figures

Fig. 1. Clustering and gene expression analyses for homologous recombination deficiency (HRD) in MyBrCa TNBC samples.
A Unsupervised hierarchical clustering of 129 MyBrCa TNBC samples using HRD-associated features including the COSMIC single base substitution mutational signature 3 (SBS3), short insertions and deletions (indels), loss-of-heterozygosity (LOH), telomeric allelic imbalance (TAI), large-scale transitions (LST), as well as copy number amplifications, deletions, gain and loss. All scores were scaled using z-scores, and the indels score was also log-transformed prior to scaling. Designations for each sample as HRD High or HRD Low are indicated by the “HRD prediction” annotation bar. B Volcano plot for differential expression analysis comparing HRD High and HRD Low samples. Dotted lines indicate the thresholds used to classify genes as differentially expressed (Benjamini-Hochberg adjusted p-value < 0.001, absolute log2 fold change > 2).
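The scaling and thresholding steps named in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the use of the population standard deviation, the natural logarithm, and the pseudocount added before the log are all assumptions made for illustration. Only the thresholds (adjusted p-value < 0.001, absolute log2 fold change > 2) come from the caption.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Scale values to zero mean and unit standard deviation.

    Assumption: population standard deviation; the source does not say which.
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def scale_indels(counts, pseudocount=1.0):
    """Log-transform indel counts, then z-score them.

    Assumption: natural log with a pseudocount to avoid log(0).
    """
    return zscore([math.log(c + pseudocount) for c in counts])


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(genes, pvalues, log2fc,
                             alpha=0.001, min_abs_lfc=2.0):
    """Keep genes passing both thresholds quoted in the caption."""
    padj = benjamini_hochberg(pvalues)
    return [g for g, q, lfc in zip(genes, padj, log2fc)
            if q < alpha and abs(lfc) > min_abs_lfc]
```

With hypothetical inputs, `differentially_expressed(["a", "b", "c"], [1e-6, 1e-6, 0.5], [3.0, 1.0, 4.0])` keeps only gene `"a"`: gene `"b"` fails the fold-change cutoff and gene `"c"` fails the adjusted p-value cutoff.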
Fig. 2. Performance of the HRD200 classifier in the MyBrCa TNBC cohort.
A Receiver operating characteristic (ROC) curves of false positive rate (FPR) and true positive rate (TPR) showing the performance of the HRD200 composite classifier in predicting HRD High status in 70:30 training:testing gene expression datasets from 113 MyBrCa TNBC samples. The HRD200 classifier was trained on gene expression data of 217 differentially expressed genes. The ROC curves for each of the five shuffled 70:30 datasets are shown separately. B Bar chart showing the probability of a sample being HRD High according to the HRD200 classifier, compared to their HRD classification by consensus clustering (color of the bar) and known germline BRCA status (“gBRCAm” annotation).
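As a rough illustration of the evaluation described in this caption, the sketch below computes an AUROC from labels and predicted probabilities and generates five shuffled 70:30 index splits. The helper names, the random seed, and the rounding of the training-set size are assumptions; the source does not describe how the splits were drawn or how the AUROC was computed.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half (the Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shuffled_splits(n_samples, n_splits=5, train_fraction=0.7, seed=0):
    """Yield (train_indices, test_indices) for repeated shuffled splits.

    Assumption: plain random shuffling without stratification.
    """
    rng = random.Random(seed)
    cut = round(n_samples * train_fraction)
    for _ in range(n_splits):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[:cut], idx[cut:]
```

For 113 samples, each split under these assumptions holds 79 training and 34 testing indices; a model would be fit on the training indices and `auroc` evaluated on the held-out ones, giving one ROC curve per split.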
Fig. 3. Validation of the HRD200 classifier in the TCGA cohort.
A Comparisons of normalized scores for telomeric allelic imbalance (TAI), large-scale transitions (LST), loss-of-heterozygosity (LOH), COSMIC single base substitution mutational signature 3 (SBS3), copy number amplifications, deletions, gain, and loss, and short insertions and deletions (indels) between TCGA TNBC samples classified by the HRD200 classifier as HRD Low (n = 31, in blue) and samples classified as HRD High (n = 56, in red). All scores were scaled using z-scores, and the indels score was also log-transformed prior to scaling. Box and whiskers plots were constructed with boxes indicating 25th percentile, median (centre line) and 75th percentile, and whiskers showing the maximum and minimum values within 1.5 times the inter-quartile range from the edge of the box, with outliers shown. B Receiver operating characteristic (ROC) curves of false positive rate (FPR) and true positive rate (TPR) showing the performance of the HRD200 composite classifier in predicting “ground truth” HRD High status in TCGA TNBC samples that was determined by consensus hierarchical and k-means clustering of the variables included in (A). The ROC curves for each of the five component model sets are shown separately.
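The box-and-whisker convention described in this caption (quartile box, whiskers at the most extreme values within 1.5 times the inter-quartile range of the box edges, remaining points shown as outliers) can be sketched as follows. The function name and the choice of the "inclusive" quartile method are assumptions; plotting libraries differ in how they interpolate quartiles.

```python
from statistics import quantiles


def box_stats(values):
    """Summarize values using the box-and-whisker rule from the caption.

    Assumption: quartiles use the 'inclusive' interpolation method.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme observed values inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values
                           if v < low_fence or v > high_fence),
    }
```

For example, `box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])` places the whiskers at 1 and 9 and reports 100 as the only outlier.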
Fig. 4. Validation of the HRD classifier on the NanoString nCounter platform.
A Receiver operating characteristic (ROC) curves showing the performance of the HRD200 classifier (retrained using a 36-gene subset) in predicting HRD High status from NanoString nCounter gene expression data from fresh frozen (left, n = 55) and FFPE (right, n = 19) samples from the MyBrCa TNBC cohort, using our consensus unsupervised clustering results as the ground truth. The ROC curves for each of the five component model sets are shown separately. B Comparison of the HRD High probabilities given by the HRD200 classifier for RNAseq (y-axis) and NanoString (x-axis) fresh frozen (left) or FFPE (right) matched samples. Also shown is Spearman’s correlation coefficient (ρ) for each comparison. C Confusion matrices comparing HRD200 classification of samples using NanoString nCounter data from fresh frozen (left) or FFPE (right) samples to “ground truth” HRD status by consensus unsupervised clustering.
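The two summary statistics in panels B and C can be illustrated with a short sketch: Spearman's ρ between matched probability vectors, and a confusion matrix against ground-truth labels. The function names are hypothetical, and the 0.5 probability cutoff for calling a sample HRD High is an assumption; the source does not state the cutoff used.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def confusion_matrix(truth, probabilities, threshold=0.5):
    """Count TP/FP/TN/FN, calling a sample HRD High when its probability
    is at least `threshold` (an assumed cutoff)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for t, p in zip(truth, probabilities):
        predicted_high = p >= threshold
        if predicted_high and t == 1:
            counts["TP"] += 1
        elif predicted_high and t == 0:
            counts["FP"] += 1
        elif not predicted_high and t == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts
```

In this sketch, `spearman` would take the RNAseq and NanoString probabilities for matched samples, and `confusion_matrix` would take the clustering-derived labels (1 for HRD High, 0 for HRD Low) together with the NanoString probabilities.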
