Transl Cancer Res. 2023 Jun 30;12(6):1466-1489.
doi: 10.21037/tcr-23-3. Epub 2023 Jun 20.

Screening of novel biomarkers for breast cancer based on WGCNA and multiple machine learning algorithms

Xiaohu Jin et al. Transl Cancer Res.

Abstract

Background: Breast cancer (BC) ranks first in incidence among women, with approximately 2 million new cases per year. Therefore, it is essential to investigate emerging targets for BC patients' diagnosis and prognosis.

Methods: We analyzed gene expression data from 99 normal and 1,081 BC tissues in The Cancer Genome Atlas (TCGA) database. Differentially expressed genes (DEGs) were identified using the "limma" R package, and relevant modules were chosen through Weighted Gene Coexpression Network Analysis (WGCNA). Intersection genes were obtained by matching DEGs to WGCNA module genes. Functional enrichment analyses were performed on these genes using the Gene Ontology (GO), Disease Ontology (DO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Biomarkers were screened via Protein-Protein Interaction (PPI) networks and multiple machine-learning algorithms. The Gene Expression Profiling Interactive Analysis (GEPIA), The University of ALabama at Birmingham CANcer (UALCAN), and Human Protein Atlas (HPA) databases were employed to examine mRNA and protein expression of eight biomarkers. The Kaplan-Meier Plotter tool was used to assess their prognostic value. Key biomarkers were analyzed via single-cell sequencing, and their relationship with immune infiltration was examined using the Tumor Immune Estimation Resource (TIMER) database and the "xCell" R package. Lastly, drug prediction was conducted based on the identified biomarkers.

Results: We identified 1,673 DEGs and 542 important genes through differential analysis and WGCNA, respectively. Intersection analysis revealed 76 genes, which play significant roles in immune-related viral infection and IL-17 signaling pathways. DIX domain containing 1 (DIXDC1), Dual specificity phosphatase 6 (DUSP6), Pyruvate dehydrogenase kinase 4 (PDK4), C-X-C motif chemokine ligand 12 (CXCL12), Interferon regulatory factor 7 (IRF7), Integrin subunit alpha 7 (ITGA7), NIMA related kinase 2 (NEK2), and Nuclear receptor subfamily 3 group C member 1 (NR3C1) were selected as BC biomarkers using machine-learning algorithms. NEK2 was the most critical gene for diagnosis. Prospective drugs targeting NEK2 include etoposide and lukasunone.

Conclusions: Our study identified DIXDC1, DUSP6, PDK4, CXCL12, IRF7, ITGA7, NEK2, and NR3C1 as potential diagnostic biomarkers for BC, with NEK2 having the highest potential to aid in diagnosis and prognosis in clinical settings.

Keywords: Machine learning; NEK2; biomarkers; breast cancer (BC); weighted correlation network analysis.
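
The intersection step described in the Methods (matching DEGs to WGCNA module genes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline, which used R packages: the function names `select_degs` and `consensus`, the cutoffs (|log2FC| >= 1, adjusted P < 0.05), the placeholder gene names, and all numeric values are hypothetical and are not taken from the study.

```python
from typing import Dict, Iterable, List, Set


def select_degs(stats: Dict[str, Dict[str, float]],
                lfc_cutoff: float = 1.0,
                padj_cutoff: float = 0.05) -> Set[str]:
    """Return genes passing |log2FC| and adjusted-P cutoffs.

    Both cutoffs are assumed defaults; the abstract does not state
    the thresholds that were actually used.
    """
    return {
        gene for gene, s in stats.items()
        if abs(s["log2fc"]) >= lfc_cutoff and s["padj"] < padj_cutoff
    }


def consensus(*gene_sets: Iterable[str]) -> List[str]:
    """Return genes present in every input set, sorted for stable output."""
    sets = [set(g) for g in gene_sets]
    return sorted(set.intersection(*sets)) if sets else []


# Made-up per-gene statistics with placeholder names (not study data).
toy_stats = {
    "GENE_A": {"log2fc": 2.4, "padj": 1e-6},
    "GENE_B": {"log2fc": -1.9, "padj": 1e-4},
    "GENE_C": {"log2fc": 0.3, "padj": 0.6},   # fails both cutoffs
    "GENE_D": {"log2fc": 1.5, "padj": 0.2},   # fails the adjusted-P cutoff
}
# Made-up membership of a trait-associated co-expression module.
toy_module = {"GENE_A", "GENE_B", "GENE_C"}

degs = select_degs(toy_stats)
candidates = consensus(degs, toy_module)
print(candidates)
```

The same `consensus` helper could in principle be applied to the gene lists returned by several feature-selection algorithms, keeping only the genes on which all of them agree; whether the study combined its algorithms in exactly this way is an assumption here.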

Conflict of interest statement

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://tcr.amegroups.com/article/view/10.21037/tcr-23-3/coif). The authors have no conflicts of interest to declare.

Figures

Figure 1
Flow chart of the research process. TCGA, The Cancer Genome Atlas; WGCNA, Weighted Gene Coexpression Network Analysis; GSEA, gene set enrichment analysis; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; LASSO, Least Absolute Shrinkage and Selection Operator; SVM-RFE, Support Vector Machine-Recursive Feature Elimination; GEPIA, The Gene Expression Profiling Interactive Analysis; HPA, Human Protein Atlas; UALCAN, The University of ALabama at Birmingham CANcer; ROC, receiver operating characteristic; NEK2, NIMA related kinase 2; DIXDC1, DIX domain containing 1; CXCL12, C-X-C motif chemokine ligand 12; TIMER, Tumor Immune Estimation Resource.
Figure 2
Identification of DEGs in patients with breast cancer. (A) Volcano plot presenting the expression characteristics of DEGs, where blue represents gene upregulation in normal tissues, and red represents gene upregulation in cancerous tissues. (B) Heatmap presenting the expression of the top 30 DEGs across samples. (C) GSEA functional analysis of DEGs. AMPK, adenosine 5′-monophosphate (AMP)-activated protein kinase; cGMP-PKG, cyclic guanosine monophosphate-protein kinase G; JAK-STAT, Janus kinase-signal transducer and activator of transcription; PPAR, peroxisome proliferator-activated receptor; NES, normalized enrichment score; DEGs, differentially expressed genes; GSEA, gene set enrichment analysis; FC, fold change.
Figure 3
BC-related hub module recognition. (A) Left: scale-free fit index; right: mean connectivity. (B) The cluster dendrogram of co-expression genes in BC. (C) Correlations between modules and clinical features. Rows of the heat map represent the MEs and columns represent the clinical features; each cell contains the corresponding correlation coefficient and P value. (D) DEGs mapped to WGCNA module genes. WGCNA, Weighted Gene Coexpression Network Analysis; DIFF, differential; BC, breast cancer; MEs, module eigengenes; DEGs, differentially expressed genes.
Figure 4
A comprehensive functional analysis of prospective genes. (A) Potential roles in various BPs, CCs, and MFs based on GO analysis. (B) DO. (C) KEGG pathways. (D) TNF signaling pathway. BPs, biological processes; CCs, cellular components; MFs, molecular functions; CXCR, C-X-C chemokine receptor; COVID-19, coronavirus disease 2019; GO, Gene Ontology; DO, Disease Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; TNF, tumor necrosis factor.
Figure 5
Interaction network of prospective targets.
Figure 6
Biomarker screening based on machine learning algorithms. (A) The random forest model and its top 20 most important genes. (B) LASSO regression model. (C) SVM-RFE analysis, with the lowest error rate at 23 signature genes. (D) Biomarkers. CV, coefficient of variation; LASSO, Least Absolute Shrinkage and Selection Operator; SVM-RFE, Support Vector Machine-Recursive Feature Elimination.
Figure 7
Expression of the eight genes in BC (UALCAN Analysis and HPA Analysis). (A) The graph generated from the GEPIA database was used to compare the expression of the eight biomarker genes in BC tissues (n=1,085) and normal breast tissues (n=291), *, P<0.05. (B) Representative immunohistochemical images of the eight biomarker genes in BC tissues and normal breast tissues based on HPA datasets (200 µm). NEK2: https://www.proteinatlas.org/ENSG00000117650-NEK2/tissue/breast#; https://www.proteinatlas.org/ENSG00000117650-NEK2/pathology/breast+cancer#; NR3C1: https://www.proteinatlas.org/ENSG00000113580-NR3C1/tissue; https://www.proteinatlas.org/ENSG00000113580-NR3C1/pathology/breast+cancer#; PDK4: https://www.proteinatlas.org/ENSG00000004799-PDK4/tissue/breast#img; https://www.proteinatlas.org/ENSG00000004799-PDK4/pathology/breast+cancer#; ITGA7: https://www.proteinatlas.org/ENSG00000135424-ITGA7/tissue/breast#; https://www.proteinatlas.org/ENSG00000135424-ITGA7/pathology/breast+cancer#; DIXDC1: https://www.proteinatlas.org/ENSG00000150764-DIXDC1/tissue/breast#; https://www.proteinatlas.org/ENSG00000150764-DIXDC1/pathology/breast+cancer#; CXCL12: https://www.proteinatlas.org/ENSG00000107562-CXCL12/tissue/breast#; https://www.proteinatlas.org/ENSG00000107562-CXCL12/pathology/breast+cancer#; DUSP6: https://www.proteinatlas.org/ENSG00000139318-DUSP6/tissue/breast#; https://www.proteinatlas.org/ENSG00000139318-DUSP6/pathology/breast+cancer#img; IRF7: https://www.proteinatlas.org/ENSG00000185507-IRF7/tissue/breast#; https://www.proteinatlas.org/ENSG00000185507-IRF7/pathology/breast+cancer#. 
NEK2, NIMA related kinase 2; NR3C1, nuclear receptor subfamily 3 group C member 1; PDK4, pyruvate dehydrogenase kinase 4; ITGA7, integrin subunit alpha 7; DIXDC1, DIX domain containing 1; CXCL12, C-X-C motif chemokine ligand 12; DUSP6, dual specificity phosphatase 6; IRF7, interferon regulatory factor 7; T, tumor; N, normal; BC, breast cancer; UALCAN, The University of ALabama at Birmingham CANcer; HPA, Human Protein Atlas; GEPIA, The Gene Expression Profiling Interactive Analysis.
Figure 8
Correlation of prospective biomarkers and analysis of their prognostic ability (Kaplan-Meier Plotter Analysis). (A) Correlation analysis between prospective targets. *, P<0.05. (B) The prognostic significances of prospective targets in BC patients assessed by RFS, OS, DMFS, and PPS. DIXDC1, DIX domain containing 1; DUSP6, dual specificity phosphatase 6; PDK4, pyruvate dehydrogenase kinase 4; CXCL12, C-X-C motif chemokine ligand 12; IRF7, interferon regulatory factor 7; ITGA7, integrin subunit alpha 7; NEK2, NIMA related kinase 2; NR3C1, nuclear receptor subfamily 3 group C member 1; RFS, recurrence-free survival; OS, overall survival; DMFS, distant metastasis-free survival; PPS, post-progression survival; HR, hazard ratio; CI, confidence interval.
Figure 9
Expression transcript levels of NEK2, DIXDC1 and CXCL12 with molecular subtypes and tumor stage of BRCA (UALCAN Analysis) and GSEA. (A) Association of NEK2, DIXDC1 and CXCL12 expression with molecular subtypes of BRCA. (B) Association of NEK2, DIXDC1 and CXCL12 expression with tumor stage of BRCA. (C) GSEA functional analysis of NEK2, DIXDC1 and CXCL12. **, P<0.01. NEK2, NIMA related kinase 2; BRCA, breast cancer; TCGA, The Cancer Genome Atlas; DIXDC1, DIX domain containing 1; CXCL12, C-X-C motif chemokine ligand 12; ECM, extracellular matrix; UALCAN, The University of ALabama at Birmingham CANcer; GSEA, gene set enrichment analysis.
Figure 10
Association of NEK2, DIXDC1 and CXCL12 expression in single-cell sequencing with tumor functional status. (A) Heatmap generated from the CancerSEA database showing the extent to which NEK2, DIXDC1, and CXCL12 expression levels correlate with the functional states of different tumor types. (B) Analysis of the CancerSEA database identified statistically significant correlations (***, P≤0.001) between NEK2 expression in BC and three distinct functional states. (C) t-SNE plots showing the expression profiles of NEK2, DIXDC1, and CXCL12 in single cells of BC samples. NEK2, NIMA related kinase 2; DIXDC1, DIX domain containing 1; CXCL12, C-X-C motif chemokine ligand 12; BC, breast cancer; t-SNE, t-Distributed Stochastic Neighbor Embedding.
Figure 11
Correlation of NEK2 expression with immune cell infiltration in BRCA. (A) Correlation between NEK2 gene expression and immune infiltration in BRCA. (B) Effects of high and low expression of NEK2 on 64 types of immune cells based on the R package xCell. *, P<0.05; **, P<0.01; ***, P<0.001; ns, no significance. BRCA, breast cancer; NEK2, NIMA related kinase 2; TPM, transcripts per million.
Figure 12
Drug prediction results.
Figure 13
Molecular docking results.
