Front Physiol. 2022 Jun 23:13:905523.

doi: 10.3389/fphys.2022.905523. eCollection 2022.

ESRRG, ATP4A, and ATP4B as Diagnostic Biomarkers for Gastric Cancer: A Bioinformatic Analysis Based on Machine Learning

Qiu Chen¹, Yu Wang², Yongjun Liu², Bin Xi²

Affiliations

¹ Medical College, Yangzhou University, Yangzhou, China.
² College of Physics Science and Technology, Yangzhou University, Yangzhou, China.

PMID: 35812327
PMCID: PMC9262247
DOI: 10.3389/fphys.2022.905523

Qiu Chen et al. Front Physiol. 2022.

Abstract

Using multiple bioinformatics methods and machine learning techniques, this study aimed to identify potential hub genes of gastric cancer with diagnostic value. Novel biomarkers were identified by screening multiple databases of gastric cancer-related genes. The NCBI Gene Expression Omnibus (GEO) database was used to obtain gene expression files. Three hub genes (ESRRG, ATP4A, and ATP4B) were detected through a combination of weighted gene co-expression network analysis (WGCNA), gene-gene interaction network analysis, and a supervised feature selection method. GEPIA2 was used to verify the differences in hub gene expression between normal and cancer tissues at the RNA-seq level in the Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) databases. The reliability of the potential hub genes was further verified by immunohistochemistry in the Human Protein Atlas (HPA) database and by a transcription factor-hub gene regulatory network. Machine learning (ML) methods, including data pre-processing, model selection and cross-validation, and performance evaluation, were applied to the hub-gene expression profiles in five GEO datasets and verified on a GEO external validation (EV) dataset. Six supervised learning models (support vector machine, random forest, k-nearest neighbors, neural network, decision tree, and eXtreme Gradient Boosting) and one semi-supervised learning model (label spreading) were established to evaluate the diagnostic value of the biomarkers. Among the six supervised models, the support vector machine (SVM) algorithm was the most effective according to the calculated performance metrics, with area under the curve (AUC) scores of 0.93 and 0.99 on the test and external validation datasets, respectively. Furthermore, the semi-supervised model could also successfully learn and predict sample types, achieving an AUC score of 0.986 on the EV dataset even when only 10% of the samples in the five GEO datasets were labeled.
In conclusion, three hub genes (ATP4A, ATP4B, and ESRRG) closely related to gastric cancer were mined, and an ML diagnostic model of gastric cancer was constructed on their basis.

Keywords: WGCNA; bioinformatics; diagnostic model; gastric cancer; machine learning.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 2**
Progress of the weighted gene co-expression network analysis in GSE66229. **(A)** Cluster dendrogram of 161 samples in GSE66229. **(B)** Soft thresholds of the best scale-free topological model fitting index (left) and mean connectivity (right) were determined. The red horizontal line represents R² = 0.86. **(C)** Dendrogram of all genes clustered in GSE66229. Gene clustering into modules is based on a topological overlap matrix. Assigned modules are colored on the bottom with gray denoting unassigned genes.

**FIGURE 3**
Heatmap of the relationship between module eigengenes and clinical traits of GSE66229. In the WGCNA labeled heatmap for GSE66229, each row represents a module eigengene, identified by its color, and the three columns represent the clinical traits of overall survival time (OST), overall survival status (OSS), and sample type, respectively. Each cell shows the Pearson correlation coefficient and p-value (in parentheses) between the corresponding module eigengene and trait, and the color of each cell represents the value of the correlation.
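To make the cell values in this heatmap concrete, the following is a minimal sketch of the Pearson correlation between one module eigengene and one clinical trait. The eigengene values and the 1 = normal, 0 = tumor coding are hypothetical illustrations, not data or conventions taken from the study, and the p-value shown in the figure is not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical module eigengene values for six samples (not study data).
eigengene = [0.42, 0.35, 0.51, -0.38, -0.44, -0.46]
# Assumed trait coding for this sketch: 1 = normal tissue, 0 = tumor tissue.
sample_type = [1, 1, 1, 0, 0, 0]

r = pearson_r(eigengene, sample_type)
print(f"module-trait correlation: {r:.3f}")
```

A strongly positive value under this coding would indicate a module whose genes are more highly expressed in normal tissue than in tumor tissue.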

**FIGURE 4**
Gene–gene interaction network of the top-ranked 10% genes in red modules.

**FIGURE 5**
Validation of the expression of the three hub genes in the GEPIA2 platform. **(A)** Validation of the expression of the three hub genes in the GEPIA2 platform. The red and gray boxes represent cancer and normal tissues in the TCGA and GTEx datasets, respectively. STAD, gastric cancer; p < 0.01 (GEPIA2 website). **(B)** Immunohistochemical staining of ESRRG, ATP4A, and ATP4B in the Human Protein Atlas (HPA) database. **(C)** Transcription factor–hub gene regulatory network of the most relevant factor in the Cytoscape plugin “iRegulon”.

**FIGURE 6**
Performance of the six supervised machine learning models on the test and EV sets. Hyperparameters of all six models are tuned with the GridSearchCV method according to the “MCC” metric, and the best model of each type is chosen after exploration of the whole grid. Predictions on the test and EV sets are made with the best models. The six models used in this study are, in order, support vector machine (SVM), k-nearest neighbors (KNN), decision tree (DT), random forest (RF), neural network (NN), and eXtreme Gradient Boosting (XGB). **(A,B)** Scores of accuracy, F1 score, MCC, precision, sensitivity, and specificity for the six models on the test and EV datasets, respectively. **(C,D)** Four terms of the confusion matrix (TP, TN, FP, and FN) for the six models on the test and EV datasets, respectively.
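The six scores in panels (A,B) can all be derived from the four confusion-matrix terms in panels (C,D). The sketch below shows the standard formulas; the function name `diagnostic_metrics` and the example counts are hypothetical and are not values reported in the study.

```python
import math


def diagnostic_metrics(tp, tn, fp, fn):
    """Compute the six caption scores from confusion-matrix counts."""

    def ratio(num, den):
        # Report 0.0 when a denominator is zero (a common convention).
        return num / den if den else 0.0

    # Denominator of the Matthews correlation coefficient (MCC).
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Hypothetical counts for illustration only.
scores = diagnostic_metrics(tp=45, tn=40, fp=5, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four terms symmetrically, which is one common reason to prefer it as a tuning metric when the two classes are unequal in size.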

**FIGURE 7**
ROC curves for the predicted probability on the test and EV sets of all six machine learning diagnostic models: **(A)** SVM, **(B)** RF, **(C)** KNN, **(D)** NN, **(E)** DT and **(F)** XGB.
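The area under each ROC curve equals the probability that a randomly chosen positive sample receives a higher predicted score than a randomly chosen negative one. The following minimal sketch computes AUC that way, by pairwise comparison; the labels and predicted probabilities are made up for illustration, and the study itself may have computed AUC by a different but equivalent route.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = cancer, 0 = normal) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

In this toy example, eight of the nine positive/negative pairs are ordered correctly, so the AUC is 8/9.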

**FIGURE 8**
Performance of the semi-supervised machine learning model with various ratios of unlabeled data. Semi-supervised machine learning models are built with the label spreading (LS) algorithm. The ratios of randomly unlabeled samples include 50% (LS50), 60% (LS60), 70% (LS70), 80% (LS80), and 90% (LS90). For each ratio, the semi-supervised model is cross-validated 100 times by random permutation. **(A,B)** Performance of the semi-supervised machine learning models on all unlabeled data and on the EV dataset with various ratios of unknown samples, respectively. Seven metrics are given, namely, accuracy, F1 score, MCC, precision, sensitivity, specificity, and AUC.
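The evaluation protocol in this caption, hiding a random fraction of the labels and repeating the draw 100 times, can be sketched as follows. This covers only the label-masking step, not the label spreading algorithm itself. The helper names and the toy labels are hypothetical; the `-1` marker follows the convention that scikit-learn's semi-supervised estimators use for unlabeled samples, and whether the authors used that library is an assumption.

```python
import random

UNLABELED = -1  # assumed marker for a hidden label


def mask_labels(labels, unlabeled_ratio, rng):
    """Return a copy of labels with a random fraction set to UNLABELED."""
    n_hidden = round(len(labels) * unlabeled_ratio)
    hidden = set(rng.sample(range(len(labels)), n_hidden))
    return [UNLABELED if i in hidden else y for i, y in enumerate(labels)]


def repeated_masks(labels, unlabeled_ratio, n_repeats=100, seed=0):
    """Draw n_repeats independent random maskings from one seeded generator."""
    rng = random.Random(seed)
    return [mask_labels(labels, unlabeled_ratio, rng) for _ in range(n_repeats)]


# Hypothetical labels for 20 samples (1 = cancer, 0 = normal).
labels = [0, 1] * 10

# LS90 setting: 90% of the samples are unlabeled in every repetition.
masks = repeated_masks(labels, unlabeled_ratio=0.9)
print(masks[0])
```

Each masked label vector would then be passed to the semi-supervised model, and the metrics would be averaged over the 100 repetitions.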
