Aging (Albany NY). 2023 Sep 26;15(21):11782-11810.
doi: 10.18632/aging.205053. Epub 2023 Sep 26.

Hub gene identification and molecular subtype construction for Helicobacter pylori in gastric cancer via machine learning methods and NMF algorithm

Lianghua Luo et al. Aging (Albany NY).

Abstract

Helicobacter pylori (HP) is a gram-negative, spiral-shaped bacterium that colonizes the human stomach and is a recognized risk factor for gastritis, peptic ulcer disease, and gastric cancer (GC). It has also been classified as a class I carcinogen that affects the occurrence and progression of GC by inducing various oncogenic pathways. Identifying the key HP-related genes is therefore crucial for understanding the oncogenic mechanisms and improving the outcomes of GC patients. We retrieved the HP-related gene sets from the Molecular Signatures Database. Based on the HP-related genes, unsupervised non-negative matrix factorization (NMF) clustering was used to stratify the TCGA-STAD, GSE15459, and GSE84433 samples into two clusters with distinct clinical outcomes and immune infiltration characteristics. Subsequently, two machine learning (ML) strategies, support vector machine-recursive feature elimination (SVM-RFE) and random forest (RF), were employed to determine twelve HP-related hub genes. Receiver operating characteristic and Kaplan-Meier curves further confirmed the diagnostic value and prognostic significance of the hub genes. Expression of the HP-related hub genes was then tested by qRT-PCR array and immunohistochemical images. Additionally, functional pathway enrichment analysis indicated that these hub genes are implicated in the genesis and progression of GC by activating or inhibiting classical cancer-associated pathways such as epithelial-mesenchymal transition, cell cycle, apoptosis, and RAS/MAPK. In the present study, we constructed a novel HP-related tumor classification across different datasets and screened out twelve hub genes using the ML algorithms, which may contribute to the molecular diagnosis and personalized therapy of GC.

Keywords: Helicobacter pylori; cluster; gastric cancer; hub genes; therapy.
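
The NMF stratification step described in the abstract can be illustrated with a minimal sketch. It assumes plain multiplicative updates (Lee and Seung) and a single run, whereas the study reports consensus clustering, which typically repeats such runs. The toy expression matrix, function names, and parameters are hypothetical and not taken from the paper.

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate V (genes x samples) as W @ H with non-negative factors."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, matmul(W, H))
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H

def frobenius_error(V, W, H):
    """Frobenius norm of the reconstruction residual V - W @ H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5

def assign_clusters(H):
    """Assign each sample (column of H) to the component with the largest weight."""
    return [max(range(len(H)), key=lambda k: H[k][j]) for j in range(len(H[0]))]

# Hypothetical toy matrix: 4 genes (rows) x 6 samples (columns), two sample groups.
V = [
    [5.0, 5.0, 5.0, 0.0, 0.0, 0.0],
    [4.0, 4.0, 4.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0, 3.0, 3.0],
    [0.0, 0.0, 0.0, 6.0, 6.0, 6.0],
]
W, H = nmf(V, rank=2)
labels = assign_clusters(H)
print("cluster labels per sample:", labels)
print("reconstruction error:", round(frobenius_error(V, W, H), 4))
```

Cluster numbering is arbitrary here; only the grouping of samples is meaningful. In practice the rank k would be chosen from stability measures across repeated runs rather than fixed in advance.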

Conflict of interest statement

CONFLICTS OF INTEREST: The authors declare that they have no conflicts of interest.

Figures

Figure 1
Flowchart illustrating the workflow of this study.
Figure 2
Construction of an NMF subtype based on the differentially expressed HP-related genes in the TCGA-STAD cohort. (A) NMF consensus clustering for k = 2. (B) Kaplan–Meier analysis of overall survival (OS) for Cluster C1 and C2. (C) Principal component analysis (PCA). (D, E) Differential analyses of immune and stromal scores between Cluster C1 and C2. (F) Violin plot showing the immune cell infiltration landscape across different clusters. (G, H) Box plots of estimated IC50 values for Imatinib and Sunitinib in Cluster C1 and C2. (I) Box plot visualizing the significant expression differences of immune checkpoints across distinct clusters, including BTLA and PD-L2. *: P < 0.05; **: P < 0.01; ***: P < 0.001.
Figure 3
Selection of the HP-related hub genes via machine learning strategies. (A, B) Boxplot and reverse cumulative distribution curve of the residuals. (C) Comparison of ROC curves for evaluating the diagnostic reliability of the support vector machine-recursive feature elimination (SVM-RFE) and random forest (RF) models. (D) Error graph of the RF model. (E) Screening of the HP-related hub genes with the RF algorithm. (F) Identification of the HP-related hub genes with the SVM-RFE method.
Figure 4
Construction of the diagnostic nomogram on the basis of the twelve hub genes. (A) Venn diagram taking the intersection of the results of two ML strategies. (B) ROC curves measuring the diagnostic efficacy of the twelve HP-related hub genes. (C) Decision curve of nomogram graph. (D) Nomogram for the diagnosis of gastric cancer (GC). (E) Calibration curve demonstrating the diagnostic performance of the nomogram. (F) Clinical impact curve.
Figure 5
Kaplan-Meier (K-M) survival curves and univariate Cox regression analysis of the hub genes. K-M curves for (A) EFNA3, (B) FLT1, (C) L3MBTL3, (D) MAPK10, (E) MLEC, (F) MYB, (G) NRP1, and (H) UHRF1. (I) Forest plot showing the prognostic values of the twelve hub genes from univariate Cox regression analysis.
Figure 6
The immune-infiltrating landscape of GC based on the twelve hub genes. (A–L) Lollipop plots revealing the association between the twelve hub genes and the infiltration level of various immune cells.
Figure 7
Mutational characteristics of the hub genes. (A) Copy number variation (CNV) frequency of hub genes. (B) Circle diagram of CNV with hub genes. (C) Correlation between expression of hub genes and CNV. (D) Cascade of hub gene mutations. (E) Details regarding single nucleotide variants (SNV).
Figure 8
Prediction of drug sensitivity. (A) Correlation between hub gene expression levels and GDSC drug sensitivity via the online search tool GSCA. (B) Structural formulas of the sensitive agents (including AZD8055, CI-1040, PLX4720, TPCA-1, Vorinostat, CEP-701, THZ-2-102-1, UNC0638, IPA-3, KIN001-260, SB590885, and KIN001-270).
Figure 9
Functional and pathway enrichment analysis of the hub genes. (A) Construction of a protein-protein interaction (PPI) network using the GeneMANIA database. (B) Involvement of the hub genes in several key cancer-associated processes, such as epithelial-mesenchymal transition (EMT), receptor tyrosine kinase (RTK), cell cycle, and apoptosis. (C) Predicted miRNAs targeting the hub genes, obtained from the GSCALite website.
Figure 10
Validating the mRNA expression of the twelve hub genes in normal gastric epithelial and GC cell lines via the quantitative reverse transcription polymerase chain reaction (qRT-PCR) assays.
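
The Kaplan-Meier curves referred to in Figures 2 and 5 rest on the product-limit estimator, which the following minimal sketch illustrates. The follow-up times and event flags are made-up toy values, and the function name is hypothetical; nothing here reproduces the study's data or code.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each observed death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # deaths and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Patients censored at time t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

# Hypothetical toy cohort: follow-up in months, 1 = death, 0 = censored.
toy_times = [2, 3, 3, 5, 8, 8]
toy_events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t = {t}: S(t) = {s:.4f}")
```

To compare groups as in the figures, the same function would be applied separately to each group (for example, the two clusters or the high and low expression groups of a hub gene), with a log-rank test used to assess the difference.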
