Genes (Basel). 2022 Nov 28;13(12):2233.
doi: 10.3390/genes13122233.

Bioinformatics Prediction and Machine Learning on Gene Expression Data Identifies Novel Gene Candidates in Gastric Cancer

Medi Kori et al. Genes (Basel).

Abstract

Gastric cancer (GC) is one of the five most common cancers in the world and unfortunately has a high mortality rate. To date, the pathogenesis and disease genes of GC are unclear, so the need for new diagnostic and prognostic strategies for GC is undeniable. Despite particular findings in this regard, a holistic approach encompassing molecular data from different biological levels for GC has been lacking. To translate Big Data into system-level biomarkers, in this study, we integrated three different GC gene expression datasets with three different biological networks for the first time and captured biologically significant (i.e., reporter) transcripts, hub proteins, transcription factors, and receptor molecules of GC. We analyzed the revealed biomolecules with independent RNA-seq data for their diagnostic and prognostic capabilities. While this holistic approach uncovered biomolecules already associated with GC, it also revealed novel system biomarker candidates for GC. The classification performance of the novel candidate biomarkers was investigated with machine learning approaches. With this study, AES, CEBPZ, GRK6, HPGDS, SKIL, and SP3 were identified for the first time as diagnostic and/or prognostic biomarker candidates for GC. Consequently, we have provided valuable data for further experimental and clinical efforts that may be useful for the diagnosis and/or prognosis of GC.

Keywords: diagnostic genes; disease genes; gastric cancer; multi-omics; prognostic genes; systems biology.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
The computational flow employed in the study.
Figure 2
Meta-analysis of the three transcriptome datasets associated with gastric cancer. (A) Pie donut diagram shows the distribution of differentially expressed genes (DEGs) of the three transcriptome datasets. (B) The Venn diagram shows the DEGs common to the datasets. (C) The gene set overrepresentation analysis of the common DEGs.
Figure 3
The reconstructed human biological networks. (A) The reconstructed protein–protein interaction (PPI) network. The significant hub proteins identified by the employed topological parameters are shown in orange. (B) The reconstructed transcriptional regulatory interaction network. The statistically significant (p-value < 0.001) reporter transcription factors (TFs) are shown in blue. (C) The reconstructed protein–receptor interaction network. The statistically significant (p-value < 0.001) reporter receptors are shown in green.
Figure 4
The diagnostic performance analyses of the reporter biomolecules. (A) The bubble plot representing the AUC values of the reporter biomolecules. Only the AUC values that were considered significant in the study are shown (AUC > 70%). Hub proteins are shown in orange, reporter transcription factors (TFs) in blue, and reporter receptors in green. (B) The major reporter biomolecules: a hub protein, a TF, and a reporter receptor according to their AUC values.
Figure 5
Analysis of prognostic performance of reporter biomolecules. Box plots showing expression levels of reporter biomolecules between low- and high-risk groups with p-values. Kaplan–Meier plots estimating survival of patients with gastric cancer showing p-value and hazard ratio for each curve. (A) Hub protein: PDGFRB. (B) Hub protein: TP53. (C) Hub protein: TRIM29. (D) Reporter transcription factor (TF): AR. (E) Reporter TF: HOXA11. (F) Reporter TF: NELFB. (G) Reporter TF: SKIL. (H) Reporter receptor: GRK6. The high-risk group is shown in red, while the low-risk group is shown in blue.
Figure 6
Machine learning analysis for novel diagnostic and/or prognostic biomarker candidates. (A) The accuracy, F1, and recall score plot of eight different classification algorithms for discriminating between diseased samples and controls. (B) The accuracy, F1, and recall score plot of eight different classification algorithms for discriminating between alive and dead samples.
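
The captions of Figures 4 and 6 report AUC, accuracy, F1, and recall. As a reading aid, the following is a minimal sketch of how these scores are commonly defined for a two-class problem such as diseased versus control. It is not the authors' pipeline: the function names `roc_auc` and `classification_scores` and the toy labels and scores are illustrative assumptions, and no data from the study is used.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def classification_scores(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Accuracy, recall, and F1 for binary labels (1 = positive class)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy values, made up for illustration only (1 = diseased, 0 = control).
    toy_labels = [1, 1, 1, 0, 0, 0]
    toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
    toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]
    print(roc_auc(toy_labels, toy_scores))
    print(classification_scores(toy_labels, toy_pred))
```

Under these definitions, the "AUC > 70%" cut-off in Figure 4 would correspond to `roc_auc(...) > 0.70`. The 0.5 decision threshold in the toy example is likewise an assumption, since the captions do not state how predictions were thresholded.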
