Front Genet. 2023 Jan 4:13:1067524.
doi: 10.3389/fgene.2022.1067524. eCollection 2022.

Identification of the diagnostic genes and immune cell infiltration characteristics of gastric cancer using bioinformatics analysis and machine learning

Rongjun Xie et al. Front Genet.

Abstract

Background: Reliable diagnostic markers for gastric cancer (GC) are needed. This work uses machine learning (ML) to identify GC diagnostic genes and investigate their connection with immune cell infiltration.
Methods: We downloaded eight GC-related datasets from GEO, TCGA, and GTEx. GSE13911, GSE15459, GSE19826, GSE54129, and GSE79973 served as the training set, GSE66229 as validation set A, and TCGA and GTEx as validation set B. First, differentially expressed genes (DEGs) were screened in the training set, and Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Disease Ontology (DO), and gene set enrichment analysis (GSEA) were performed. Next, candidate diagnostic genes were screened with the LASSO and SVM-RFE algorithms, and their diagnostic efficacy was evaluated with receiver operating characteristic (ROC) curves. The infiltration characteristics of immune cells in GC samples were then analyzed with CIBERSORT, and correlation analysis was performed. Finally, mutation and survival analyses were performed for the diagnostic genes.
Results: We identified 556 DEGs, of which 207 were up-regulated and 349 were down-regulated. GO analysis significantly enriched 413 functional annotations: 310 biological processes, 23 cellular components, and 80 molecular functions. Six of these biological processes are closely related to immunity. KEGG analysis significantly enriched 11 signaling pathways. DO analysis associated the DEGs with 244 diseases. Multiple GSEA entries were closely related to immunity. Machine learning screened eight candidate diagnostic genes, and further validation identified ABCA8, COL4A1, FAP, LY6E, MAMDC2, and TMEM100 as diagnostic genes. All six diagnostic genes were mutated to some extent in GC. ABCA8, COL4A1, LY6E, MAMDC2, and TMEM100 had prognostic value.
Conclusion: We screened six diagnostic genes for gastric cancer through bioinformatic analysis and machine learning, which are intimately related to immune cell infiltration and have a definite prognostic value.

Keywords: LASSO; SVM-RFE; bioinformatics analysis; diagnostic gene; gastric cancer; immune cell infiltration; machine learning.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Flow chart of research design and analysis.
FIGURE 2
DEGs between tumor and normal groups in the training set. (A) Volcano plot of DEGs with a fold difference >2; red marks up-regulated genes and green marks down-regulated genes. (B) Heat map of DEGs; red represents high expression and blue represents low expression.
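The fold-difference cutoff in panel (A) can be illustrated with a minimal sketch. It reads "fold difference >2" as a tumor/normal mean ratio above 2 in either direction (|log2FC| > 1), which is an assumption, as is the use of linear-scale expression values. The function name `classify_by_fold_change` is hypothetical, and the sketch leaves out the significance test that a DEG screen would also apply.

```python
import math
from statistics import mean

def classify_by_fold_change(tumor, normal, min_fold=2.0):
    """Label one gene 'up', 'down', or 'ns' from its mean expression ratio."""
    # Assumption: linear-scale, strictly positive expression values.
    log2_fc = math.log2(mean(tumor) / mean(normal))
    cutoff = math.log2(min_fold)  # a fold change of 2 corresponds to log2FC = 1
    if log2_fc > cutoff:
        return "up"      # would be drawn in red in the volcano plot
    if log2_fc < -cutoff:
        return "down"    # would be drawn in green
    return "ns"          # below the fold-change cutoff
```

For example, a gene with mean expression 8 in tumor and 2 in normal samples has a fold change of 4 and is labeled "up".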
FIGURE 3
Enrichment analysis of DEGs. GO (A), KEGG (B), and DO (C) analysis of the enrichment of DEGs for function, pathways, and disease, and GSEA analysis of differences in function (D,E) and pathways (F,G) between the tumor and normal groups.
FIGURE 4
LASSO and SVM-RFE screening of candidate diagnostic genes. (A) LASSO screening of candidate diagnostic genes, with log(λ) on the horizontal axis and cross-validation error on the vertical axis. The cross-validation error is minimal when 43 genes are selected. (B) Different colored lines represent different genes screened by LASSO. (C) SVM-RFE screening of candidate diagnostic genes. The horizontal axis represents the number of genes, and the vertical axis represents the cross-validation error. The cross-validation error is minimal when n = 34. (D) Venn diagram of the intersection of the results of the two algorithms.
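To show why LASSO acts as a gene selector, here is a minimal sketch of the L1 penalty driving uninformative coefficients to exactly zero. It uses a plain linear-regression LASSO solved by coordinate descent with no intercept, which is a simplification: the source does not state the model or software the authors used. The toy data, the penalty value, and all function names are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 without an intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # feature j times the residual that ignores feature j
            z = 0.0    # squared norm of feature j
            for i in range(n):
                partial = y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * partial
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n)
    return w

# Toy data (hypothetical): y depends only on the first feature.
X = [[-2, 1], [-1, -1], [1, -1], [2, 1]]
y = [-4, -2, 2, 4]
weights = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
```

Features whose coefficient stays nonzero are the ones LASSO keeps. In the workflow described by the caption, the penalty would be chosen by cross-validation and the retained genes intersected with the SVM-RFE result.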
FIGURE 5
Expression of candidate diagnostic genes in the training set. (A–H) Scatter plots showing the expression of candidate diagnostic genes in the tumor and normal groups of the training set. Red indicates the tumor group and blue indicates the normal group. p < 0.05 indicates a significant difference.
FIGURE 6
ROC curves in the training set. (A–H) ROC curves for the eight candidate diagnostic genes in the training set. The horizontal axis is the false positive rate (1 - specificity), and the vertical axis is the true positive rate (sensitivity).
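The axes described above can be made concrete with a minimal sketch that builds ROC points from per-sample scores and integrates them with the trapezoidal rule. The function names are hypothetical, and it is an assumption that a single gene's expression value serves as the score, with label 1 for tumor and 0 for normal.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

An AUC of 1.0 means every tumor sample scores above every normal sample, and 0.5 corresponds to chance-level ranking.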
FIGURE 7
Analysis of immune cell infiltration. (A) The graph shows the degree of infiltration of different immune cells between the tumor and normal groups. (B) Immune cell correlation analysis. The horizontal and vertical axes are the names of immune cells, and the values indicate the correlation coefficients between immune cells. The red color indicates a positive correlation, and the blue indicates a negative one. (C) Violin plot showing the difference of immune infiltrating cells between tumor and normal groups. The horizontal axis indicates the name of immune cells, and the vertical axis indicates the content of immune cells. Blue indicates the normal group, and red indicates the tumor group. p < 0.05 indicates a significant difference.
FIGURE 8
Correlation analysis of diagnostic genes and immune infiltrating cells. (A–F) Correlation between diagnostic genes and immune infiltrating cells. The horizontal axis indicates the correlation coefficient, and the vertical axis indicates the immune cell name. Circle size indicates the absolute value of the correlation coefficient, and color indicates the p-value of the correlation test.
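A correlation coefficient of the kind plotted here can be sketched as follows. The source does not say which coefficient was used, so Spearman rank correlation is an assumption, and the function names are hypothetical. The inputs would be one gene's expression and one immune cell type's estimated fraction across the same samples; both are assumed to be non-constant.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Because only ranks are compared, any monotonically increasing relationship gives a coefficient of 1 and any monotonically decreasing one gives -1.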
FIGURE 9
Mutation analysis of diagnostic genes. (A) Mutations of diagnostic genes between tumor and normal groups. (B) Mutations of diagnostic genes in the tumor group. (C) Mutations of diagnostic genes in the normal group.
FIGURE 10
Survival analysis of candidate diagnostic genes. (A–R) Effect of each diagnostic gene on overall survival (OS), first progression (FP), and post-progression survival (PPS).
