Ann Transl Med. 2021 Dec;9(23):1711.
doi: 10.21037/atm-21-4015.

Identification of key genes associated with esophageal adenocarcinoma based on bioinformatics analysis

Weifeng Qi et al. Ann Transl Med. 2021 Dec.

Abstract

Background: Esophageal adenocarcinoma (EAC) is an aggressive malignancy and a significant cause of cancer-related death worldwide. It is often diagnosed at an advanced stage and carries a poor prognosis. The mechanisms of its pathogenesis and progression remain unclear and require urgent elucidation. This study aimed to identify specific genes and potential pathways associated with the progression and prognosis of EAC using bioinformatics analyses.

Methods: EAC microarray datasets from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases were analyzed to identify differentially expressed genes (DEGs). The DEGs in TCGA were then used to construct a co-expression network by weighted correlation network analysis (WGCNA), and module-clinical trait relationships were analyzed to find the genes associated with clinicopathological parameters of EAC. Gene ontology (GO) term and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed for the cancer-related genes, and a DEG-based protein-protein interaction (PPI) network was used to extract hub genes through Cytoscape plugins. The consensus survival analysis for EAC (OSeac) was performed to identify prognosis-related genes. Immune infiltration was evaluated with the Tumor Immune Estimation Resource (TIMER) algorithm, and a risk score prognostic model was established using univariate and multivariate Cox proportional hazards regression and lasso regression analyses.
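
The DEG identification step can be illustrated with a minimal sketch. The abstract does not state the cutoffs or the software used, so the thresholds below (|log2 fold change| >= 1, FDR < 0.05), the Benjamini-Hochberg adjustment, the function names, and the gene labels GENE_A to GENE_D are all assumptions chosen for illustration, not the authors' settings or data.

```python
from typing import Dict, List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (FDR) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(
    genes: List[str],
    log2_fc: List[float],
    p_values: List[float],
    fc_cutoff: float = 1.0,    # assumed cutoff, not taken from the paper
    fdr_cutoff: float = 0.05,  # assumed cutoff, not taken from the paper
) -> Dict[str, str]:
    """Label each gene as 'up', 'down', or 'ns' (not significant)."""
    fdr = benjamini_hochberg(p_values)
    calls = {}
    for gene, lfc, q in zip(genes, log2_fc, fdr):
        if q < fdr_cutoff and lfc >= fc_cutoff:
            calls[gene] = "up"
        elif q < fdr_cutoff and lfc <= -fc_cutoff:
            calls[gene] = "down"
        else:
            calls[gene] = "ns"
    return calls


if __name__ == "__main__":
    # Toy values for hypothetical genes; not data from the study.
    print(call_degs(
        ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
        [2.0, -1.5, 0.3, -0.2],
        [0.01, 0.04, 0.03, 0.005],
    ))
```

Genes called "up" or "down" in each dataset could then be intersected across datasets, for example with Python set operations, to mirror the overlap step described above.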

Results: Ultimately, 190 cancer-related DEGs were identified, 6 of which were found to play vital roles in the progression of EAC, including ACTA2, BGN, CALD1, COL1A1, COL4A1, and DCN. The risk score prognostic model consisted of 6 other genes that had an important impact on the prognosis of EAC, including CLDN3, EPB41L4A, ESM1, MT1X, PAQR5, and PLAU. The area under the curve of the prognostic model for predicting the survival of patients at 1, 2, and 3 years was 0.707, 0.702, and 0.726, respectively.
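
A risk score model of this kind is commonly computed as a weighted sum of gene expression values, with the weights taken from the fitted Cox model. The sketch below assumes that form. The coefficients, the gene labels GENE_1 and GENE_2, the median split into high- and low-risk groups, and the function names are hypothetical, since the abstract reports none of them. The AUC helper is a plain binary AUC that ignores censoring, so it is a simplification of the time-dependent ROC analysis reported at 1, 2, and 3 years.

```python
from statistics import median
from typing import Dict, List


def risk_score(expression: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Weighted sum of expression values; weights are Cox coefficients."""
    return sum(coefficients[g] * expression[g] for g in coefficients)


def split_by_median(scores: List[float]) -> List[str]:
    """Assign 'high' to scores above the cohort median, else 'low' (assumed rule)."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def binary_auc(scores: List[float], labels: List[int]) -> float:
    """Probability that a random event case outscores a random non-event case.

    labels: 1 = event observed by the chosen time horizon, 0 = no event.
    Ties count as half a win. Censoring is ignored in this simplified version.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical coefficients and expression values; not from the study.
    coefs = {"GENE_1": 0.5, "GENE_2": -0.25}
    patients = [
        {"GENE_1": 2.0, "GENE_2": 4.0},
        {"GENE_1": 4.0, "GENE_2": 0.0},
        {"GENE_1": 1.0, "GENE_2": 2.0},
    ]
    scores = [risk_score(p, coefs) for p in patients]
    print(scores, split_by_median(scores), binary_auc(scores, [0, 1, 0]))
```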

Conclusions: This study identified several genes with the potential to become useful targets for the diagnosis and treatment of EAC. The 6-gene-related risk score prognostic model and nomogram based on these genes may be a reliable tool for predicting the prognosis of patients with EAC.

Keywords: Esophageal adenocarcinoma (EAC); bioinformatics analysis; protein-protein interaction (PPI); risk score prognosis model; weighted gene co-expression network analysis (WGCNA).

Conflict of interest statement

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://dx.doi.org/10.21037/atm-21-4015). The authors have no conflicts of interest to declare.

Figures

Figure 1
Flow chart of this study. TCGA, The Cancer Genome Atlas; GEO, Gene Expression Omnibus; DEGs, differentially expressed genes; WGCNA, weighted gene co-expression network analysis; GO, gene ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; PPI, protein-protein interaction.
Figure 2
Volcano plots of differentially expressed genes from three datasets. The x-axis represents the fold change of gene expression, and the y-axis represents the adjusted P value. The red dots represent significantly up-regulated genes, while the blue dots represent significantly down-regulated genes. TCGA, The Cancer Genome Atlas; GEO, Gene Expression Omnibus; FDR, false discovery rate; FC, fold change.
Figure 3
Weighted gene co-expression network analysis of selected genes. (A) Sample clustering dendrogram and trait heatmap. (B) Analysis of network topology for various soft-thresholding powers; the soft-threshold power β was set to 8. (C) Hierarchical clustering dendrogram of co-expressed genes grouped into modules in EAC. Each colored row represents a color-coded module containing a group of highly connected genes. A total of 12 modules were identified after merging highly correlated modules. EAC, esophageal adenocarcinoma.
Figure 4
Heatmap of the correlations between module eigengenes and clinical traits of EAC. Each row corresponds to a module eigengene, and each column corresponds to a clinical characteristic. Each cell contains the corresponding correlation. EAC, esophageal adenocarcinoma.
Figure 5
Venn diagrams of DEGs in the 2 GEO datasets and the genes in 5 cancer-related modules from TCGA database. DEGs, differentially expressed genes; TCGA, The Cancer Genome Atlas; GEO, Gene Expression Omnibus.
Figure 6
Gene ontology (GO) and pathway enrichment analysis. (A) Biological process analysis. (B) Cellular component analysis. (C) Molecular function analysis. (D) KEGG pathway analysis. GO, gene ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; BP, biological processes; CC, cellular components; MF, molecular function.
Figure 7
The protein-protein interaction (PPI) network analysis and the most significant module. (A) The PPI network of the selected genes. Genes shown in yellow belong to the fairly significant modules. (B) The most significant module of the PPI network.
Figure 8
Kaplan-Meier curves of 6 hub genes in EAC patients. (A) ACTA2; (B) BGN; (C) CALD1; (D) COL1A1; (E) COL4A1; (F) DCN. HR, hazard ratio; CI, confidence interval; OSeac, the consensus survival analysis for EAC; EAC, esophageal adenocarcinoma; ACTA2, actin alpha 2; BGN, biglycan; CALD1, caldesmon 1; COL1A1, collagen type I alpha 1 chain; COL4A1, collagen type IV alpha 1 chain; DCN, decorin.
Figure 9
Correlation between 6 hub genes and immune cell infiltration (TIMER). The correlation between the abundance of immune cells and the expression of ACTA2 (A), BGN (B), CALD1 (C), COL1A1 (D), COL4A1 (E), and DCN (F) in EAC. EAC, esophageal adenocarcinoma; ACTA2, actin alpha 2; BGN, biglycan; CALD1, caldesmon 1; COL1A1, collagen type I alpha 1 chain; COL4A1, collagen type IV alpha 1 chain; DCN, decorin.
Figure 10
Forest plot and survival analysis for the prognostic risk score model based on 6 genes. (A) Forest plot for multivariate Cox regression, showing HR values with 95% confidence intervals and the associated P values. (B,C) Survival curves for patients with different risk scores in the training data and test data, respectively. P<0.01. (D-F) ROC curves for the prognostic risk score model representing 1-, 2-, and 3-year predictions in the test data; the areas under the curve are 0.707, 0.702, and 0.726, respectively. HR, hazard ratio; ROC, receiver operating characteristic; AUC, area under the curve.
Figure 11
Distribution of duration of survival, the nomogram for the risk score model, and the expression of the 6 genes in the model. (A,B) Distribution of duration of survival in the training data and test data. The x-axis is arranged in order of patient risk score, and the y-axis represents patient survival time. (C) Expression of the 6 prognostic genes, where red represents the high-risk group and blue represents the low-risk group. All P<0.01. (D) A nomogram for the prognostic risk score model. “Points” is the scoring scale for each of the 6 genes, and “total points” is the scale for the total score. OS, overall survival; CLDN3, claudin-3; ESM1, endothelial cell specific molecule-1; PLAU, plasminogen activator urokinase; EPB41L4A, erythrocyte membrane protein band 4.1 like 4A; PAQR5, progestin and adipoQ receptor family member 5; MT1X, metallothionein 1X.
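
The soft-thresholding step mentioned in the Figure 3 caption can be sketched as follows. In WGCNA, a pairwise correlation is raised to a power β to obtain a network adjacency, and the caption gives β = 8. The sketch assumes an unsigned network (absolute Pearson correlation), which the source does not state. The function names, the gene labels, and the expression profiles are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (non-constant inputs)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def soft_threshold_adjacency(
    profiles: Dict[str, List[float]], beta: int = 8
) -> Dict[Tuple[str, str], float]:
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for all gene pairs."""
    genes = list(profiles)
    return {
        (g, h): abs(pearson(profiles[g], profiles[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


if __name__ == "__main__":
    # Toy expression profiles for hypothetical genes; not data from the study.
    toy = {
        "GENE_A": [1.0, 2.0, 3.0, 4.0],
        "GENE_B": [2.0, 4.0, 6.0, 8.0],
        "GENE_C": [1.0, 3.0, 2.0, 4.0],
    }
    for pair, weight in soft_threshold_adjacency(toy).items():
        print(pair, round(weight, 4))
```

Raising the correlation to a high power pushes weak correlations toward zero while leaving strong ones largely intact: in the toy data, a correlation of 0.8 becomes an adjacency of about 0.17 at β = 8.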
