Front Genet. 2025 Jul 17:16:1573621.
doi: 10.3389/fgene.2025.1573621. eCollection 2025.

Identification of biomarkers between coronary artery disease and non-alcoholic steatohepatitis: a combination of bioinformatics and machine learning

Yihong Lin et al. Front Genet. 2025.

Abstract

Background: Non-alcoholic steatohepatitis (NASH) commonly complicates coronary artery disease (CAD), yet the interaction mechanism remains unclear. Our research seeks to investigate the common mechanisms and key signature genes between CAD and NASH.

Methods: RNA sequence information for CAD and NASH was screened from the GEO database. Weighted gene co-expression network analysis (WGCNA) and differentially expressed gene analysis identified key genes, followed by functional enrichment analysis of these shared genes. Three machine learning methods (LASSO, random forest, and SVM-RFE) were used to identify signature genes. Gene set enrichment analysis (GSEA) was then performed to explore potential mechanisms associated with the signature genes. In addition, single-sample gene set enrichment analysis (ssGSEA) was used to evaluate immune infiltration in CAD and NASH and its correlation with the signature genes.
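The filtering logic described in the Methods can be pictured as a chain of set intersections. The following is a minimal sketch under that assumption only: it does not run WGCNA, differential expression, or any of the three machine learning methods, and the function names and the `GENE_*` symbols are hypothetical placeholders. Only BATF3, SOCS2, and GPER are taken from the abstract.

```python
from functools import reduce


def key_genes(module_genes, degs):
    """Genes that sit in a trait-associated WGCNA module and are also differentially expressed."""
    return set(module_genes) & set(degs)


def consensus(*selections):
    """Genes chosen by every feature-selection method (e.g. LASSO, random forest, SVM-RFE)."""
    return reduce(set.intersection, (set(s) for s in selections))


# Hypothetical toy inputs; GENE_* symbols are placeholders, not results from the paper.
cad_key = key_genes(
    {"BATF3", "SOCS2", "GPER", "GENE_A", "GENE_B"},  # module genes (assumed)
    {"BATF3", "SOCS2", "GPER", "GENE_A", "GENE_C"},  # DEGs (assumed)
)
nash_key = {"BATF3", "SOCS2", "GPER", "GENE_A", "GENE_D"}
shared = cad_key & nash_key  # genes common to both diseases

lasso = {"BATF3", "SOCS2", "GPER", "GENE_A"}
random_forest = {"BATF3", "SOCS2", "GPER"}
svm_rfe = {"BATF3", "SOCS2", "GPER", "GENE_A"}

# Signature genes: picked by all three methods and present in the shared set.
signature = consensus(lasso, random_forest, svm_rfe) & shared
print(sorted(signature))
```

Requiring agreement across all three methods is a conservative choice: a gene that only one algorithm favors is dropped, which trades sensitivity for robustness.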

Results: WGCNA revealed two key modules for CAD and NASH. Intersecting the CAD module genes with the differentially expressed genes narrowed the CAD key genes down to 2,808 genes. Finally, 44 shared genes were selected for both CAD and NASH. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed that these genes were primarily enriched in insulin resistance and inflammation pathways. Machine learning identified the signature genes BATF3, SOCS2, and GPER, each with an area under the ROC curve (AUC) above 0.7; these genes were validated in external datasets. GSEA revealed that these genes act through common mechanisms in CAD and NASH, regulating metabolic, inflammatory, and cardiovascular pathways. In addition, ssGSEA suggested their involvement in immune cell infiltration.
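The ROC threshold of 0.7 quoted above refers to the area under the ROC curve for a single gene used as a classifier. As an illustration, the sketch below computes that area with the rank (Mann-Whitney) formulation in plain Python. The function name and the expression values are hypothetical and are not taken from the study, which does not state how its ROC analysis was implemented.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney U) formulation.

    labels: 1 for disease, 0 for control; scores: e.g. expression of one gene.
    A tie between a case and a control counts as half a correct ordering.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical expression values, not data from the study.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [7.9, 8.4, 6.8, 9.1, 6.5, 7.2, 6.1, 7.0]
auc = roc_auc(labels, scores)
print(auc)
```

For a gene that is lower in disease than in controls, this value falls below 0.5; in that case `1 - auc` is the discriminative performance of the reversed rule.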

Conclusion: BATF3, SOCS2, and GPER have emerged as promising gene candidates that may serve as biomarkers or potential therapeutic targets for CAD combined with NASH, linked to the regulation of metabolic, inflammatory, and cardiovascular pathways. We also identified insulin resistance and inflammation pathways as common mechanisms underlying both diseases.

Keywords: WGCNA; bioinformatics; coronary artery disease; machine learning; non-alcoholic steatohepatitis.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
The process diagram of this research.
FIGURE 2
Detection of module genes using WGCNA in GSE113079 for CAD and GSE89632 for NASH. (A) Clustering dendrogram showing gene co-expression modules in various colors for CAD. (B) Clustering dendrogram for gene co-expression modules in NASH. (C) Heatmap of module-trait relationships in CAD, with row-column intersections indicating correlation and p-values. The x-axis represents different samples, with blue indicating healthy controls and red indicating disease group samples. The y-axis represents differentially expressed genes. Color intensity reflects the expression level of each gene across the samples. (D) Heatmap of module-trait relationships in NASH, with intersections showing correlation and p-values. Axes and box colors as described above.
FIGURE 3
Identification of DEGs in coronary artery disease. (A) The volcano plot displays DEG expression between CAD and healthy groups. (B) Heatmap showing the top 50 upregulated and downregulated DEGs. Blue squares indicate the healthy group, and red squares indicate the disease group.
FIGURE 4
Venn diagrams for differential analysis and common gene screening. (A) Venn diagram illustrating the overlap between DEGs and genes identified by WGCNA. (B) Venn diagram of the shared genes in CAD and NASH. The gene we selected is circled in red.
FIGURE 5
Functional enrichment analysis of shared genes. (A) KEGG analysis of shared genes. (B) The top 10 functional enrichments in each of the three GO categories. In both panels, the y-axis represents KEGG or GO enrichment pathways, and the x-axis represents the number of enriched genes. The color gradient indicates the p-value of enrichment.
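The enrichment p-values shown by the color gradient are typically obtained from an over-representation test. The source does not say which tool or test was used, so the following is only a sketch of the common hypergeometric approach. The function name and all counts except the 44 shared genes are assumptions made for illustration.

```python
from math import comb


def enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_selected genes are drawn without replacement from n_background genes,
    n_pathway of which are annotated to the pathway of interest.
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# 44 shared genes comes from the abstract; the background size, pathway size,
# and overlap below are hypothetical.
p = enrichment_p(n_background=20000, n_pathway=100, n_selected=44, n_overlap=5)
print(p)
```

In practice the p-values from many pathways are then adjusted for multiple testing, which this sketch leaves out.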
FIGURE 6
Machine learning model construction. (A) LASSO penalty plot with error bars for standard errors. (B) LASSO L1 norm path plot. (C) Top 15 important genes. (D) Random Forest error rate vs number of trees. (E) SVM-RFE accuracy rate curve. (F) SVM-RFE error rate curve. (G) Venn diagram of genes from the three algorithms.
FIGURE 7
Signature genes performance in GSE113079 and GSE89632. (A–C) Expression levels in CAD vs healthy cohorts. The x-axis represents the sample groups, and the y-axis shows the expression level of the gene (log2 normalized counts). Blue boxes represent the healthy group, and red boxes represent the disease group. (D–F) ROC curves showing diagnostic performance in CAD. (G–I) Expression levels in NASH vs healthy cohorts. Axes and box colors as described above. (J–L) ROC curves illustrating diagnostic performance in NASH.
FIGURE 8
(A) Expression levels of GPER in NASH vs. healthy cohorts in females. (B) Expression levels of GPER in NASH vs. healthy cohorts in males.
FIGURE 9
Evaluation of signature genes in GSE66360 and GSE135251. (A–C) Expression levels of signature genes in CAD vs healthy cohorts. (D–F) ROC curves demonstrating the predictive power of signature genes in CAD. (G–I) Expression levels of signature genes in NASH vs healthy cohorts. (J–L) ROC curves showcasing the diagnostic efficacy of the signature genes in NASH.
FIGURE 10
GSEA of the signature genes in CAD and NASH. (A) GSEA of BATF3 in CAD. (B) GSEA of SOCS2 in CAD. (C) GSEA of GPER in CAD. (D) GSEA of BATF3 in NASH. (E) GSEA of SOCS2 in NASH. (F) GSEA of GPER in NASH. In each panel, the x-axis represents all genes ranked by log2 fold change, and the y-axis shows the running enrichment score.
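The running enrichment score on the y-axis of these panels can be illustrated with the classic GSEA running-sum statistic. This is a simplified sketch under stated assumptions: the genes are already ranked by decreasing log2 fold change, hits are weighted by the absolute score, and no permutation-based significance is computed. The function name and the toy gene list are hypothetical.

```python
def running_enrichment(ranked_genes, ranked_scores, gene_set, weight=1.0):
    """Return (curve, es) for genes ranked by decreasing score.

    The running sum rises at each gene in gene_set (in proportion to
    |score| ** weight) and falls by a constant step at every other gene.
    es is the point on the curve with the largest absolute deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_weights = [abs(s) ** weight if h else 0.0
                   for s, h in zip(ranked_scores, hits)]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    running, curve = 0.0, []
    for h, w in zip(hits, hit_weights):
        running += w / total_hit_weight if h else -1.0 / n_miss
        curve.append(running)
    es = max(curve, key=abs)
    return curve, es


# Hypothetical ranked list (gene symbol, log2 fold change); not study data.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
log2fc = [2.0, 1.5, 0.5, -0.5, -1.0, -2.0]
curve, es = running_enrichment(genes, log2fc, {"g1", "g2"})
print(es)
```

A gene set concentrated at the top of the ranking produces a positive peak early in the curve, while one concentrated at the bottom produces a negative trough near the end.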
FIGURE 11
Immune cell infiltration. (A) Immune cell infiltration comparison between CAD and healthy cohorts. (B) Comparison of immune cell infiltration between NASH and healthy cohorts. “ns” indicates p ≥ 0.05, * denotes 0.01 ≤ p < 0.05, ** indicates 0.001 ≤ p < 0.01, and *** indicates p < 0.001. The x-axis represents different immune cells, and the y-axis indicates the ssGSEA score. Each box represents the score distribution of a group: blue boxes represent the healthy group, and red boxes represent the disease group.
FIGURE 12
Association of immune cell infiltration with signature genes, validated by CIBERSORT. (A) Correlation of signature genes with differences in immune cell infiltration in CAD. (B) Correlation of signature genes with differences in immune cell infiltration in NASH. The x-axis represents different immune cell types, and the y-axis represents different genes. The color of each square indicates the corresponding p-value, with the color gradient reflecting the level of statistical significance. (C) Gamma delta T cell expression in CAD versus healthy controls using CIBERSORT. (D) Gamma delta T cell expression in NASH versus healthy controls using CIBERSORT.
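The gene-to-immune-cell associations in panels (A) and (B) rest on correlating a gene's expression with an infiltration score across samples. The caption does not name the correlation method, so the sketch below assumes a Spearman rank correlation purely for illustration. The function names and sample values are hypothetical, and the p-value calculation is omitted.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values, not data from the study.
gene_expression = [5.1, 6.3, 5.8, 7.2, 6.9]
immune_score = [0.21, 0.35, 0.30, 0.33, 0.40]
rho = spearman(gene_expression, immune_score)
print(rho)
```

Working on ranks makes the measure insensitive to the scale of the expression values and to monotone transformations such as log2.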

