J Thorac Dis. 2022 Jul;14(7):2621-2634.
doi: 10.21037/jtd-22-632.

Identification of hub genes and their correlation with immune infiltration in coronary artery disease through bioinformatics and machine learning methods


Ke-Ke Huang et al. J Thorac Dis. 2022 Jul.

Abstract

Background: Coronary artery disease (CAD) is a multifactorial disease whose pathogenesis remains unclear. We aimed to identify optimal feature genes (OFGs) for CAD and to investigate the role of immune cell infiltration in CAD. The findings may improve understanding of the pathogenesis of CAD and support the development of genetic prediction for it.

Methods: Datasets related to CAD were obtained from the Gene Expression Omnibus (GEO) database. Cases in the datasets met diagnostic criteria that included clinical symptoms and electrocardiographic (ECG) and angiographic evidence. We identified differentially expressed genes (DEGs) and conducted functional enrichment analysis. OFGs were selected using the least absolute shrinkage and selection operator (LASSO), support vector machine recursive feature elimination (SVM-RFE), and random forest (RF) algorithms. CIBERSORT was used to compare immune infiltration between CAD patients and normal controls, and the correlation between OFGs and immune cells was analyzed.

Results: DEGs were involved in the interleukin (IL)-17 signaling pathway, nuclear factor (NF)-kappa B signaling pathway, and tumor necrosis factor (TNF) signaling pathway. Gene Ontology (GO) analysis showed that DEGs were enriched in terms related to lipopolysaccharide (LPS), tertiary granule, and pattern recognition receptor activity. Disease Ontology (DO) analysis suggested that DEGs were enriched in lung disease and arteriosclerotic cardiovascular disease (CVD). Matrix metalloproteinase 9 (MMP9), Pellino E3 ubiquitin protein ligase 1 (PELI1), thrombomodulin (THBD), and zinc finger protein 36 (ZFP36) were identified as the intersection of the OFGs obtained from the LASSO, SVM-RFE, and RF algorithms. CAD patients had a lower proportion of memory B cells (P=0.019), CD8 T cells (P<0.001), resting memory CD4 T cells (P<0.001), regulatory T cells (P=0.028), and gamma delta T cells (P<0.001) than normal controls, whereas the proportion of activated memory CD4 T cells (P=0.014), resting natural killer (NK) cells (P<0.001), monocytes (P<0.001), M0 macrophages (P=0.023), activated mast cells (P<0.001), and neutrophils (P<0.001) was higher in CAD patients than in normal controls. MMP9, PELI1, THBD, and ZFP36 were correlated with immune cells.

Conclusions: MMP9, PELI1, THBD, and ZFP36 may be predictive biomarkers for CAD. The OFGs and their association with immune infiltration may provide potential biomarkers for CAD prediction and support better assessment of the disease.

Keywords: Coronary artery disease (CAD); bioinformatics analysis; immune infiltration; machine learning (ML); optimal feature genes (OFGs).
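
The screening logic summarized in the abstract can be sketched in a few lines. This is a minimal illustration only, not the authors' pipeline: it applies the DEG thresholds quoted in the Figure 2 caption (|log2FC|>1 and adjusted P<0.05) and then keeps the genes chosen by all three feature-selection methods. The function names, the placeholder genes (`GENE_A` to `GENE_D`), their statistics, and the per-method gene lists are all assumptions invented for the example; only the four intersecting gene symbols come from the abstract.

```python
from typing import Dict, List, Set, Tuple


def filter_degs(stats: Dict[str, Tuple[float, float]],
                lfc_cutoff: float = 1.0,
                padj_cutoff: float = 0.05) -> List[str]:
    """Return genes with |log2FC| > lfc_cutoff and adjusted P < padj_cutoff."""
    return sorted(
        gene for gene, (log2fc, padj) in stats.items()
        if abs(log2fc) > lfc_cutoff and padj < padj_cutoff
    )


def intersect_selections(*selections: Set[str]) -> List[str]:
    """Return the genes that every feature-selection method picked."""
    if not selections:
        return []
    return sorted(set.intersection(*(set(s) for s in selections)))


# Hypothetical (log2FC, adjusted P) pairs for placeholder genes.
toy_stats = {
    "GENE_A": (1.8, 0.001),   # passes, upregulated
    "GENE_B": (0.4, 0.001),   # fails the fold-change cutoff
    "GENE_C": (-2.1, 0.200),  # fails the adjusted-P cutoff
    "GENE_D": (-1.6, 0.003),  # passes, downregulated
}
degs = filter_degs(toy_stats)

# Hypothetical per-method selections; the real lists are not given in the text.
lasso = {"MMP9", "PELI1", "THBD", "ZFP36", "GENE_A"}
svm_rfe = {"MMP9", "PELI1", "THBD", "ZFP36"}
rf = {"MMP9", "PELI1", "THBD", "ZFP36", "GENE_D"}
ofgs = intersect_selections(lasso, svm_rfe, rf)

print(degs)
print(ofgs)
```

Taking the intersection is a conservative choice: a gene survives only if all three methods agree on it, which trades sensitivity for a shorter candidate list.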

Conflict of interest statement

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jtd.amegroups.com/article/view/10.21037/jtd-22-632/coif). The authors have no conflicts of interest to declare.

Figures

Figure 1
Schematic overview of the study. GO, Gene Ontology; DO, Disease Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; LASSO, least absolute shrinkage and selection operator; SVM-RFE, support vector machine recursive feature elimination; RF, random forest; MMP9, matrix metalloproteinase 9; PELI1, Pellino E3 ubiquitin protein ligase 1; THBD, thrombomodulin; ZFP36, zinc finger protein 36; ROC, receiver operating characteristic.
Figure 2
Differential expression analysis. (A) Cluster heatmap of DEGs in CAD patients and normal controls. Colors from blue to red represent low to high expression. (B) Volcano plot of DEGs; red dots represent upregulated genes and green dots represent downregulated genes (|log2FC|>1 and adjusted P<0.05). Con, control; FC, fold change; DEG, differentially expressed gene; CAD, coronary artery disease.
Figure 3
Functional enrichment analysis of DEGs. (A) Gene Ontology enrichment analysis; the figure represents biological process, cellular component, and molecular function (top 30 according to adjusted P value, respectively). (B) Circos graph for Gene Ontology enrichment analysis. (C) Disease Ontology enrichment analysis (top 30 according to adjusted P value). (D) Kyoto Encyclopedia of Genes and Genomes enrichment analysis (top 30 according to adjusted P value). GO, Gene Ontology; FC, fold change; DEG, differentially expressed gene.
Figure 4
Three machine learning algorithms were used to select OFGs. (A) LASSO algorithm to screen OFGs. (B) SVM-RFE algorithm; RMSE was the statistical parameter used to determine the optimal feature genes after recursive feature elimination. The lowest RMSE corresponds to the optimal feature genes. (C) RF algorithm to select OFGs; a MeanDecreaseGini score >2 was used as the threshold for selecting a gene. (D) The individual feature selections by the LASSO, SVM-RFE, and RF algorithms and the intersection of the OFGs obtained from the 3 algorithms. SVM-RFE, support vector machine recursive feature elimination; LASSO, least absolute shrinkage and selection operator; OFGs, optimal feature genes; RMSE, root mean square error; RF, random forest.
Figure 5
ROC curves of the predictive efficacy of MMP9, PELI1, THBD, and ZFP36. MMP9, matrix metallopeptidase 9; PELI1, Pellino E3 ubiquitin protein ligase 1; THBD, thrombomodulin; ZFP36, zinc finger protein 36; AUC, area under the curve; CI, confidence interval; ROC, receiver operating characteristic.
Figure 6
Validation of the OFGs and ROC curves. (A) Expression of PELI1 and ZFP36 in CAD patients compared to normal controls in the validation dataset (only genes with P<0.05 are shown). (B) ROC curves of the predictive efficacy of PELI1 and ZFP36 in the validation set. Con, control; PELI1, Pellino E3 ubiquitin protein ligase 1; AUC, area under the curve; CI, confidence interval; ZFP36, zinc finger protein 36; OFGs, optimal feature genes; ROC, receiver operating characteristic; CAD, coronary artery disease.
Figure 7
Immune cell infiltration analysis. (A) The relative percentage of 22 immune cell subpopulations in the samples from the merged dataset. (B) Correlation heatmap of the 22 immune cells; red and blue represent positive and negative correlation, respectively, and a deeper color indicates a stronger correlation. (C) Violin plot of the fractions of the 22 immune cells in CAD and control samples. Con, control; CAD, coronary artery disease.
Figure 8
Visualization of the Spearman correlation between immune cells and the 4 optimal feature genes. Larger dots indicate a stronger correlation coefficient. Dot color encodes the P value: greener dots have smaller P values and yellower dots have larger P values. Abs (cor), absolute value (correlation); MMP9, matrix metallopeptidase 9; PELI1, Pellino E3 ubiquitin protein ligase 1; THBD, thrombomodulin; ZFP36, zinc finger protein 36.
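
For readers unfamiliar with the statistic named in the Figure 8 caption, the following is a minimal sketch of a Spearman rank correlation, computed as the Pearson correlation of average ranks. It is a generic textbook formulation, not the authors' code; the function names and the toy expression and cell-fraction values are assumptions made up for illustration.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based rank
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values: one gene's expression and one cell fraction.
toy_expression = [2.1, 3.4, 1.0, 5.6, 4.2]
toy_fraction = [0.10, 0.12, 0.05, 0.30, 0.20]
rho = spearman(toy_expression, toy_fraction)
print(rho)
```

Because only ranks enter the calculation, the coefficient captures any monotonic relationship and is insensitive to the scale of the expression values or cell fractions.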
