PLoS One. 2025 Jan 24;20(1):e0314319.
doi: 10.1371/journal.pone.0314319. eCollection 2025.

Identification of hypertension gene expression biomarkers based on the DeepGCFS algorithm

Zongjin Li et al. PLoS One. 2025.

Abstract

Hypertension is a critical risk factor for, and a cause of mortality in, cardiovascular disease, and it remains a global public health issue. Understanding its mechanisms is therefore essential for treating and preventing hypertension. Gene expression data are an important source of hypertension biomarkers; however, such data have small sample sizes and high feature dimensionality, which makes biomarker identification challenging. We propose a novel deep graph clustering feature selection (DeepGCFS) algorithm to identify hypertension gene biomarkers with greater biological significance. The algorithm represents gene-gene interaction information as a graph network, builds a GNN model, and trains it with a loss function based on link prediction and self-supervised learning, so that each gene node obtains a feature vector representing global information. It then applies a hybrid clustering method to detect gene modules, and finally combines integrated feature selection methods to determine the gene biomarkers. In the experiments, all ten identified hypertension biomarkers differed significantly, and the classification AUC reached 97.50%, outperforming previously published methods. Six of these genes (PTGS2, TBXA2R, ZNF101, KCNJ2, MSRA, and CMTM5) have been reported to be associated with hypertension. On the GSE113439 validation dataset, the classification AUC was 95.45%, and seven of the genes (LYSMD3, TBXA2R, KLC3, GPR171, PTGS2, MSRA, and CMTM5) differed significantly. In addition, the algorithm's clustering of gene feature vectors outperformed the comparison methods. The proposed algorithm therefore has significant advantages in selecting potential hypertension biomarkers.
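The abstract does not spell out the GNN architecture or the exact loss, so the following is only a minimal NumPy sketch of one common way to score gene embeddings with a link-prediction objective. It assumes a single GCN layer with symmetric normalization, each gene's expression profile across the m samples as its input features, an inner-product edge decoder, and binary cross-entropy over observed edges and randomly sampled non-edges (the graph-autoencoder style of loss). It computes the forward pass only; the self-supervised term is omitted, and training the weights would need an autodiff framework. All function names and the toy graph are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_embed(adj: np.ndarray, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GCN layer, ReLU(A_norm X W): maps n genes x m samples to n x d embeddings."""
    return np.maximum(normalized_adjacency(adj) @ features @ weight, 0.0)


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Numerically stable logistic function."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def link_prediction_loss(emb, pos_edges, neg_edges, eps=1e-12) -> float:
    """Binary cross-entropy on inner-product edge scores (graph-autoencoder style)."""
    def edge_prob(edges):
        return sigmoid(np.sum(emb[edges[:, 0]] * emb[edges[:, 1]], axis=1))

    pos_term = np.log(edge_prob(pos_edges) + eps).mean()        # observed edges -> 1
    neg_term = np.log(1.0 - edge_prob(neg_edges) + eps).mean()  # non-edges -> 0
    return float(-(pos_term + neg_term))


def sample_negative_edges(adj: np.ndarray, num: int, rng: np.random.Generator) -> np.ndarray:
    """Uniformly sample gene pairs with no edge in the graph (self-pairs excluded)."""
    n = adj.shape[0]
    pairs = []
    while len(pairs) < num:
        i, j = rng.integers(0, n, size=2)
        if i != j and adj[i, j] == 0:
            pairs.append((i, j))
    return np.array(pairs)


# Toy example: 6 genes, 4 samples, 3-dimensional embeddings.
rng = np.random.default_rng(0)
n_genes, n_samples, dim = 6, 4, 3
adj = np.zeros((n_genes, n_genes))
for i, j in [(0, 1), (1, 2), (3, 4), (4, 5)]:  # hypothetical gene-gene interactions
    adj[i, j] = adj[j, i] = 1.0
expression = rng.normal(size=(n_genes, n_samples))
weight = rng.normal(scale=0.5, size=(n_samples, dim))  # would be learned in practice

emb = gcn_embed(adj, expression, weight)
pos_edges = np.argwhere(np.triu(adj) > 0)
neg_edges = sample_negative_edges(adj, len(pos_edges), rng)
loss = link_prediction_loss(emb, pos_edges, neg_edges)
print(emb.shape, loss)
```

Minimizing a loss of this kind pushes interacting genes toward similar embedding vectors, which is what makes the later clustering step meaningful; the resulting n x d matrix `emb` is the kind of input a module-detection step would cluster.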

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Algorithm framework.
n represents the number of genes, m the number of samples, d the dimensionality of the features obtained through GNN representation learning, and k the number of gene cluster modules. M represents a gene module, and M1(a) denotes that module 1 contains a genes. Evaluator denotes the feature selection methods used for gene evaluation, and MIFS stands for Mutual Information Feature Selection.
Fig 2. GO analysis bubble plot for module 1.
Fig 3. Pathway analysis bubble plot for module 1.
Fig 4. Visualization for the module 1.
Fig 5. Heat map analysis of the biomarkers.
Fig 6. ROC for all gene biomarkers as features.
Fig 7. Comparison of ROC curves with published methods.
Fig 8. ROC for all gene biomarkers as features on the GSE113439 set.
Fig 9. The distribution of the gene biomarkers in positive and negative samples in the GSE113439 database.
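The Fig 1 caption names Mutual Information Feature Selection (MIFS) as the final selection step. The following is a minimal sketch of the classic greedy MIFS criterion (Battiti), which scores each candidate gene f as I(f; y) - beta * sum over already-selected genes s of I(f; s). It assumes expression values have already been discretized, a redundancy weight `beta`, and a target of k genes. It illustrates the generic technique, not the DeepGCFS implementation; `discretize`, `mutual_information`, `mifs`, and the toy data are hypothetical.

```python
import numpy as np


def discretize(column: np.ndarray, n_bins: int = 3) -> np.ndarray:
    """Quantile-bin a continuous expression vector into n_bins integer levels."""
    inner_edges = np.quantile(column, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.digitize(column, inner_edges)


def mutual_information(x: np.ndarray, y: np.ndarray) -> float:
    """I(X; Y) in nats for two discrete 1-D arrays of equal length."""
    _, x_idx = np.unique(x, return_inverse=True)
    _, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((x_idx.max() + 1, y_idx.max() + 1))
    np.add.at(joint, (x_idx, y_idx), 1.0)  # contingency counts
    joint /= joint.sum()
    marginal_product = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0
    return float(np.sum(joint[nonzero] * np.log(joint[nonzero] / marginal_product[nonzero])))


def mifs(X: np.ndarray, y: np.ndarray, k: int, beta: float = 0.5) -> list[int]:
    """Greedy MIFS: pick the column maximizing I(f; y) - beta * sum_{s in S} I(f; s)."""
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], y) for j in range(n_features)]
    selected: list[int] = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        scores = [
            relevance[j] - beta * sum(mutual_information(X[:, j], X[:, s]) for s in selected)
            for j in remaining
        ]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: 8 samples (rows) x 4 genes (columns), already discretized.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # illustrative class labels
g0 = np.array([0, 0, 0, 1, 1, 1, 1, 1])
g2 = np.array([0, 0, 0, 0, 0, 1, 1, 1])
noise = np.array([0, 1, 0, 1, 0, 1, 0, 1])
X = np.column_stack([g0, g0.copy(), g2, noise])  # column 1 duplicates column 0
chosen = mifs(X, labels, k=2, beta=0.5)
print(chosen)
```

In the toy data, column 1 duplicates column 0, so the redundancy penalty keeps MIFS from selecting both; with k=2 it returns columns 0 and 2 (in either order, since their relevance to the labels is equal), and the uninformative column 3 is never chosen.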
