Front Genet. 2021 Nov 25:12:779186.
doi: 10.3389/fgene.2021.779186. eCollection 2021.

Graph Embedding Based Novel Gene Discovery Associated With Diabetes Mellitus

Jianzong Du et al. Front Genet. 2021.

Abstract

Diabetes mellitus is a group of complex metabolic disorders that affects hundreds of millions of patients worldwide. The underlying pathogenesis of the various types of diabetes is still unclear, which hinders the development of more efficient therapies. Although many genes have been found to be associated with diabetes mellitus, more novel genes still need to be discovered to build a complete picture of the underlying mechanism. With the development of complex molecular networks, network-based disease-gene prediction methods have been widely proposed. However, most existing methods rely on the guilt-by-association hypothesis and often handcraft node features from local topological structures. Advances in graph embedding techniques have enabled automatic extraction of global features from molecular networks. Inspired by the successful application of cutting-edge graph embedding methods to complex diseases, we propose a computational framework to investigate novel genes associated with diabetes mellitus. The framework has three main steps: network feature extraction based on graph embedding methods; feature denoising and regeneration using a stacked autoencoder; and disease-gene prediction based on machine learning classifiers. We compared the performance of different graph embedding methods and machine learning classifiers and designed the best workflow for predicting genes associated with diabetes mellitus. The predicted novel genes were further evaluated by functional enrichment analysis based on Human Phenotype Ontology (HPO), KEGG, and GO biological process terms, and by a publication search.
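
The first step of the framework, network feature extraction, can be illustrated with the walk-generation stage that DeepWalk-style embedding methods share. The sketch below is a minimal illustration and not the authors' implementation: the toy network, the placeholder gene names (G1 to G4), the function name `random_walks`, and all parameter values are assumptions made for this example. It only produces the node sequences; a real pipeline would then feed them to a skip-gram model to obtain embedding vectors.

```python
import random

def random_walks(adjacency, walks_per_node=2, walk_length=5, seed=0):
    """Generate uniform random walks (DeepWalk-style) starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = sorted(adjacency)
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Hypothetical toy interaction network; G1..G4 are placeholders, not real genes.
toy_ppi = {
    "G1": ["G2", "G3"],
    "G2": ["G1", "G3"],
    "G3": ["G1", "G2", "G4"],
    "G4": ["G3"],
}

walks = random_walks(toy_ppi)
for walk in walks:
    print(" ".join(walk))
```

Each printed line plays the role of a "sentence" of nodes. Nodes that co-occur often in these sentences end up with similar embeddings, which is how such methods capture network structure beyond immediate neighbors.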

Keywords: diabetes mellitus; disease gene prediction; graph embedding; molecular network; novel gene discovery.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Workflow of our method. Abbreviations: SVM, support vector machine; RF, random forest; LR, logistic regression.
FIGURE 2
Prediction performance in five-fold cross validation based on three graph embedding methods. Three different graph embedding methods are compared: DeepWalk, LINE, and Node2vec. Four metrics are used for performance evaluation: AUROC, AUPRC, F1 score, and accuracy (ACC).
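
The four metrics named in the Figure 2 caption can be computed from a vector of true labels and predicted scores. The following is a minimal sketch using only the standard library. The labels, scores, function names, and the 0.5 decision threshold are assumptions for illustration, and AUPRC is approximated here by average precision, which may differ from how the authors computed it.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Average precision, a common estimate of AUPRC (assumes >= 1 positive)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / hits

def f1_and_accuracy(labels, scores, threshold=0.5):
    """F1 score and accuracy after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return f1, correct / len(labels)

# Made-up labels (1 = disease gene) and classifier scores for one test fold.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

f1, acc = f1_and_accuracy(labels, scores)
print(f"AUROC={auroc(labels, scores):.3f}  AP={average_precision(labels, scores):.3f}  "
      f"F1={f1:.3f}  ACC={acc:.3f}")
```

In five-fold cross validation these values would be computed on each held-out fold and then averaged.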
FIGURE 3
The effects of feature dimension on prediction performance. Four feature dimensions (i.e., 64, 128, 256, and 512) generated by graph embedding methods are used for comparison. Three different graph embedding methods are also compared.
FIGURE 4
Effect of Node2vec hyper-parameters and of different machine learning classifiers on prediction performance. (A) Prediction performance under various p and q values in Node2vec. (B) Prediction performance of SVM, logistic regression, and random forest in five-fold cross validation.
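
The p and q values in panel (A) control how a Node2vec walk chooses its next step. In the standard Node2vec formulation, a walk that has just moved from node `prev` to node `curr` weights each neighbor of `curr` by 1/p if it is `prev` (returning), by 1 if it is also a neighbor of `prev`, and by 1/q otherwise (moving outward). The sketch below illustrates that rule on a hypothetical toy network; the function name, the placeholder gene names, and the chosen p and q values are assumptions, not settings taken from the paper.

```python
def node2vec_transition_probs(adjacency, prev, curr, p=1.0, q=1.0):
    """Probabilities of the next step for a walk that just moved prev -> curr."""
    weights = {}
    for nxt in adjacency[curr]:
        if nxt == prev:
            weights[nxt] = 1.0 / p   # return to the previous node
        elif nxt in adjacency[prev]:
            weights[nxt] = 1.0       # stay close: shared neighbor of prev and curr
        else:
            weights[nxt] = 1.0 / q   # move farther away from prev
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}

# Hypothetical toy interaction network; G1..G4 are placeholders, not real genes.
toy_ppi = {
    "G1": ["G2", "G3"],
    "G2": ["G1", "G3"],
    "G3": ["G1", "G2", "G4"],
    "G4": ["G3"],
}

# With p=2 and q=0.5 the walk is discouraged from returning and pushed outward.
probs = node2vec_transition_probs(toy_ppi, prev="G1", curr="G3", p=2.0, q=0.5)
for node, prob in probs.items():
    print(node, round(prob, 3))
```

Setting p = q = 1 makes all three weights equal, so the walk reduces to the uniform random walk used by DeepWalk.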
FIGURE 5
Largest component of the PPI subnetwork formed by the top-predicted genes and the known genes associated with diabetes mellitus. Nodes in pink represent top-predicted genes. Nodes in blue represent known diabetes genes.
FIGURE 6
Functional enrichment results based on HPO, KEGG, and GO. p-values are shown in log scale and only top 10 terms are shown in each category.
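
An enrichment p-value of the kind plotted in Figure 6 is often obtained from a one-sided hypergeometric test, then shown as -log10(p). The text here does not state which test the authors used, so the sketch below is an assumption: it shows the hypergeometric calculation with made-up counts and a hypothetical function name.

```python
from math import comb, log10

def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """P(X >= overlap) when drawing `selected` genes from `population`,
    of which `annotated` carry the term of interest."""
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(population, selected)

# Made-up counts: 20 background genes, 5 annotated to a term,
# 4 predicted genes, 2 of which carry the annotation.
p_value = hypergeom_enrichment_p(population=20, annotated=5, selected=4, overlap=2)
print(f"p = {p_value:.4f}, -log10(p) = {-log10(p_value):.3f}")
```

In practice the test is repeated for every HPO, KEGG, or GO term, and the resulting p-values are usually corrected for multiple testing before the top terms are reported.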
