Sci Rep. 2022 Nov 19;12(1):19955.
doi: 10.1038/s41598-022-24421-0.

GediNET for discovering gene associations across diseases using knowledge based machine learning approach

Emma Qumsiyeh et al. Sci Rep. 2022.

Abstract

The most common approaches to discovering genes associated with specific diseases are based on machine learning and use a variety of feature selection techniques to identify significant genes that can serve as biomarkers for a given disease. More recently, integrating prior knowledge-based approaches into this process has shown significant promise for the discovery of new biomarkers with potential translational applications. In this study, we developed a novel approach, GediNET, that uses prior biological knowledge to form gene Groups that are shown to be associated with a specific disease, such as a cancer. The novelty of GediNET is that it also allows the discovery of significant associations between that specific disease and other diseases. The first step is the identification of gene Groups. The Groups are then passed to a Scoring component to identify the top-performing Groups for classification. The top-ranked gene Groups are then used to train a machine learning Model. GediNET uses this process of Grouping, Scoring and Modelling (G-S-M) to identify other diseases that are similarly associated with the resulting gene signature. GediNET identifies these relationships through Disease-Disease Association (DDA) based machine learning. DDA explores novel associations between diseases and identifies relationships that could be used to further improve approaches to diagnosis, prognosis, and treatment. The GediNET KNIME workflow can be downloaded from GitHub (https://github.com/malikyousef/GediNET.git) or the KNIME Hub (https://kni.me/w/3kH1SQV_mMUsMTS).
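
The Grouping, Scoring and Modelling steps described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the GediNET KNIME workflow. The expression values and the disease-to-gene map are hypothetical toy data, and the nearest-centroid classifier and leave-one-out accuracy are stand-ins chosen only to keep the example self-contained; the abstract does not say which classifier or scoring metric the Scoring component uses.

```python
from statistics import mean

# Hypothetical toy data: (expression value per gene, class label) for each sample.
SAMPLES = [
    ({"TP53": 5.1, "KRAS": 4.8, "INS": 1.0, "APOE": 2.2}, "tumor"),
    ({"TP53": 4.9, "KRAS": 5.2, "INS": 1.9, "APOE": 1.1}, "tumor"),
    ({"TP53": 5.3, "KRAS": 4.6, "INS": 1.2, "APOE": 2.0}, "tumor"),
    ({"TP53": 1.2, "KRAS": 1.0, "INS": 1.8, "APOE": 1.3}, "normal"),
    ({"TP53": 0.8, "KRAS": 1.4, "INS": 1.1, "APOE": 2.1}, "normal"),
    ({"TP53": 1.1, "KRAS": 0.9, "INS": 2.0, "APOE": 1.2}, "normal"),
]

# Hypothetical prior knowledge: disease -> associated genes.
# In the paper this role is played by DisGeNET; these entries are illustrative only.
DISEASE_GENES = {
    "lung carcinoma": {"TP53", "KRAS"},
    "diabetes": {"INS"},
    "alzheimer disease": {"APOE", "INS"},
}


def group(samples, disease_genes):
    """G: build one two-class sub-dataset per disease, keeping only its genes."""
    groups = {}
    for disease, genes in disease_genes.items():
        cols = sorted(genes)
        groups[disease] = [([expr[g] for g in cols], label) for expr, label in samples]
    return groups


def centroids(rows):
    """Fit a nearest-centroid model: one mean vector per class label."""
    by_label = {}
    for vec, label in rows:
        by_label.setdefault(label, []).append(vec)
    return {label: [mean(col) for col in zip(*vecs)] for label, vecs in by_label.items()}


def predict(model, vec):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, model[lab])))


def score(rows):
    """S: leave-one-out accuracy of the nearest-centroid model on one group."""
    hits = 0
    for i, (vec, label) in enumerate(rows):
        model = centroids(rows[:i] + rows[i + 1:])
        hits += predict(model, vec) == label
    return hits / len(rows)


def gsm(samples, disease_genes, top_k=2):
    """Rank disease groups by score, then train a model (M) on the top_k groups' genes."""
    groups = group(samples, disease_genes)
    ranked = sorted(groups, key=lambda d: score(groups[d]), reverse=True)
    top = ranked[:top_k]
    genes = sorted(set().union(*(disease_genes[d] for d in top)))
    rows = [([expr[g] for g in genes], label) for expr, label in samples]
    return top, genes, centroids(rows)
```

Calling `gsm(SAMPLES, DISEASE_GENES, top_k=1)` returns the best-scoring disease group, the genes it contributes, and a model trained on those genes; `predict(model, vector)` then classifies a new sample whose values are given in the same gene order.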

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Part of the frequency histogram of the DisGeNET dataset. It shows the number of genes associated with each disease; the x-axis is the disease name and the y-axis is the number of genes.
Figure 2
Decision Tree model. The left panel illustrates the traditional approach that detects gene-disease associations, while the right panel illustrates the disease-disease association as the output of GediNET.
Figure 3
GediNET workflow. The main G-S-M workflow, which integrates pre-existing biological knowledge to group genes by disease-gene associations derived from the DisGeNET v7 database.
Figure 4
An example of creating two-class sub-datasets, extracted according to disease-group names. These sub-datasets are then passed to the S component for scoring.
Figure 5
The details of the S component. The G panel contains all the two-class sub-datasets, each of which is passed to the S component.
Figure 6
GediNET workflow in KNIME.
Figure 7
The mean AUC values of GediNET, CogNet, maTE and PriPath for ten different datasets for the top two groups.
Figure 8
The mean number of genes for the GediNET, CogNet, maTE and PriPath tools for ten different datasets for the top two groups.
Figure 9
Network visualization, created with Cytoscape, of the gene interactions in the cell signaling pathway, with overlapping genes for the ten GEO datasets.
Figure 10
Network visualization, created with Cytoscape, of the cell signaling pathway, with overlapping genes for the GDS3257 dataset.
Figure 11
An example of the DDA for four datasets in GediNET, showing the number of genes shared with the top-scored disease group. The upper panel shows the DDA for the GDS1962, GDS3257, GDS2771 and GDS5499 datasets. The lower panel shows the annotations used to construct the DDA illustration.
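
The shared-gene counts mentioned in the Figure 11 caption suggest one simple way to picture a disease-disease association: count the genes that the top-scored disease group has in common with every other disease group. The sketch below does only that. The function name `disease_associations`, the `min_shared` threshold and the toy gene sets are assumptions made for illustration; they are not taken from DisGeNET or from the GediNET workflow, whose DDA step may well be more involved.

```python
# Hypothetical disease -> gene sets, for illustration only.
TOY_DISEASE_GENES = {
    "lung carcinoma": {"TP53", "KRAS", "EGFR"},
    "glioma": {"TP53", "EGFR", "IDH1"},
    "colorectal carcinoma": {"KRAS", "APC"},
    "type 2 diabetes": {"INS", "TCF7L2"},
}


def disease_associations(top_disease, disease_genes, min_shared=1):
    """List other diseases sharing at least min_shared genes with top_disease.

    Returns (disease, sorted shared genes) pairs, ordered by the number of
    shared genes (descending) and then by disease name.
    """
    signature = disease_genes[top_disease]
    links = []
    for disease, genes in disease_genes.items():
        if disease == top_disease:
            continue  # a disease is not associated with itself here
        shared = signature & genes  # set intersection = shared genes
        if len(shared) >= min_shared:
            links.append((disease, sorted(shared)))
    links.sort(key=lambda item: (-len(item[1]), item[0]))
    return links
```

With the toy data, `disease_associations("lung carcinoma", TOY_DISEASE_GENES)` links the top group to "glioma" (two shared genes) and "colorectal carcinoma" (one shared gene), while "type 2 diabetes" is dropped because it shares none.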
