Genes (Basel). 2021 Oct 27;12(11):1713.
doi: 10.3390/genes12111713.

MOSES: A New Approach to Integrate Interactome Topology and Functional Features for Disease Gene Prediction

Manuela Petti et al. Genes (Basel).

Abstract

Disease gene prediction remains one of the main computational challenges of precision medicine. It is still unclear whether disease genes have unique functional properties that distinguish them from non-disease genes or, from a network perspective, whether they are located randomly in the interactome or show specific patterns in the network topology. In this study, we propose a new method for disease gene prediction based on biological knowledge bases (gene-disease associations, gene functional annotations, etc.) and interactome network topology. The proposed algorithm, called MOSES, relies on the definition of two somewhat opposing sets of genes, both disease-specific but from different perspectives: warm seeds (i.e., disease genes obtained from databases) and cold seeds (genes far from the disease genes on the interactome and not involved in their biological functions). Applying MOSES to a set of 40 diseases showed that the suggested putative disease genes are significantly enriched in their reference disease. Reassuringly, known and predicted disease genes together tend to form a connected network module on the human interactome, mitigating the scattered distribution of disease genes, which is probably due to both the paucity of disease-gene associations and the incompleteness of the interactome.
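
The cold-seed idea described in the abstract can be illustrated with a minimal Python sketch. This is not the MOSES implementation: the function names, the `min_path_length` threshold, the annotation sets, and the toy graph are all hypothetical, and the sketch only assumes that "far" means a large shortest-path length to the nearest warm seed and that "not involved in their biological functions" means sharing no annotation with the warm seeds.

```python
from collections import deque


def min_distance_to_seeds(adjacency, seeds):
    """Multi-source BFS: shortest path length from each reachable gene to its nearest seed."""
    dist = {s: 0 for s in seeds if s in adjacency}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def candidate_cold_seeds(adjacency, warm_seeds, warm_annotations,
                         gene_annotations, min_path_length=3):
    """Genes at least `min_path_length` steps from every warm seed that share
    no annotation with the warm seeds (hypothetical criteria, for illustration).
    Genes unreachable from the warm seeds are ignored in this sketch."""
    dist = min_distance_to_seeds(adjacency, warm_seeds)
    cold = []
    for gene, d in dist.items():
        is_far = d >= min_path_length
        is_unrelated = not (gene_annotations.get(gene, set()) & warm_annotations)
        if is_far and is_unrelated:
            cold.append(gene)
    return sorted(cold)


# Toy interactome (hypothetical): chain A-B-C-D-E with an extra edge D-F.
toy_adjacency = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E", "F"],
    "E": ["D"],
    "F": ["D"],
}
toy_warm_seeds = {"A"}
toy_warm_annotations = {"GO:1"}  # placeholder annotation identifiers
toy_gene_annotations = {"D": {"GO:1"}, "E": {"GO:2"}, "F": set()}

toy_cold = candidate_cold_seeds(
    toy_adjacency, toy_warm_seeds, toy_warm_annotations, toy_gene_annotations
)
print(toy_cold)  # ['E', 'F']: D is far enough but shares an annotation with the warm seeds
```

A multi-source breadth-first search is used here because it visits each gene once regardless of the number of warm seeds; a real analysis would also need the enrichment step that decides which annotations characterize the warm seeds.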

Keywords: computational biology; data integration; disease gene prediction; precision medicine.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Example of network-based distance between warm seeds (red diamonds) and cold seeds (blue diamonds). The color of background ovals codes for the path length (PL) between WSs and CSs.
Figure 2
Flowchart of the MOSES algorithm. The background rectangles identify the three sequential phases: (1) WSs functional characterization (red), (2) CSs identification and characterization (blue), and (3) optimized clustering (orange). The enrichment analysis can be performed on different types of gene annotations (Gene Ontology database, KEGG pathways, miRTarBase), yielding M_i annotations for each type i (i = GO, KEGG, miRTarBase, etc.).
Figure 3
Clustering process applied to the disease amino acid metabolism inborn errors using GO-BP annotations (top panel) and KEGG pathways (bottom panel). On the right, the procedure of data integration and the identification of putative disease genes are shown.
Figure 4
10-fold cross-validation. Difference between MOSES and RWR performances. The performances are computed as the percentage of recovered warm seeds in the test set SP. Rows and columns represent respectively the diseases and the cross-validation iterations. In the case of positive values (orange pixels), MOSES outperforms RWR, while negative values (green pixels) refer to the opposite situation.
Figure 5
Largest connected component (LCC) of the predicted disease module (warm seeds and putative disease genes) for ulcerative colitis. Node shape codes for the type of genes: red diamonds represent the warm seeds (22 nodes), while orange dots represent the putative disease genes (118 nodes). On the right, distribution of the size of the 1000 LCCs of the random disease modules (|LCC_WS+RG|), obtained by adding to the warm seeds a set of randomly selected genes with the same cardinality as the set of putative genes; the orange arrow indicates the size of the LCC_WS+PG shown in the left panel.
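
The comparison in Figure 5 between the observed module and random modules can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: it assumes the random genes are drawn uniformly from all non-seed genes, and the function names, the toy graph, and the empirical p-value (which the caption does not mention) are hypothetical additions.

```python
import random


def largest_component_size(adjacency, nodes):
    """Size of the largest connected component of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen = set()
    best = 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency.get(node, ()):
                if neighbor in nodes and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best


def lcc_null_distribution(adjacency, warm_seeds, n_random, n_iter=1000, seed=0):
    """LCC sizes of modules built from the warm seeds plus `n_random` genes
    sampled uniformly from the remaining genes (uniform sampling is an assumption)."""
    rng = random.Random(seed)
    pool = sorted(set(adjacency) - set(warm_seeds))
    sizes = []
    for _ in range(n_iter):
        random_genes = rng.sample(pool, n_random)
        sizes.append(largest_component_size(adjacency, set(warm_seeds) | set(random_genes)))
    return sizes


def empirical_p_value(observed, null_sizes):
    """Fraction of random modules whose LCC is at least as large as the observed one."""
    return (1 + sum(s >= observed for s in null_sizes)) / (1 + len(null_sizes))


# Toy interactome (hypothetical): path a-b-c-d-e-f plus an isolated gene g.
toy_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
    "g": [],
}
toy_seeds = {"a", "b"}
toy_putative = {"c", "d"}

observed_size = largest_component_size(toy_graph, toy_seeds | toy_putative)
null_sizes = lcc_null_distribution(toy_graph, toy_seeds, n_random=len(toy_putative), n_iter=200)
print(observed_size, empirical_p_value(observed_size, null_sizes))
```

With 22 warm seeds and 118 putative genes, as in the caption, `n_random` would be 118 and `n_iter` 1000. Whether the original analysis controlled for node degree when sampling is not stated in the caption, so uniform sampling here is only a placeholder.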
