BMC Syst Biol. 2016 Jan 11;10 Suppl 1(Suppl 1):4.

doi: 10.1186/s12918-015-0247-y.

Inference of domain-disease associations from domain-protein, protein-disease and disease-disease relationships

Wangshu Zhang¹, Marcelo P Coba²˒³, Fengzhu Sun⁴˒⁵

Affiliations

¹ Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, USA. wangshuz@usc.edu.
² Zilkha Neurogenetic Institute, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA. coba@usc.edu.
³ Department of Psychiatry and Behavioral Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA. coba@usc.edu.
⁴ Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, USA. fsun@usc.edu.
⁵ Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, China. fsun@usc.edu.

PMID: 26818594
PMCID: PMC4895779
DOI: 10.1186/s12918-015-0247-y

Wangshu Zhang et al. BMC Syst Biol. 2016.

Abstract

Background: Protein domains can be viewed as portable units of biological function that define the functional properties of proteins. Therefore, if a protein is associated with a disease, its domains might also be associated with that disease and define disease endophenotypes. However, knowledge about such domain-disease relationships is rarely available. Identifying the domains associated with human diseases would therefore greatly improve our understanding of the mechanisms of complex human diseases and, in turn, help their prevention, diagnosis and treatment.

Methods: Based on phenotypic similarities among diseases, we first group diseases into overlapping modules. We then develop a framework to infer associations between domains and diseases through known relationships between diseases and modules, between domains and proteins, and between proteins and disease modules. Five approaches are developed to predict domain-disease associations: Association, Maximum likelihood estimation (MLE), Domain-disease pair exclusion analysis (DPEA), Bayesian, and Parsimonious explanation (PE).
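
As a rough illustration of the simplest approach, an association-style score can be sketched by analogy with the association method used for domain-domain interaction prediction: for each domain-disease pair, count the protein-module pairs in which the protein contains the domain and the module contains the disease, then report the fraction of those pairs that are known protein-module associations. This definition is an assumption, not necessarily the authors' exact formula, and the function name and dictionary layout are hypothetical.

```python
from collections import defaultdict
from itertools import product


def association_scores(protein_domains, module_diseases, protein_module_links):
    """Hypothetical association score A(d, t) = I_dt / N_dt (assumed definition).

    protein_domains: dict protein -> set of domains it contains
    module_diseases: dict module -> set of diseases/traits it contains
    protein_module_links: set of known (protein, module) associations
    N_dt counts every (protein, module) pair with d in the protein and t in the module;
    I_dt counts the subset of those pairs that are known associations.
    """
    n = defaultdict(int)
    i = defaultdict(int)
    for protein, module in product(protein_domains, module_diseases):
        linked = (protein, module) in protein_module_links
        for d in protein_domains[protein]:
            for t in module_diseases[module]:
                n[(d, t)] += 1
                if linked:
                    i[(d, t)] += 1
    # Fraction of candidate protein-module pairs that are observed links
    return {pair: i[pair] / n[pair] for pair in n}
```

Because a disease can belong to several overlapping modules, the same domain-disease pair accumulates counts from every module containing that disease.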

Results: We demonstrate the effectiveness of all five approaches through a series of validation experiments, and show that the MLE, Bayesian and PE approaches are robust to their parameters. We also study the effect of disease modularization on inferring novel domain-disease associations. In validation, the AUC (area under the receiver operating characteristic curve) scores of the Bayesian, MLE, DPEA, PE and Association approaches are 0.86, 0.84, 0.83, 0.83 and 0.79, respectively, indicating that these approaches are useful for predicting domain-disease relationships. Finally, we use the Bayesian approach to infer domains associated with two common diseases, Crohn's disease and type 2 diabetes.
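
The AUC values above can be read as ranking probabilities: an AUC of 0.86 means that a randomly chosen positive domain-disease pair is scored above a randomly chosen negative pair about 86% of the time, with ties counted as one half. A minimal sketch of that pairwise computation, which equals the area under the ROC curve; the scores and labels are placeholders, since how the validation positives and negatives were chosen is not described here.

```python
def auc(scores, labels):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties contribute 1/2. Quadratic in the number of items; fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])` returns 0.75, because three of the four positive-negative pairs are ordered correctly.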

Conclusions: The Bayesian approach has the best performance for inferring domain-disease relationships. The predicted landscape of domain-disease associations provides a more detailed view of disease mechanisms.

Figures

**Fig. 1**
The relationships between the different data types. Histograms of (a) proteins by the number of domains each protein contains, (b) disease modules by the number of diseases each module contains, (c) disease modules by the number of proteins associated with each module, (d) domains by the number of proteins associated with each domain, (e) diseases by the number of disease modules each disease is associated with, and (f) proteins by the number of disease modules each protein is associated with.

**Fig. 2**
Scheme for predicting domain-disease relationships. Nodes represent diseases/traits, modules, proteins and domains. An edge connecting two nodes represents a known association. Steps 1-7 show how the candidate domains of a given disease are obtained. Step 1: For a given disease $T_n$, all modules containing this disease ($M_j$ in the figure) and all other diseases/traits in $M_j$ are extracted. Step 2: Modules sharing at least one disease with module $M_j$ ($M_{j'}$ in the figure) are extracted. Step 3: All other diseases/traits in $M_{j'}$ are included in the prediction scheme. Step 4: All proteins associated with the modules $\{M_j, M_{j'}\}$ ($P_{i1}$ and $P_{i2}$ in the figure) are extracted. Step 5: All domains contained in $\{P_{i1}, P_{i2}\}$ are included in the prediction scheme. Step 6: All proteins sharing domains with $\{P_{i1}, P_{i2}\}$ are included in the prediction scheme. Step 7: All other domains in the proteins obtained in Step 6 are included in the prediction scheme; the resulting set of domains are called candidate domains.
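
The seven steps amount to a set-based expansion over the disease-module-protein-domain graph. A minimal sketch, assuming the three relationships are given as Python dictionaries of sets; the function name and data layout are hypothetical, not the authors' code.

```python
def candidate_domains(disease, module_diseases, module_proteins, protein_domains):
    """Hypothetical sketch of Steps 1-7 of the prediction scheme.

    module_diseases: dict module -> set of diseases/traits
    module_proteins: dict module -> set of associated proteins
    protein_domains: dict protein -> set of domains
    Returns (diseases, proteins, candidate domains) pulled into the scheme.
    """
    # Step 1: modules M_j containing the query disease, and all diseases in them
    seed_modules = {m for m, ds in module_diseases.items() if disease in ds}
    seed_diseases = set().union(*(module_diseases[m] for m in seed_modules))
    # Step 2: modules M_j' sharing at least one disease with some M_j
    # (each M_j also satisfies this, so it stays in the set)
    modules = {m for m, ds in module_diseases.items() if ds & seed_diseases}
    # Step 3: every disease/trait in those modules joins the scheme
    diseases = set().union(*(module_diseases[m] for m in modules))
    # Step 4: proteins associated with {M_j, M_j'}
    seed_proteins = set().union(*(module_proteins.get(m, set()) for m in modules))
    # Step 5: domains contained in those proteins
    seed_domains = set().union(*(protein_domains.get(p, set()) for p in seed_proteins))
    # Step 6: all proteins sharing at least one domain with the Step 4 proteins
    proteins = {p for p, doms in protein_domains.items() if doms & seed_domains}
    # Step 7: every domain in the Step 6 proteins is a candidate domain
    candidates = set().union(*(protein_domains[p] for p in proteins))
    return diseases, proteins, candidates
```

Step 6 is what lets a domain such as `D3` become a candidate even though no module-associated protein contains it: it only needs to co-occur with a seed domain in some protein.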

**Fig. 3**
Receiver operating characteristic (ROC) and precision-recall curves of the different approaches. Subplot a shows ROC curves and Subplot b shows precision-recall curves of the Association, MLE (fp = 0, fn = 0.9), DPEA, Bayesian ($u_p = u_n = 0$, $v_p = v_n = 1$, α = 2, β = 2) and PE (r = 100 %, pw threshold ≤ 0.01) approaches. On both ROC and precision-recall curves, the three MLE-based approaches (DPEA, MLE and Bayesian) outperform PE and Association, and the Bayesian approach performs slightly better than DPEA and MLE.
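
A precision-recall curve such as Subplot b traces precision against recall as the score threshold is lowered through the ranked predictions. A minimal sketch of how those points can be obtained from scores and known labels; the inputs are placeholders, and breaking ties by input order is a simplifying assumption.

```python
def precision_recall_points(scores, labels):
    """(precision, recall) after each position of the list ranked by descending score.

    sorted() is stable, so tied scores keep their input order (a simplification).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, y in ranked if y)
    if total_pos == 0:
        raise ValueError("need at least one positive")
    points = []
    tp = 0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            tp += 1
        # precision = hits among the top k; recall = hits among all positives
        points.append((tp / k, tp / total_pos))
    return points
```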

**Fig. 4**
Influence of the free parameters on the performance of the MLE and PE approaches. Horizontally, Subplots a-c show the influence of the false positive rate (fp) and false negative rate (fn) on the AUC, accuracy and mean rank ratio of the MLE approach; Subplots d-f show the influence of the reliable rate (r) and pw threshold on the AUC, accuracy and mean rank ratio of the PE approach. Vertically, Subplots a and d show AUC scores, Subplots b and e show accuracies, and Subplots c and f show mean rank ratios.
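
The free parameters fp and fn suggest a noisy-observation likelihood of the kind used for domain-domain interaction inference. The sketch below evaluates such a likelihood under assumptions not confirmed by this abstract: each domain-disease pair (d, t) carries a probability λ_dt, a protein-module pair is truly associated if at least one of its domain-disease pairs is (a noisy-OR), and observed links are corrupted with false positive rate fp and false negative rate fn. All names are hypothetical; the MLE would be the λ that maximizes this function (for example via an EM procedure), which the sketch does not compute.

```python
import math


def _safe_log(x):
    # log(0) is undefined; treat an impossible observation as -infinity
    return math.log(x) if x > 0.0 else -math.inf


def log_likelihood(lam, protein_domains, module_diseases, links, fp, fn):
    """Log-likelihood of observed protein-module links under an ASSUMED noisy-OR model.

    lam: dict (domain, disease) -> assumed association probability lambda_dt
    links: set of observed (protein, module) associations
    fp, fn: false positive and false negative rates of the observations
    """
    total = 0.0
    for p, domains in protein_domains.items():
        for m, diseases in module_diseases.items():
            # Probability that no domain-disease pair links protein p and module m
            none = 1.0
            for d in domains:
                for t in diseases:
                    none *= 1.0 - lam.get((d, t), 0.0)
            true_prob = 1.0 - none
            # Probability of observing a link, given the observation error rates
            obs_prob = true_prob * (1.0 - fn) + (1.0 - true_prob) * fp
            if (p, m) in links:
                total += _safe_log(obs_prob)
            else:
                total += _safe_log(1.0 - obs_prob)
    return total
```

With fp = 0 and fn = 0.9, as in the Fig. 3 setting, an observed link is assigned at most probability 0.1, so a missing link penalizes a candidate association only mildly.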

**Fig. 5**
Example illustrating the module effect. Nodes represent diseases (OMIM numbers), modules (index numbers), proteins (OMIM numbers) and domains (Pfam numbers). An edge connecting two nodes represents a known association. Nodes with the same background color represent the 7 known associations between corresponding diseases and proteins. (i) Disease OMIM # corresponds to disease/trait names as: [179800]: RENAL TUBULAR ACIDOSIS, DISTAL, AUTOSOMAL DOMINANT. [179830]: RENAL TUBULAR ACIDOSIS, PROXIMAL. [267200]: RENAL TUBULAR ACIDOSIS III. [267300]: RENAL TUBULAR ACIDOSIS, DISTAL, WITH PROGRESSIVE NERVE DEAFNESS. [602722]: RENAL TUBULAR ACIDOSIS, DISTAL, AUTOSOMAL RECESSIVE; RTADR. [604278]: RENAL TUBULAR ACIDOSIS, PROXIMAL, WITH OCULAR ABNORMALITIES AND MENTAL RETARDATION. [259730]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 3; OPTB3. [259700]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 1; OPTB1. [259710]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 2; OPTB2. [259720]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 5; OPTB5. [600329]: OSTEOPETROSIS AND INFANTILE NEUROAXONAL DYSTROPHY. [611490]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 4; OPTB4. [611497]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 6; OPTB6. [612301]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 7; OPTB7. (ii) Protein OMIM # corresponds to gene name as: [164360]: ATP5A1; [114815]: CA8; [109270]: SLC4A1; [192132]: ATP6V1B1; [603345]: SLC4A4; [611492]: CA2; [602642]: TNFSF11; [153440]: LTA; [191160]: TNF; [300386]: CD40LG; [146690]: IMPDH1; [602727]: CLCN7; [602743]: PRKAG2; [604592]: TCIRG1; [611716]: ATP6V0A2. (iii) Domain Pfam # corresponds to domain name as: [PF00006]: ATP-synt_ab; [PF02874]: ATP-synt_ab_N; [PF00194]: Carb_anhydrase; [PF00955]: HCO3_cotransp; [PF07565]: Band_3_cyto; [PF00306]: ATP-synt_ab_C; [PF00229]: TNF; [PF00478]: IMPDH; [PF00571]: CBS; [PF00654]: Voltage_CLC; [PF01496]: V_ATPase_I.

Cited by

- Savojardo C, Babbi G, Martelli PL, Casadio R. Mapping OMIM Disease-Related Variations on Protein Domains Reveals an Association Among Variation Type, Pfam Models, and Disease Classes. Front Mol Biosci. 2021 May 7;8:617016. doi: 10.3389/fmolb.2021.617016. PMID: 34026820.
- Zhang J, Deng L, Deng L. Protein structural domain-disease association prediction based on heterogeneous networks. BMC Genomics. 2025 Apr 10;23(Suppl 6):869. doi: 10.1186/s12864-024-11117-0. PMID: 40211147.
- Babbi G, Savojardo C, Baldazzi D, Martelli PL, Casadio R. Pathogenic variation types in human genes relate to diseases through Pfam and InterPro mapping. Front Mol Biosci. 2022 Sep 16;9:966927. doi: 10.3389/fmolb.2022.966927. PMID: 36188216.
- Shi W, Feng H, Li J, Liu T, Liu Z. DapBCH: a disease association prediction model Based on Cross-species and Heterogeneous graph embedding. Front Genet. 2023 Sep 22;14:1222346. doi: 10.3389/fgene.2023.1222346. PMID: 37811150.
- Murgiano L, Becker D, Spector C, Carlin K, Santana E, Niggel JK, Jagannathan V, Leeb T, Pearce-Kelling S, Aguirre GD, Miyadera K. CCDC66 frameshift variant associated with a new form of early-onset progressive retinal atrophy in Portuguese Water Dogs. Sci Rep. 2020 Dec 3;10(1):21162. doi: 10.1038/s41598-020-77980-5. PMID: 33273526.
