BMC Syst Biol. 2016 Jan 11;10 Suppl 1(Suppl 1):4.
doi: 10.1186/s12918-015-0247-y.

Inference of domain-disease associations from domain-protein, protein-disease and disease-disease relationships



Wangshu Zhang et al. BMC Syst Biol. 2016.

Abstract

Background: Protein domains can be viewed as portable units of biological function that define the functional properties of proteins. Therefore, if a protein is associated with a disease, its domains might also be associated with that disease and define disease endophenotypes. However, knowledge about such domain-disease relationships is rarely available. Identifying domains associated with human diseases would therefore greatly improve our understanding of the mechanisms of complex human diseases and, in turn, help improve their prevention, diagnosis and treatment.

Methods: Based on phenotypic similarities among diseases, we first group diseases into overlapping modules. We then develop a framework to infer associations between domains and diseases through known relationships between diseases and modules, between domains and proteins, and between proteins and disease modules. Five approaches are developed to predict domain-disease associations: Association, maximum likelihood estimation (MLE), domain-disease pair exclusion analysis (DPEA), Bayesian, and parsimonious explanation (PE).
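The abstract names the Association approach without defining it. The sketch below assumes a score modeled on the association score commonly used to infer domain-domain interactions: for a domain d and a disease module m, the fraction of proteins containing d that are associated with m. Aggregating module scores to a disease by taking the maximum, and all function and argument names, are hypothetical choices for illustration, not the authors' definitions.

```python
from collections import defaultdict


def association_scores(protein_domains, protein_modules, disease_modules):
    """Hypothetical Association-style scores for (domain, disease) pairs.

    protein_domains: protein -> set of domains it contains
    protein_modules: protein -> set of disease modules it is associated with
    disease_modules: disease -> set of modules that contain the disease

    Assumed definition for a domain d and a module m:
        score(d, m) = |proteins containing d that are associated with m|
                      / |proteins containing d|
    A disease takes the maximum score over its modules (also an assumption).
    """
    # Index proteins by the domains they contain.
    proteins_with_domain = defaultdict(set)
    for protein, domains in protein_domains.items():
        for domain in domains:
            proteins_with_domain[domain].add(protein)

    scores = {}
    for disease, modules in disease_modules.items():
        for domain, proteins in proteins_with_domain.items():
            best = 0.0
            for module in modules:
                # Count proteins carrying this domain that are linked to the module.
                hits = sum(1 for p in proteins if module in protein_modules.get(p, ()))
                best = max(best, hits / len(proteins))
            if best > 0:  # keep only pairs with some supporting evidence
                scores[(domain, disease)] = best
    return scores
```

Taking the maximum over modules lets a disease inherit the strongest module-level evidence; averaging over modules would be an equally plausible assumption.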

Results: We demonstrate the effectiveness of all five approaches through a series of validation experiments and show that the MLE, Bayesian and PE approaches are robust to their free parameters. We also study the effect of disease modularization on inferring novel domain-disease associations. In validation, the AUC (area under the receiver operating characteristic curve) scores of the Bayesian, MLE, DPEA, PE and Association approaches are 0.86, 0.84, 0.83, 0.83 and 0.79, respectively, indicating that these approaches are useful for predicting domain-disease relationships. Finally, we apply the Bayesian approach to infer domains associated with two common diseases, Crohn's disease and type 2 diabetes.
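An AUC can be read as the probability that a randomly chosen true association is scored above a randomly chosen non-association, so an AUC of 0.86 means that ordering holds for about 86% of such pairs. The sketch below computes AUC through this rank (Mann-Whitney) formulation; the function name, scores and labels are hypothetical, and how the validation positives and negatives were built is not specified in this abstract.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    labels: 1 for a known (e.g. held-out) association, 0 otherwise.
    Tied positive/negative pairs count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    # Normalize by the number of positive/negative pairs compared.
    return wins / (len(pos) * len(neg))
```

The double loop is quadratic; for large candidate sets, sorting the scores and summing the ranks of the positives gives the same value in O(n log n).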

Conclusions: The Bayesian approach has the best performance for inferring domain-disease relationships. The predicted landscape of domain-disease associations provides a more detailed view of disease mechanisms.


Figures

Fig. 1
The relationships between the different data types. Histograms of the number of (a) proteins with respect to the number of domains each protein contains, (b) disease modules with respect to the number of diseases each module contains, (c) disease modules with respect to the number of proteins each module is associated with, (d) domains with respect to the number of proteins associated with each domain, (e) diseases with respect to the number of disease modules each disease is associated with, and (f) proteins with respect to the number of disease modules each protein is associated with
Fig. 2
Scheme for predicting domain-disease relationships. Nodes represent diseases/traits, modules, proteins and domains. An edge connecting two nodes represents a known association. Steps 1-7 show how the candidate domains for a specific disease are obtained. Step 1: For a given disease T_n, all modules containing this disease (M_j in the figure) and all the other diseases/traits contained in M_j are extracted. Step 2: Modules sharing at least one disease with module M_j (M_j′ in the figure) are extracted. Step 3: All the other diseases/traits in M_j′ are included in the prediction scheme. Step 4: All proteins associated with the set {M_j, M_j′} (P_i1 and P_i2 in the figure) are extracted. Step 5: All domains contained in the set {P_i1, P_i2} are included in the prediction scheme. Step 6: All proteins sharing domains with proteins {P_i1, P_i2} are included in the prediction scheme. Step 7: All the other domains in the proteins obtained in Step 6 are included in the prediction scheme; the resulting set of domains is called the candidate domains
Fig. 3
Receiver operating characteristic (ROC) and precision-recall curves of the different approaches. The figure shows ROC curves (Subplot a) and precision-recall curves (Subplot b) of the Association, MLE (fp = 0, fn = 0.9), DPEA, Bayesian (u_p, u_n = 0; v_p, v_n = 1; α = 2, β = 2), and PE (r = 100 %, pw threshold ≤ 0.01) approaches. Based on both the ROC and precision-recall curves, the three MLE-based approaches (DPEA, MLE and Bayesian) outperform PE and Association. The Bayesian approach performs slightly better than DPEA and MLE
Fig. 4
Influences of the free parameters on the performance of the MLE and PE approaches. Horizontally, Subplots a-c illustrate the influence of the false positive rate (fp) and false negative rate (fn) on the AUC, accuracy and mean rank ratio of the MLE approach; Subplots d-f illustrate the influence of the reliable rate (r) and pw threshold on the AUC, accuracy and mean rank ratio of the PE approach. Vertically, Subplots a and d illustrate AUC scores, Subplots b and e illustrate accuracies, and Subplots c and f illustrate mean rank ratios
Fig. 5
Example illustrating the module effect. Nodes represent diseases (OMIM numbers), modules (index numbers), proteins (OMIM numbers) and domains (Pfam numbers). Edges connecting two nodes represent known associations. Nodes with the same background color mark the 7 known associations between the corresponding diseases and proteins.
(i) Disease OMIM numbers correspond to disease/trait names as follows: [179800]: RENAL TUBULAR ACIDOSIS, DISTAL, AUTOSOMAL DOMINANT. [179830]: RENAL TUBULAR ACIDOSIS, PROXIMAL. [267200]: RENAL TUBULAR ACIDOSIS III. [267300]: RENAL TUBULAR ACIDOSIS, DISTAL, WITH PROGRESSIVE NERVE DEAFNESS. [602722]: RENAL TUBULAR ACIDOSIS, DISTAL, AUTOSOMAL RECESSIVE; RTADR. [604278]: RENAL TUBULAR ACIDOSIS, PROXIMAL, WITH OCULAR ABNORMALITIES AND MENTAL RETARDATION. [259730]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 3; OPTB3. [259700]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 1; OPTB1. [259710]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 2; OPTB2. [259720]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 5; OPTB5. [600329]: OSTEOPETROSIS AND INFANTILE NEUROAXONAL DYSTROPHY. [611490]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 4; OPTB4. [611497]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 6; OPTB6. [612301]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 7; OPTB7.
(ii) Protein OMIM numbers correspond to gene names as follows: [164360]: ATP5A1; [114815]: CA8; [109270]: SLC4A1; [192132]: ATP6V1B1; [603345]: SLC4A4; [611492]: CA2; [602642]: TNFSF11; [153440]: LTA; [191160]: TNF; [300386]: CD40LG; [146690]: IMPDH1; [602727]: CLCN7; [602743]: PRKAG2; [604592]: TCIRG1; [611716]: ATP6V0A2.
(iii) Domain Pfam numbers correspond to domain names as follows: [PF00006]: ATP-synt_ab; [PF02874]: ATP-synt_ab_N; [PF00194]: Carb_anhydrase; [PF00955]: HCO3_cotransp; [PF07565]: Band_3_cyto; [PF00306]: ATP-synt_ab_C; [PF00229]: TNF; [PF00478]: IMPDH; [PF00571]: CBS; [PF00654]: Voltage_CLC; [PF01496]: V_ATPase_I
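The seven steps in the Fig. 2 caption amount to a bounded traversal of the disease-module-protein-domain graph. The sketch below assumes each known relationship is stored as a Python dict mapping an item to a set of related items; the function and argument names are hypothetical, and the sketch only collects the diseases and candidate domains, not how they are scored.

```python
from collections import defaultdict


def invert(mapping):
    """Turn {key: set(values)} into {value: set(keys)}."""
    inverse = defaultdict(set)
    for key, values in mapping.items():
        for value in values:
            inverse[value].add(key)
    return inverse


def candidate_domains(disease, disease_modules, module_proteins, protein_domains):
    """Return (diseases in the scheme, candidate domains) for one query disease."""
    module_diseases = invert(disease_modules)
    domain_proteins = invert(protein_domains)

    # Step 1: modules M_j containing the query disease, and the diseases in them.
    modules = set(disease_modules.get(disease, set()))
    diseases = set().union(*(module_diseases[m] for m in modules))
    # Step 2: modules M_j' sharing at least one disease with some M_j.
    neighbours = set().union(*(disease_modules[t] for t in diseases))
    # Step 3: add the diseases contained in M_j'.
    diseases |= set().union(*(module_diseases[m] for m in neighbours))
    modules |= neighbours
    # Step 4: proteins associated with {M_j, M_j'}.
    seeds = set().union(*(module_proteins.get(m, set()) for m in modules))
    # Step 5: domains contained in those proteins.
    seed_domains = set().union(*(protein_domains.get(p, set()) for p in seeds))
    # Step 6: all proteins sharing at least one of those domains.
    expanded = set().union(*(domain_proteins[d] for d in seed_domains))
    # Step 7: every domain in the expanded protein set is a candidate.
    candidates = set().union(*(protein_domains[p] for p in expanded))
    return diseases, candidates
```

In a small example where disease T1 sits only in module M1, M1 shares a disease with M2, and a protein outside both modules shares a domain with a protein in M2, that outside protein's other domains still become candidates through Steps 6 and 7.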
