PLoS Comput Biol. 2019 May 10;15(5):e1007052.
doi: 10.1371/journal.pcbi.1007052. eCollection 2019 May.

Pathway-specific protein domains are predictive for human diseases

Jung Eun Shim et al. PLoS Comput Biol. 2019.

Abstract

Protein domains are basic functional units of proteins. Many protein domains are pervasive among diverse biological processes, yet some are associated with specific pathways. Human complex diseases are generally viewed as pathway-level disorders. Therefore, we hypothesized that pathway-specific domains could be highly informative for human diseases. To test the hypothesis, we developed a network-based scoring scheme to quantify specificity of domain-pathway associations. We first generated domain profiles for human proteins, then constructed a co-pathway protein network based on the associations between domain profiles. Based on the score, we classified human protein domains into pathway-specific domains (PSDs) and non-specific domains (NSDs). We found that PSDs contained more pathogenic variants than NSDs. PSDs were also enriched for disease-associated mutations that disrupt protein-protein interactions (PPIs) and tend to have a moderate number of domain interactions. These results suggest that mutations in PSDs are likely to disrupt within-pathway PPIs, resulting in functional failure of pathways. Finally, we demonstrated the prediction capacity of PSDs for disease-associated genes with experimental validations in zebrafish. Taken together, the network-based quantitative method of modeling domain-pathway associations presented herein suggested underlying mechanisms of how protein domains associated with specific pathways influence mutational impacts on diseases via perturbations in within-pathway PPIs, and provided a novel genomic feature for interpreting genetic variants to facilitate the discovery of human disease genes.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Overview of scoring pathway specificity of the protein domains.
(A) A co-pathway protein network was constructed based on the similarity of protein domain profiles (0 and 1 represent the absence and presence, respectively, of each domain in the protein). Sub-networks that represent pathways f1, f2, and f3 were enriched for domains d1, d2, and d3, respectively. Edge thickness is proportional to the probability that the two connected proteins operate in the same pathway. (B) Next, each protein received a protein-pathway association (PPA) score for a specific pathway f, calculated as the sum of its edge scores to all member proteins of pathway f. (C) The domain-pathway association (DPA) score of each domain was assigned as the average PPA of all proteins that harbor the domain. In this example, the DPA of domain d3 for pathway f3, DPA3(f3), was the average of PPA8(f3), PPA9(f3), and PPA10(f3). The Gini index (GI) was used to measure the impurity of the data. (D) Subsequently, pathway specificity (PS) was calculated. In this example, because domains d1, d2, and d3 have high PSs for pathways f1, f2, and f3, respectively, they were classified as pathway-specific domains (PSDs) for the corresponding pathways. Domain d4, however, was classified as a non-specific domain (NSD) because of its low PS for all pathways.
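The scoring steps in panels (B) and (C) can be illustrated with a minimal Python sketch. The toy network, edge weights, and all names below (`edges`, `pathway_members`, `domain_proteins`, `ppa`, `dpa`, `gini_impurity`) are hypothetical and chosen only for illustration. The caption does not state how the Gini index is combined with the DPA scores to give PS, so the sketch shows only a standard Gini impurity over a list of non-negative scores and does not attempt to reproduce the PS formula.

```python
# Hypothetical toy co-pathway network: undirected weighted edges between proteins.
edges = {
    ("p8", "p9"): 0.9,
    ("p8", "p10"): 0.8,
    ("p9", "p10"): 0.7,
    ("p1", "p8"): 0.1,
}
# Hypothetical pathway membership and domain content (assumptions, not source data).
pathway_members = {"f3": {"p8", "p9", "p10"}}
domain_proteins = {"d3": {"p8", "p9", "p10"}}


def edge_score(a, b):
    """Return the weight of the undirected edge a-b, or 0.0 if absent."""
    return edges.get((a, b), edges.get((b, a), 0.0))


def ppa(protein, pathway):
    """PPA: sum of edge scores from `protein` to all member proteins of `pathway`."""
    return sum(
        edge_score(protein, member)
        for member in pathway_members[pathway]
        if member != protein
    )


def dpa(domain, pathway):
    """DPA: average PPA over all proteins that harbor `domain`."""
    proteins = domain_proteins[domain]
    return sum(ppa(p, pathway) for p in proteins) / len(proteins)


def gini_impurity(scores):
    """Standard Gini impurity of non-negative scores, treated as proportions."""
    total = sum(scores)
    if total == 0:
        return 0.0
    return 1.0 - sum((s / total) ** 2 for s in scores)


if __name__ == "__main__":
    print("PPA of p8 for f3:", ppa("p8", "f3"))
    print("DPA of d3 for f3:", dpa("d3", "f3"))
    # Assumed usage: impurity of one domain's scores across several pathways.
    print("Impurity, concentrated scores:", gini_impurity([1.0, 0.0, 0.0]))
    print("Impurity, evenly spread scores:", gini_impurity([1.0, 1.0, 1.0]))
```

Under this assumed usage, a domain whose scores are concentrated on one pathway has an impurity near 0, whereas a domain whose scores are spread evenly across pathways has a higher impurity.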
Fig 2. Disease implications of PSDs.
(A) Regression between pathway specificity (PS) and the significance of overlap with the gold-standard domain-pathway pairs by sigmoidal curve fitting. Domain-pathway associations were divided into two groups: the top 16,000 associations, which showed significant overlap with the gold-standard data (P < 0.01 by Fisher’s exact test), and the remaining 33,636 associations. The 4,506 domains in the top 16,000 associations were defined as pathway-specific domains (PSDs), and the 3,856 domains in the remaining associations were defined as non-specific domains (NSDs). (B) Comparison of normalized variation rates (NVRs) for neutral and pathogenic variants between PSDs and NSDs (*, P < 0.01; n.s., P > 0.05). (C) Comparison of NVRs between PSDs and NSDs for three classes of missense disease mutations described by Sahni et al. and for nonsynonymous variants known to affect physical protein interactions according to the IMEx consortium (*, P < 0.01; n.s., P > 0.05). (D) Comparison of the ratios (log base 2) of PSDs to NSDs for groups of human structural interaction network (hSIN) interfacing domains of similar size across different ranges of domain interaction connectivity. (E) Proposed models for the relationships between mutational consequences and the number of domain interactions. The blue node represents a hub domain that mediates interactions between a large number of proteins that contain domains with one or at most a few interacting domains (green nodes), and the yellow nodes represent domains with moderate numbers of domain interactions, which are involved in ‘within-pathways’ (shaded areas).
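The overlap test mentioned in panel (A) can be sketched as follows. This is a minimal illustration that assumes a one-sided Fisher's exact test, which is equivalent to the upper tail of the hypergeometric distribution. The caption does not state which alternative hypothesis was used, and the function name, its parameters, and the counts in the usage example are hypothetical.

```python
from math import comb


def fisher_exact_greater(overlap, n_predicted, n_gold, n_universe):
    """One-sided P value for observing at least `overlap` shared pairs by chance.

    n_predicted: number of predicted domain-pathway pairs being tested
    n_gold:      number of gold-standard pairs
    n_universe:  number of all possible pairs considered
    """
    max_overlap = min(n_predicted, n_gold)
    # Number of ways to draw n_predicted pairs with exactly k gold-standard hits,
    # summed over k >= overlap, divided by all ways to draw n_predicted pairs.
    tail = sum(
        comb(n_gold, k) * comb(n_universe - n_gold, n_predicted - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / comb(n_universe, n_predicted)


if __name__ == "__main__":
    # Hypothetical counts: 4 of 5 predicted pairs are among 5 gold-standard
    # pairs in a universe of 20 possible pairs.
    p_value = fisher_exact_greater(overlap=4, n_predicted=5, n_gold=5, n_universe=20)
    print("one-sided P value:", p_value)
    print("significant at P < 0.01:", p_value < 0.01)
```

Exact integer arithmetic with `math.comb` avoids a dependency on external libraries. For large counts, a statistics library would be the more practical choice.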
Fig 3. PSDs can predict disease genes.
(A) A summary of candidate gene selection for coronary artery disease (CAD) and schizophrenia (SCZ) by integration of GWAS significance and PSD occurrence data. SNPs from GWASs were divided into three groups: (i) SNPs with high significance that indicate confident candidate genes; (ii) SNPs with low significance that are generally discarded; and (iii) SNPs with moderate significance that were considered for further selection in this study. Based on the overlap between disease genes and pathway genes, we converted domain-pathway associations into domain-disease associations to identify disease-associated PSDs. Candidate disease genes of the GWAS∩PSD set were selected based on the occurrence of disease-associated PSDs of the genes with moderate GWAS significance. (B) The precision of CAD gene predictions was assessed based on CADgeneDB annotations. The precision by random expectation (i.e., the number of disease genes / the number of all human genes) is indicated by the blue line (~2.5%). (C) The precision of SCZ predictions was assessed based on SZdatabase annotations. The precision by random expectation is indicated by the blue line (~4.1%).
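The selection and evaluation logic in this caption can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the P value cutoffs (`strict`, `lenient`), the gene names, and the gene counts are hypothetical, because the caption does not give the thresholds that separate high, moderate, and low GWAS significance.

```python
def select_candidates(gwas_p, genes_with_disease_psd, strict=5e-8, lenient=1e-3):
    """Genes with moderate GWAS significance that also carry a disease-associated PSD.

    `strict` and `lenient` are hypothetical cutoffs: P values at or below `strict`
    count as high significance, P values above `lenient` count as low significance.
    """
    moderate = {gene for gene, p in gwas_p.items() if strict < p <= lenient}
    return moderate & genes_with_disease_psd


def precision(predicted, known_disease_genes):
    """Fraction of predicted genes that are annotated as disease genes."""
    if not predicted:
        return 0.0
    return len(predicted & known_disease_genes) / len(predicted)


def random_precision(n_disease_genes, n_all_genes):
    """Precision expected by chance: disease genes / all human genes."""
    return n_disease_genes / n_all_genes


if __name__ == "__main__":
    # Hypothetical GWAS P values per gene.
    gwas_p = {"geneA": 1e-9, "geneB": 1e-5, "geneC": 1e-4, "geneD": 0.2, "geneE": 1e-6}
    # Hypothetical genes carrying a disease-associated PSD.
    psd_genes = {"geneB", "geneC", "geneD"}
    # Hypothetical annotated disease genes (standing in for a curated database).
    annotated = {"geneB"}

    candidates = select_candidates(gwas_p, psd_genes)
    print("GWAS∩PSD candidates:", sorted(candidates))
    print("precision:", precision(candidates, annotated))
    # Hypothetical counts chosen only to show the calculation.
    print("random expectation:", random_precision(500, 20000))
```

Comparing `precision` with `random_precision` mirrors the comparison against the blue baseline line described in panels (B) and (C).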
Fig 4. Experimental validation of novel genes for heart development in zebrafish.
(A) Tg(flk1:EGFP) zebrafish embryos injected with morpholinos (MOs) for novel candidate genes for CAD showed morphological heart abnormalities, such as peripheral edema at 3 days post-fertilization (arrows in the left panel, scale bar = 500 μm). Zebrafish embryos normally have hearts with a left ventricle (V) and right atrium (A), whereas the embryos injected with MOs related to CAD genes exhibited either no asymmetry or reversed V and A orientation (middle panels, scale bar = 200 μm). These embryos also exhibited malformed blood vessels in the trunk (asterisks in the right panel, scale bar = 200 μm). (B) MO-injected Tg(flk1:EGFP) zebrafish embryos were counted to quantify those that exhibited heart asymmetry. (C) MO-injected Tg(flk1:EGFP) zebrafish embryos were counted to quantify those that exhibited vascular defects. Over 20 MO-injected embryos per gene were counted for each analysis (A-C).
