PLoS Comput Biol. 2019 May 10;15(5):e1007052.

doi: 10.1371/journal.pcbi.1007052. eCollection 2019 May.

Pathway-specific protein domains are predictive for human diseases

Jung Eun Shim¹˒², Ji Hyun Kim³, Junha Shin¹, Ji Eun Lee³˒⁴, Insuk Lee¹˒⁵

Affiliations

¹ Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea.
² Yonsei Biomedical Research Institute, Yonsei University College of Medicine, Seoul, Korea.
³ Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, Korea.
⁴ Samsung Biomedical Research Institute, Samsung Medical Center, Seoul, Korea.
⁵ Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Korea.

PMID: 31075101
PMCID: PMC6530867
DOI: 10.1371/journal.pcbi.1007052

Jung Eun Shim et al. PLoS Comput Biol. 2019.


Abstract

Protein domains are basic functional units of proteins. Many protein domains are pervasive among diverse biological processes, yet some are associated with specific pathways. Human complex diseases are generally viewed as pathway-level disorders. Therefore, we hypothesized that pathway-specific domains could be highly informative for human diseases. To test this hypothesis, we developed a network-based scoring scheme to quantify the specificity of domain-pathway associations. We first generated domain profiles for human proteins, then constructed a co-pathway protein network based on the associations between domain profiles. Based on the resulting specificity score, we classified human protein domains into pathway-specific domains (PSDs) and non-specific domains (NSDs). We found that PSDs contained more pathogenic variants than NSDs. PSDs were also enriched for disease-associated mutations that disrupt protein-protein interactions (PPIs) and tended to have a moderate number of domain interactions. These results suggest that mutations in PSDs are likely to disrupt within-pathway PPIs, resulting in functional failure of pathways. Finally, we demonstrated the capacity of PSDs to predict disease-associated genes, with experimental validation in zebrafish. Taken together, the network-based quantitative method for modeling domain-pathway associations presented herein suggests mechanisms by which protein domains associated with specific pathways influence the impact of mutations on disease via perturbation of within-pathway PPIs, and provides a novel genomic feature for interpreting genetic variants to facilitate the discovery of human disease genes.
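The abstract does not state how associations between domain profiles are scored. The Python sketch below only illustrates the general idea of turning binary domain profiles into a weighted protein network; it uses Jaccard similarity as a stand-in assumption (not the authors' measure), and the protein names, domain layout, and `min_score` cutoff are hypothetical.

```python
from itertools import combinations

# Hypothetical binary domain profiles: 1 = protein contains the domain (order: d1..d4).
PROFILES = {
    "P1": (1, 0, 0, 1),
    "P2": (1, 0, 0, 0),
    "P3": (0, 1, 0, 1),
    "P4": (0, 1, 0, 0),
    "P5": (0, 0, 1, 1),
}


def jaccard(a, b):
    """Jaccard similarity of two binary profiles (assumed stand-in association measure)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return shared / union if union else 0.0


def build_network(profiles, min_score=0.0):
    """Return {(p, q): weight} for protein pairs whose similarity exceeds min_score."""
    edges = {}
    for p, q in combinations(sorted(profiles), 2):
        w = jaccard(profiles[p], profiles[q])
        if w > min_score:
            edges[(p, q)] = w
    return edges


if __name__ == "__main__":
    for pair, w in build_network(PROFILES).items():
        print(pair, round(w, 3))
```

In this toy case the widely shared domain d4 alone links proteins that otherwise have nothing in common (P1 with P3 and P5, and P3 with P5), which hints at why a pervasive domain carries little pathway information.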


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Overview of scoring pathway specificity of the protein domains.**
**(A)** A co-pathway protein network was constructed based on the similarity of protein domain profiles (0 and 1 represent the absence and presence, respectively, of each domain in the protein). Sub-networks representing pathways f₁, f₂, and f₃ were enriched for domains d₁, d₂, and d₃, respectively. Edge thickness is proportional to the probability that the two connected proteins operate in the same pathway. **(B)** Next, each protein received a protein-pathway association (*PPA*) score for a given pathway f, calculated as the sum of its edge scores to all member proteins of pathway f. **(C)** The domain-pathway association (*DPA*) score of each domain was calculated as the average *PPA* of all proteins that harbor the domain. In this example, the *DPA* of domain d₃ for pathway f₃, *DPA*₃(f₃), is the average of *PPA*₈(f₃), *PPA*₉(f₃), and *PPA*₁₀(f₃). The Gini index (GI) was used to measure the impurity of the data. **(D)** Subsequently, pathway specificity (PS) was calculated. In this example, because domains d₁, d₂, and d₃ have high PS for pathways f₁, f₂, and f₃, respectively, they were classified as pathway-specific domains (PSDs) for the corresponding pathways. In contrast, domain d₄ was classified as a non-specific domain (NSD) because of its low PS for all pathways.
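The *PPA* and *DPA* definitions in panels B and C can be sketched as follows. This is a minimal illustration on hypothetical toy data, not the authors' implementation: the edge weights, pathway memberships, and domain assignments are invented, and because the caption does not give the exact GI or PS formulas, the sketch assumes the common Gini impurity, 1 − Σ pᵢ², computed over each domain's DPA scores normalized across pathways (lower impurity meaning a more pathway-concentrated domain).

```python
# Minimal sketch of PPA/DPA scoring on a hypothetical toy network.
# Edge weights, pathways, and domain assignments are invented for illustration.

EDGES = {
    ("P1", "P2"): 0.9,
    ("P1", "P3"): 0.1,
    ("P2", "P3"): 0.1,
    ("P3", "P4"): 0.8,
    ("P1", "P4"): 0.05,
}
PATHWAYS = {"f1": {"P1", "P2"}, "f2": {"P3", "P4"}}
DOMAINS_OF = {"P1": {"d1"}, "P2": {"d1", "d4"}, "P3": {"d2", "d4"}, "P4": {"d2"}}


def weight(p, q):
    """Undirected edge score between proteins p and q (0.0 if no edge)."""
    return EDGES.get((p, q), EDGES.get((q, p), 0.0))


def ppa(protein, pathway):
    """PPA_p(f): sum of edge scores from the protein to all other members of pathway f."""
    return sum(weight(protein, q) for q in PATHWAYS[pathway] if q != protein)


def dpa(domain, pathway):
    """DPA_d(f): average PPA over all proteins that harbor the domain."""
    carriers = [p for p, doms in DOMAINS_OF.items() if domain in doms]
    if not carriers:
        return 0.0
    return sum(ppa(p, pathway) for p in carriers) / len(carriers)


def gini_impurity(scores):
    """Assumed GI form: 1 - sum of squared shares; 0 means all score sits in one pathway."""
    total = sum(scores)
    if total == 0:
        return 0.0
    return 1.0 - sum((s / total) ** 2 for s in scores)


if __name__ == "__main__":
    for d in ("d1", "d2", "d4"):
        scores = [dpa(d, f) for f in sorted(PATHWAYS)]
        print(d, [round(s, 3) for s in scores], round(gini_impurity(scores), 3))
```

On this toy network, d₁ and d₂ concentrate their DPA on one pathway each and receive low impurity, while d₄, carried by proteins from both pathways, receives an impurity close to the two-pathway maximum of 0.5, mirroring the PSD/NSD contrast in panel D.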

**Fig 2. Disease implications of PSDs.**
**(A)** Regression between pathway specificity (PS) and the significance of overlap with the gold-standard domain-pathway pairs, by sigmoidal curve fitting. Domain-pathway associations were divided into two groups: the top 16,000 associations, which showed significant overlap (P < 0.01, Fisher’s exact test) with the gold-standard data, and the remaining 33,636 associations. The 4,506 domains in the top 16,000 associations were defined as pathway-specific domains (PSDs), and the 3,856 domains in the remaining associations were defined as non-specific domains (NSDs). **(B)** Comparison of normalized variation rates (NVRs) for neutral and pathogenic variants between PSDs and NSDs (*, P < 0.01; n.s., P > 0.05). **(C)** Comparison of NVRs between PSDs and NSDs for three classes of missense disease mutations described by Sahni *et al*. and for nonsynonymous variants known to affect physical protein interactions, from the IMEx consortium (*, P < 0.01; n.s., P > 0.05). **(D)** Comparison of the ratios (log₂) of PSDs to NSDs for similarly sized groups of human structural interaction network (hSIN) interfacing domains across different ranges of domain interaction connectivity. **(E)** Proposed models for the relationship between mutational consequences and the number of domain interactions. The blue node represents a hub domain that mediates interactions with a large number of proteins containing domains that have at most a few interacting domains (green nodes); the yellow nodes represent domains with moderate numbers of domain interactions, which are involved in within-pathway interactions (shaded areas).

**Fig 3. PSDs can predict disease genes.**
**(A)** A summary of candidate gene selection for coronary artery disease (CAD) and schizophrenia (SCZ) by integrating GWAS significance and PSD occurrence data. SNPs from GWASs were divided into three groups: (i) SNPs with high significance, which indicate confident candidate genes; (ii) SNPs with low significance, which are generally discarded; and (iii) SNPs with moderate significance, which were considered for further selection in this study. Based on the overlap between disease genes and pathway genes, we converted domain-pathway associations into domain-disease associations to identify disease-associated PSDs. Candidate disease genes in the GWAS∩PSD set were then selected from the genes with moderate GWAS significance that contain disease-associated PSDs. **(B)** The precision of CAD gene predictions was assessed using CADgeneDB annotations. The precision expected by chance (i.e., the number of disease genes / the number of all human genes) is indicated by the blue line (~2.5%). **(C)** The precision of SCZ gene predictions was assessed using SZdatabase annotations. The precision expected by chance is indicated by the blue line (~4.1%).
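The random-expectation baseline in panels B and C is simple arithmetic: the fraction of all human genes that carry the disease annotation. The sketch below compares the precision of a prediction set with that baseline; the gene identifiers and the counts (500 annotated genes out of 20,000) are hypothetical round numbers, not the values used in the study.

```python
def precision(predicted, known):
    """Fraction of predicted genes that carry the disease annotation."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(known)) / len(predicted)


def random_precision(n_disease_genes, n_all_genes):
    """Precision expected by chance: disease genes / all genes."""
    return n_disease_genes / n_all_genes


if __name__ == "__main__":
    # Hypothetical gene IDs and counts, not taken from CADgeneDB or SZdatabase.
    predicted = {"G1", "G2", "G3", "G4"}
    known = {"G2", "G4", "G9"}
    observed = precision(predicted, known)
    baseline = random_precision(500, 20_000)
    print(f"precision={observed:.3f} baseline={baseline:.3f} fold={observed / baseline:.1f}")
```

Reporting the ratio of observed to chance precision makes results comparable across diseases whose annotation databases differ in size, such as the ~2.5% and ~4.1% baselines above.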

**Fig 4. Experimental validation of novel genes for heart development in zebrafish.**
**(A)** Tg(*flk1*:*EGFP*) zebrafish embryos injected with morpholinos (MOs) for novel candidate genes for CAD showed morphological heart abnormalities, such as peripheral edema at 3 days post-fertilization (arrows in the left panel, scale bar = 500 μm). Zebrafish embryos normally have hearts with a left ventricle (V) and right atrium (A), whereas the embryos injected with MOs related to CAD genes exhibited either no asymmetry or reversed V and A orientation (middle panels, scale bar = 200 μm). These embryos also exhibited malformed blood vessels in the trunk (asterisks in the right panel, scale bar = 200 μm). **(B)** MO-injected Tg(*flk1*:*EGFP*) zebrafish embryos were counted to quantify those that exhibited heart asymmetry. **(C)** MO-injected Tg(*flk1*:*EGFP*) zebrafish embryos were counted to quantify those that exhibited vascular defects. Over 20 MO-injected embryos per gene were counted for each analysis **(A-C)**.


Cited by

BiomeNet: a database for construction and analysis of functional interaction networks for any species with a sequenced genome.
Kim E, Bae D, Yang S, Ko G, Lee S, Lee B, Lee I. Bioinformatics. 2020 Mar 1;36(5):1584-1589. doi: 10.1093/bioinformatics/btz776. PMID: 31599923. Free PMC article.
Germline gene fusions across species reveal the chromosomal instability regions and cancer susceptibility.
Zhou BW, Wu QQ, Mauki DH, Wang X, Zhang SR, Yin TT, Chen FL, Li C, Liu YH, Wang GD, Zhang YP. iScience. 2023 Nov 10;26(12):108431. doi: 10.1016/j.isci.2023.108431. eCollection 2023 Dec 15. PMID: 38205119. Free PMC article.
Disease gene prediction with privileged information and heteroscedastic dropout.
Shu J, Li Y, Wang S, Xi B, Ma J. Bioinformatics. 2021 Jul 12;37(Suppl_1):i410-i417. doi: 10.1093/bioinformatics/btab310. PMID: 34252957. Free PMC article.
Heterogeneous network approaches to protein pathway prediction.
Nayar G, Altman RB. Comput Struct Biotechnol J. 2024 Jun 27;23:2727-2739. doi: 10.1016/j.csbj.2024.06.022. eCollection 2024 Dec. PMID: 39035835. Free PMC article. Review.
Protein structural domain-disease association prediction based on heterogeneous networks.
Zhang J, Deng L, Deng L. BMC Genomics. 2025 Apr 10;23(Suppl 6):869. doi: 10.1186/s12864-024-11117-0. PMID: 40211147. Free PMC article.


