BMC Bioinformatics. 2013 May 7;14:154.

doi: 10.1186/1471-2105-14-154.

Reconstituting protein interaction networks using parameter-dependent domain-domain interactions

Vesna Memišević¹, Anders Wallqvist, Jaques Reifman

Affiliation

¹ Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, MD 21702, USA.

PMID: 23651452
PMCID: PMC3660195
DOI: 10.1186/1471-2105-14-154

Vesna Memišević et al. BMC Bioinformatics. 2013.

Abstract

Background: We can describe protein-protein interactions (PPIs) as sets of distinct domain-domain interactions (DDIs) that mediate the physical interactions between proteins. Experimental data confirm that DDIs are more consistent than their corresponding PPIs, lending support to the notion that analyses of DDIs may improve our understanding of PPIs and lead to further insights into cellular function, disease, and evolution. However, currently available experimental DDI data cover only a small fraction of all existing PPIs and, in the absence of structural data, determining which particular DDI mediates any given PPI is a challenge.

Results: We present two contributions to the field of domain interaction analysis. First, we introduce a novel computational strategy to merge domain annotation data from multiple databases. We show that when we merged yeast domain annotations from six annotation databases we increased the average number of domains per protein from 1.05 to 2.44, bringing it closer to the estimated average value of 3. Second, we introduce a novel computational method, parameter-dependent DDI selection (PADDS), which, given a set of PPIs, extracts a small set of domain pairs that can reconstruct the original set of protein interactions, while attempting to minimize false positives. Based on a set of PPIs from multiple organisms, our method extracted 27% more experimentally detected DDIs than existing computational approaches.

Conclusions: We have provided a method to merge domain annotation data from multiple sources, ensuring large and consistent domain annotation for any given organism. Moreover, we provided a method to extract a small set of DDIs from the underlying set of PPIs and we showed that, in contrast to existing approaches, our method was not biased towards DDIs with low or high occurrence counts. Finally, we used these two methods to highlight the influence of the underlying annotation density on the characteristics of extracted DDIs. Although increased annotations greatly expanded the possible DDIs, the lack of knowledge of the true biological false positive interactions still prevents an unambiguous assignment of domain interactions responsible for all protein network interactions.

Executable files and examples are available at: http://www.bhsai.org/downloads/padds/

Figures

**Figure 1**
**Evaluation of different protein-domain annotation merging strategies.** (A) Using the InterPro database, we obtained seven protein-domain annotations for yeast protein YNL271C from three databases: PFAM [32], Superfamily (SF) [33], and SMART [34,35]. PFAM domains: *FH2*, *Drf_FH3*, and two *Drf_GBD* domains; SF domains: *Formin homology 2 domain (FH2 domain)* and *ARM repeat*; and SMART domain: *Formin Homology*. (B) The naïve domain-merging strategy identified seven unique domains for YNL271C. (C) Sequence locations helped identify some of the identical domains (*FH2*, *FH2 domain*, and *Formin Homology*) but were not able to differentiate between different domains that share the same sequence position. (D) Taking into consideration both sequence location and domain names/labels, our merging strategy identified four unique domains: *ARM repeat*, *Drf_FH3*, *Drf_GBD*, and a domain consisting of *FH2* domains (*FH2*, *FH2 domain*, and *Formin Homology*).

**Figure 2**
**Enrichment of** ***“known”*** **(iPFAM) domain-domain interactions.** Evaluation of the top-scoring domain-domain interactions (DDIs) extracted by the *parameter-dependent DDI selection* (PADDS) and the *generalized parsimonious explanation* (GPE). (A) The fraction of known DDIs in the iPFAM database [38] retrieved by PADDS as a function of α and the number of top-scoring DDIs. (B) Comparison of the percentage of retrieved iPFAM DDIs using PADDS and GPE as a function of top-ranked DDI sets (*i.e.*, recall). (C) Comparison of the fraction of retrieved iPFAM DDIs using PADDS and GPE as a function of the iPFAM DDI set and top-ranked DDI sets (*i.e.*, precision). For the GPE sets, we used the DDI rank information provided with the published data that includes their designated high-confidence (GPE-HC) and low-confidence (GPE-LC) sets [21]. We have also indicated the best results achievable with any α value, typically achieved for *α = 0.1*.
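
The recall and precision measures in panels (B) and (C) can be illustrated with a minimal sketch. The function name `precision_recall_at_k` and the toy domain pairs are hypothetical and are not taken from the paper; the sketch assumes that domain pairs are unordered and that the ranked list contains no duplicates.

```python
def precision_recall_at_k(ranked_ddis, known_ddis, k):
    """Return (precision, recall) for the k top-ranked domain pairs."""
    # Treat each domain pair as unordered, so (A, B) matches (B, A).
    known = {frozenset(pair) for pair in known_ddis}
    top_k = [frozenset(pair) for pair in ranked_ddis[:k]]
    hits = sum(1 for pair in top_k if pair in known)
    # Precision: fraction of the top-k pairs that are in the known set.
    precision = hits / len(top_k) if top_k else 0.0
    # Recall: fraction of the known set recovered within the top k.
    recall = hits / len(known) if known else 0.0
    return precision, recall


# Hypothetical toy data (assumption): a ranked list and a "known" reference set.
ranked = [("D1", "D3"), ("D2", "D4"), ("D5", "D6"), ("D3", "D7")]
known = [("D3", "D1"), ("D5", "D6"), ("D8", "D9")]
precision, recall = precision_recall_at_k(ranked, known, k=2)
```

With these toy inputs, one of the two top-ranked pairs is in the reference set, so precision is 1/2 and recall is 1/3.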

**Figure 3**
**Overlap between extracted domain-domain interaction sets for different values of parameter** α. The graph indicates fractional overlaps between sets of extracted domain-domain interactions (DDIs) for the six different domain annotation schemes defined in Table 2, for different sets of α values. As the underlying set of PPIs, we used a high-confidence yeast PPI data set created by the Interaction Detection Based On Shuffling (IDBOS) procedure at a 5% false discovery rate [8,41].

**Figure 4**
**Protein-domain annotation merging procedure.** An illustration of the computational procedure used to merge protein-domain annotation data from multiple databases for a single protein P (consisting of n amino acids) and domain annotation data from three databases: DB1, DB2, and DB3. INPUT: Protein sequences and protein-domain annotations from one or more databases. PROCESSING: The annotation data were merged in three consecutive steps. In Step I, tandem domains within each protein (and for each database) were merged and represented as a continuous domain with the same domain label as the tandem domains. In Step II, annotation data between all pairs of databases were merged. In Step III, all pairs from Step II were merged into a final annotation set. In this step, new domain labels were assigned to the sets of merged domains. OUTPUT: The output of the annotation merging procedure consists of 1) a set of new (merged) domain labels assigned to the protein, 2) a mapping between the new and original domain labels, and 3) a list of merging exceptions. Based on these lists, one may (re)define sets of labels that should be treated as equivalent or non-equivalent and iterate through the complete domain annotation merging procedure (ITERATION).
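
Step I of the procedure above (merging tandem domains within one protein and one database) might look roughly like the following sketch. The caption does not define when two domains count as tandem, so the criterion used here is an assumption: the same label on consecutive domains separated by at most `max_gap` residues. The `Domain` type, the `max_gap` parameter, and the coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Domain(NamedTuple):
    label: str
    start: int  # first residue, 1-based
    end: int    # last residue, inclusive


def merge_tandem_domains(domains: List[Domain], max_gap: int = 0) -> List[Domain]:
    """Collapse runs of same-label domains into one continuous domain."""
    merged: List[Domain] = []
    for dom in sorted(domains, key=lambda d: (d.start, d.end)):
        prev = merged[-1] if merged else None
        # Assumed tandem criterion (not from the paper): same label as the
        # preceding domain and at most max_gap residues between the two.
        if prev and prev.label == dom.label and dom.start - prev.end - 1 <= max_gap:
            # Keep the shared label and extend the span to cover both domains.
            merged[-1] = Domain(prev.label, prev.start, max(prev.end, dom.end))
        else:
            merged.append(dom)
    return merged


# Hypothetical coordinates for one protein annotated by a single database.
annotation = [
    Domain("Drf_GBD", 10, 80),
    Domain("Drf_GBD", 85, 150),
    Domain("FH2", 300, 700),
]
step1 = merge_tandem_domains(annotation, max_gap=10)
```

Here the two `Drf_GBD` entries are four residues apart, so they collapse into a single domain spanning residues 10 to 150, while `FH2` is left unchanged. Steps II and III (merging across databases and assigning new labels) are not covered by this sketch.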

**Figure 5**
**Example of domain-domain interaction extraction.** **I**: Given a set of protein-protein interactions (PPIs) and a protein-domain annotation scheme, PADDS transformed all PPIs into the corresponding set of domain-domain interactions (DDIs) and calculated the benefit value $B_{ij}$ for all DDIs. **II**: The five steps involved in the DDI iterative evaluation procedure are illustrated using interactions between domains D1 and D3. **III**: After PADDS performed the DDI evaluation procedure for all other DDIs, the results were examined to select the final set of DDIs that can reconstitute the PPIs. P1, …, P7 denote proteins and D1, …, D8 denote domains. The benefit $B_{ij}$ and the reassessed benefit $B_{ij}^{r}$ associated with the interaction between domains i and j were calculated using Equations (3) and (4), respectively.
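
The first part of step I, transforming PPIs into candidate DDIs, can be sketched as follows. This is an illustration under stated assumptions and not the PADDS implementation: it assumes that every domain of one protein is paired with every domain of its partner, and it does not compute the benefit value, because Equations (3) and (4) are not reproduced here. The function name `candidate_ddis` and the toy network are hypothetical and do not correspond to the network drawn in the figure.

```python
from collections import defaultdict
from itertools import product


def candidate_ddis(ppis, annotations):
    """Map each candidate domain pair to the set of PPIs it could explain."""
    support = defaultdict(set)
    for p, q in ppis:
        # Assumption: every domain of p paired with every domain of q
        # is a candidate mediator of the p-q interaction.
        for d1, d2 in product(annotations.get(p, ()), annotations.get(q, ())):
            ddi = tuple(sorted((d1, d2)))            # unordered domain pair
            support[ddi].add(tuple(sorted((p, q))))  # unordered protein pair
    return dict(support)


# Hypothetical toy network (assumption), using the figure's naming style.
annotations = {"P1": {"D1", "D2"}, "P2": {"D3"}, "P3": {"D3", "D4"}}
ppis = [("P1", "P2"), ("P1", "P3")]
support = candidate_ddis(ppis, annotations)
```

In this toy network the pair (D1, D3) could explain both PPIs, whereas (D1, D4) could explain only the P1-P3 interaction. A selection method then has to choose a small subset of such pairs that still covers the PPIs.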

References

1. Hart GT, Ramani AK, Marcotte EM. How complete are current yeast and human protein-interaction networks? Genome Biol. 2006;7(11):120. doi: 10.1186/gb-2006-7-11-120.
2. Sambourg L, Thierry-Mieg N. New insights into protein-protein interaction data lead to increased estimates of the S. cerevisiae interactome size. BMC Bioinformatics. 2010;11:605. doi: 10.1186/1471-2105-11-605.
3. Stumpf MP, Thorne T, de Silva E, Stewart R, An HJ, Lappe M, Wiuf C. Estimating the size of the human interactome. Proc Natl Acad Sci USA. 2008;105(19):6959–6964. doi: 10.1073/pnas.0708078105.
4. von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P. Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002;417(6887):399–403.
5. Yu H, Braun P, Yildirim MA, Lemmens I, Venkatesan K, Sahalie J, Hirozane-Kishikawa T, Gebreab F, Li N, Simonis N. High-quality binary protein interaction map of the yeast interactome network. Science. 2008;322(5898):104–110. doi: 10.1126/science.1158684.
