BMC Bioinformatics. 2019 Aug 28;20(1):446.

doi: 10.1186/s12859-019-3036-6.

Measuring rank robustness in scored protein interaction networks

Lyuba V Bozhilova¹, Alan V Whitmore², Jonny Wray², Gesine Reinert¹, Charlotte M Deane³

Affiliations

¹ Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, UK.
² e-Therapeutics Plc, 17 Fenlock Rd, Long Hanborough, OX29 8LN, UK.
³ Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, UK. deane@stats.ox.ac.uk.

PMID: 31462221
PMCID: PMC6714100
DOI: 10.1186/s12859-019-3036-6

Lyuba V Bozhilova et al. BMC Bioinformatics. 2019.

Abstract

Background: Protein interaction databases often provide confidence scores for each recorded interaction based on the available experimental evidence. Protein interaction networks (PINs) are then built by thresholding on these scores, so that only interactions of sufficiently high quality are included. These networks are used to identify biologically relevant motifs or nodes using metrics such as degree or betweenness centrality. This type of analysis can be sensitive to the choice of threshold. If a node metric is to be useful for extracting biological signal, it should induce similar node rankings across PINs obtained at different reasonable confidence score thresholds.

Results: We propose three measures (rank continuity, identifiability, and instability) to evaluate how robust a node metric is to changes in the score threshold. We apply our measures to twenty-five metrics and identify four as the most robust: the number of edges in the step-1 ego network, as well as the leave-one-out differences in average redundancy, average number of edges in the step-1 ego network, and natural connectivity. Our measures show good agreement across PINs from different species and data sources. Analysis of synthetically generated scored networks shows that robustness results are context-specific, and depend both on network topology and on how scores are placed across network edges.

Conclusion: Because protein interaction detection, and therefore network structure, is uncertain, a reproducible PIN analysis should yield similar results across different confidence score thresholds. We demonstrate that while certain node metrics are robust with respect to threshold choice, this is not always the case. Promisingly, our results suggest that some metrics are robust across networks constructed from different databases and different scoring procedures.

Keywords: Confidence scores; Protein interaction networks; Protein-protein interactions; Ranking; Robustness.
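
The threshold-sensitivity argument in the abstract can be illustrated with a small sketch. It thresholds a toy scored network at several values, computes node degree at each threshold, and compares the degree rankings between consecutive thresholds using Spearman rank correlation. This is only an assumption-laden illustration: Spearman correlation is a generic stand-in and is not the paper's rank continuity, identifiability, or instability measure, none of which are defined in the abstract. The edge scores, the threshold values, the `>=` cutoff convention, and all function names are hypothetical.

```python
def average_ranks(values):
    """Rank values (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one ranking is constant
    return cov / (vx * vy) ** 0.5


def degree_vector(scored_edges, nodes, threshold):
    """Node degrees in the network keeping edges with score >= threshold."""
    deg = {n: 0 for n in nodes}
    for (u, v), score in scored_edges.items():
        if score >= threshold:
            deg[u] += 1
            deg[v] += 1
    return [deg[n] for n in nodes]


def consecutive_rank_similarity(scored_edges, thresholds):
    """Spearman correlation of degree between each pair of adjacent thresholds."""
    nodes = sorted({n for edge in scored_edges for n in edge})
    vectors = [degree_vector(scored_edges, nodes, t) for t in thresholds]
    return [spearman(a, b) for a, b in zip(vectors, vectors[1:])]


# Hypothetical toy network: scores are invented for illustration only.
scored_edges = {
    ("A", "B"): 0.95,
    ("B", "C"): 0.90,
    ("A", "D"): 0.60,
    ("B", "D"): 0.55,
    ("C", "D"): 0.20,
}
similarities = consecutive_rank_similarity(scored_edges, [0.5, 0.7, 0.9])
print(similarities)
```

A metric whose rankings barely change between adjacent thresholds would give values close to 1 throughout; a threshold-sensitive metric would give lower or erratic values.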

Conflict of interest statement

AVW and JW are employed by e-Therapeutics plc. They contributed to data acquisition and node metric choice, and critically reviewed the manuscript. The company did not play any other role in the work presented here.

Figures

**Fig. 1**
Thresholding effects in STRING networks. Average degree (a) and average local clustering coefficient (b) as functions of the threshold in the three STRING networks. The dotted vertical lines correspond to the four default STRING threshold values

**Fig. 2**
Metric rank similarity between consecutive thresholds. a In the four PINs, metrics were either consistently stable (e.g. degree and LOUD natural connectivity), consistently unstable (e.g. local clustering coefficient), or showed decreasing stability (e.g. betweenness). b The synthetic network based on a randomly rescored subset of the PVX network, SYN-PVX, and the network based on a Bernoulli random graph, SYN-GNP, exhibited different behaviour, with metrics showing the least similarity across thresholds in the SYN-GNP network

**Fig. 3**
Relaxed similarity between overall and threshold ranks in the scored PINs. a The overall ranks have been calculated over the medium-high confidence regions—0.60 to 0.90 for the three STRING networks (black dotted lines) and 0.15 to 0.28 for the HitPredict network (pink dotted lines). b The STRING medium-high confidence interval was also used for the synthetic networks. The SYN-GNP network, where both structure and score allocation are uniform, exhibits lower relaxed similarity. The SYN-PVX network has inherently heterogeneous network structure, on which scores are assigned randomly. Score thresholding introduces the same rate of change in different parts of the network, so that the relative node degree, for example, may remain largely unchanged across a series of thresholds. In contrast, the heterogeneous score allocation in PINs makes rank reorderings more likely, and identifiability may be expected to be lower

**Fig. 4**
Rank instability of metrics in the scored networks. a Rank instability in the four PINs. The dotted lines correspond to 1%. Instability measures in the HPRED network were generally narrower. b Rank instability in the synthetic networks. Instability measures in PINs were generally lower and have been plotted for comparison. Note the different scales between plots in a and in b

**Fig. 5**
Confidence score distributions in each of the four studied PINs. Bin width in all four cases has been set to 0.01. Scores from the HitPredict network (bottom right) follow a different distribution and cannot necessarily be interpreted in the same way as STRING scores

**Fig. 6**
Thresholding scored networks. A scored network, with edge widths corresponding to confidence scores (left). At a low threshold, only the lowest scoring edge CD is removed (middle). At a higher threshold, only the highest scoring edges AB and BC remain in the network (right). Edge scores are otherwise ignored in the thresholded networks
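
The thresholding step described in the Fig. 6 caption can be sketched as follows. Only the relative ordering of the edges (AB and BC highest, CD lowest) comes from the caption; the numeric scores, the additional edges AD and BD, the two threshold values, and the `>=` cutoff convention are assumptions made for illustration.

```python
# Hypothetical scored edges. Only the ordering (AB, BC highest; CD lowest)
# follows the caption; scores and the edges AD and BD are invented.
scored_edges = {
    ("A", "B"): 0.95,
    ("B", "C"): 0.90,
    ("A", "D"): 0.60,
    ("B", "D"): 0.55,
    ("C", "D"): 0.20,
}


def threshold_network(scored_edges, threshold):
    """Keep edges scoring at least `threshold`; scores are then discarded."""
    return {edge for edge, score in scored_edges.items() if score >= threshold}


def degrees(edges, nodes):
    """Degree of every node in an unweighted edge set (isolated nodes get 0)."""
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


nodes = sorted({n for edge in scored_edges for n in edge})

low = threshold_network(scored_edges, 0.5)   # drops only the lowest-scoring edge CD
high = threshold_network(scored_edges, 0.8)  # keeps only AB and BC

print(sorted(low), degrees(low, nodes))
print(sorted(high), degrees(high, nodes))
```

In this toy example, node D keeps two edges at the low threshold but becomes isolated at the high one, which is the kind of change that can reorder node rankings between thresholds.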

Grants and funding

EP/L016044/1/Engineering and Physical Sciences Research Council
