[Preprint]. 2024 Apr 30:2024.03.16.585347.
doi: 10.1101/2024.03.16.585347.

PPIscreenML: Structure-based screening for protein-protein interactions using AlphaFold

Victoria Mischley et al. bioRxiv.

Abstract

Protein-protein interactions underlie nearly all cellular processes. With the advent of protein structure prediction methods such as AlphaFold2 (AF2), models of specific protein pairs can be built extremely accurately in most cases. However, determining the relevance of a given protein pair remains an open question. It is presently unclear how best to use structure-based tools to infer whether a pair of candidate proteins indeed interact with one another: ideally, one might even use such information to screen amongst candidate pairings to build up protein interaction networks. Whereas methods for evaluating the quality of modeled protein complexes have been co-opted for determining which pairings interact (e.g., pDockQ and iPTM), there have been no rigorously benchmarked methods for this task. Here we introduce PPIscreenML, a classification model trained to distinguish AF2 models of interacting protein pairs from AF2 models of compelling decoy pairings. We find that PPIscreenML outperforms methods such as pDockQ and iPTM for this task, and further that PPIscreenML exhibits impressive performance when identifying which ligand/receptor pairings engage one another across the structurally conserved tumor necrosis factor superfamily (TNFSF). Analysis of benchmark results using complexes not seen in PPIscreenML development strongly suggests that the model generalizes beyond training data, making it broadly applicable for identifying new protein complexes based on structural models built with AF2.

Figures

Figure 1: Building a challenging training/testing set for PPIscreenML.
(A) A collection of 1,481 non-redundant active complexes with experimentally derived structures was obtained from DockGround, and five AF2 models were built from each of these. To build decoys, the same collection was screened to identify the closest structural matches (by TM-score) for each component protein. The structural homologs for each template protein were aligned onto the original complex, yielding a new (decoy) complex between two presumably non-interacting proteins. Five AF2 models were built for each of these 1,481 decoy complexes. (B) An example of a decoy complex (blue/cyan) superposed with the active complex from which it was generated (brown/wheat). (C) A suite of AlphaFold confidence metrics, structural properties, and Rosetta energy terms were used as input features for training PPIscreenML, a machine learning classifier built to distinguish active versus compelling inactive protein pairs.
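The decoy-pairing idea in panel (A) can be illustrated with a minimal sketch. It assumes a precomputed table of pairwise TM-scores; the function names, protein labels, and score values are hypothetical and are not taken from the preprint. The sketch covers only the choice of the closest structural match for each partner. It omits the structural alignment onto the original complex and the AF2 modeling.

```python
def closest_homolog(query, tm_scores, exclude):
    """Return the protein with the highest TM-score to `query`.

    `tm_scores` maps (query, other) tuples to a TM-score in [0, 1].
    Proteins listed in `exclude` are never returned.
    """
    candidates = {
        other: score
        for (a, other), score in tm_scores.items()
        if a == query and other not in exclude
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def build_decoy_pair(active_pair, tm_scores):
    """Swap each partner of an active complex for its closest structural match."""
    first, second = active_pair
    exclude = {first, second}  # never reuse the original partners
    return (
        closest_homolog(first, tm_scores, exclude),
        closest_homolog(second, tm_scores, exclude),
    )


# Made-up TM-scores for illustration only.
tm_scores = {
    ("A", "A2"): 0.82, ("A", "A3"): 0.61, ("A", "B"): 0.30,
    ("B", "B2"): 0.55, ("B", "B3"): 0.77, ("B", "A"): 0.30,
}

decoy = build_decoy_pair(("A", "B"), tm_scores)
print(decoy)
```

Excluding both original partners keeps the sketch from pairing a protein with itself or with its known binder. Whether the authors applied further filters is not stated in the caption.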
Figure 2: Training and feature reduction for PPIscreenML.
(A) Receiver operating characteristic (ROC) plot demonstrating classification performance on a completely held-out test set, for an XGBoost model using 57 features. (B) The number of features was reduced using sequential backwards selection, from 57 features to 7 features. (C) Classification performance of PPIscreenML (7 features) on the same completely held-out test set.
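Sequential backward selection, as named in panel (B), can be sketched generically as follows. This is an illustration of the technique under stated assumptions, not the authors' pipeline: the scoring function is a toy stand-in for retraining and validating a classifier, and the feature names and weights are hypothetical.

```python
def sequential_backward_selection(features, score_fn, n_keep):
    """Greedily drop one feature at a time until `n_keep` remain.

    At each step, every single-feature removal is scored with `score_fn`
    (higher is better) and the best-scoring reduced subset is kept.
    """
    selected = list(features)
    while len(selected) > n_keep:
        reduced_subsets = (
            [f for f in selected if f != dropped] for dropped in selected
        )
        selected = max(reduced_subsets, key=score_fn)
    return selected


# Hypothetical per-feature usefulness, for illustration only.
toy_weights = {
    "iptm": 0.9,
    "plddt": 0.7,
    "interface_energy": 0.5,
    "noise_a": 0.01,
    "noise_b": 0.02,
}


def toy_score(subset):
    """Stand-in for a cross-validated model score on this feature subset."""
    return sum(toy_weights[f] for f in subset)


kept = sequential_backward_selection(list(toy_weights), toy_score, n_keep=3)
print(kept)
```

In a real setting `score_fn` would retrain the model on each candidate subset and return a validation metric, so the cost grows quickly with the number of starting features.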
Figure 3: Classification performance of PPIscreenML relative to pDockQ and iPTM.
The same test set is used here. These complexes were not seen in any phase of developing PPIscreenML, but may have been used in developing pDockQ or iPTM. (A) Receiver operating characteristic (ROC) plot shows superior performance of PPIscreenML relative to these other two methods. (B) Overlaid histograms show clear separation of actives and decoys scored using PPIscreenML. (C) Overlaid histograms show overlapping distributions when models are scored with pDockQ or iPTM.
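The ROC comparison in panel (A) is commonly summarized by the area under the curve (AUC), which equals the probability that a randomly chosen active scores higher than a randomly chosen decoy. The sketch below computes AUC that way in plain Python. The labels and scores are toy values chosen for illustration; they are not results from PPIscreenML, pDockQ, or iPTM.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    `labels` holds 1 for actives and 0 for decoys. Ties between an
    active and a decoy count as half a win.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    decoys = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


# Toy data: three actives followed by three decoys.
labels = [1, 1, 1, 0, 0, 0]
scorer_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # separates the classes fully
scorer_b = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]  # one active falls below a decoy

auc_a = roc_auc(labels, scorer_a)
auc_b = roc_auc(labels, scorer_b)
print(auc_a, auc_b)
```

The pairwise form is quadratic in the number of complexes, which is fine for a sketch. A rank-based implementation would be preferable for large test sets.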
Figure 4: Application of PPIscreenML to identify active pairings within the tumor necrosis factor superfamily (TNFSF).
(A) Structurally-conserved TNFSF ligands bind to structurally-conserved TNFSF receptors; AF2 builds models of these complexes in the canonical pose for cognate pairings (RANKL/RANK are shown in wheat/cyan) but also in some cases for non-cognate pairings (RANKL/CD40 are shown in brown/blue). (B) Each ligand/receptor pairing was built with AF2 and scored with PPIscreenML (heatmap colored from low score in red, to high score in green). Ligand/receptor pairings observed in a comprehensive cellular assay are indicated with white checkmarks. (C) Receiver operating characteristic (ROC) plot demonstrating PPIscreenML classification of TNFSF ligand/receptor pairings.
