PLoS Comput Biol. 2009 Mar;5(3):e1000335.

doi: 10.1371/journal.pcbi.1000335. Epub 2009 Mar 27.

Accurate prediction of peptide binding sites on protein surfaces

Evangelia Petsalaki¹, Alexander Stark, Eduardo García-Urdiales, Robert B Russell

PMID: 19325869
PMCID: PMC2653190
DOI: 10.1371/journal.pcbi.1000335

Evangelia Petsalaki et al. PLoS Comput Biol. 2009 Mar.

Affiliation

¹ European Molecular Biology Laboratory, Heidelberg, Germany.

Abstract

Many important protein-protein interactions are mediated by the binding of a short peptide stretch in one protein to a large globular segment in another. Recent efforts have provided hundreds of examples of new peptides binding to proteins for which a three-dimensional structure is available (either known experimentally or readily modeled) but where no structure of the protein-peptide complex is known. To address this gap, we present an approach that can accurately predict peptide binding sites on protein surfaces. For peptides known to bind a particular protein, the method predicts binding sites with great accuracy, and the specificity of the approach means that it can also be used to predict whether or not a putative or predicted peptide partner will bind. We used known protein-peptide complexes to derive preferences, in the form of spatial position specific scoring matrices, which describe the binding-site environment in globular proteins for each type of amino acid in bound peptides. We then scan the surface of a putative binding protein for sites for each of the amino acids present in a peptide partner and search for combinations of high-scoring amino acid sites that satisfy constraints deduced from the peptide sequence. The method performed well in a benchmark and largely agreed with experimental data mapping binding sites for several recently discovered interactions mediated by peptides, including RG-rich proteins with SMN domains, Epstein-Barr virus LMP1 with TRADD domains, DBC1 with Sir2, and the Ago hook with Argonaute PIWI domain. The method, and associated statistics, is an excellent tool for predicting and studying binding sites for newly discovered peptides mediating critical events in biology.
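The final step described above, searching for combinations of high-scoring amino acid sites that satisfy constraints deduced from the peptide sequence, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `combine_sites`, the exhaustive search, the additive score, and the constraint that residues `i` and `j` lie no farther apart than `|i - j|` times 3.8 Å (the approximate spacing of consecutive Cα atoms in an extended chain).

```python
from itertools import product
from math import dist

# Assumed upper bound (in Å) on the spacing of consecutive residues.
MAX_SPACING = 3.8


def combine_sites(candidates, max_spacing=MAX_SPACING):
    """Pick one candidate site per peptide residue.

    candidates: one list per peptide residue, each holding
    (score, (x, y, z)) tuples for that residue's candidate sites.
    Returns (total_score, positions) for the highest-scoring combination
    in which every residue pair (i, j) is at most (j - i) * max_spacing
    apart, or None if no combination satisfies the constraints.
    """
    best = None
    # Exhaustive enumeration: fine for a toy example, not for real surfaces.
    for combo in product(*candidates):
        positions = [pos for _, pos in combo]
        feasible = all(
            dist(positions[i], positions[j]) <= (j - i) * max_spacing
            for i in range(len(positions))
            for j in range(i + 1, len(positions))
        )
        if not feasible:
            continue
        total = sum(score for score, _ in combo)
        if best is None or total > best[0]:
            best = (total, positions)
    return best
```

A real implementation would need a pruned search rather than full enumeration, since the number of combinations grows exponentially with peptide length.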

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Figure 1. Overview of the method.**
(A) A training dataset of protein–peptide complexes is extracted from the Protein Data Bank. (B) The peptide residues are superimposed along with their associated binding environments. (C) Spatial Position Specific Scoring Matrices (S-PSSMs) are created based on the spatial distribution of 14 defined atom types (Table S3) in the binding site of each residue, compared to background protein surface sites. (D) S-PSSMs corresponding to residues in a query peptide (FxPRD) are then scanned over the surface of the protein. (E) Potential binding sites for each residue of the query peptide are identified, which are then combined using the distance constraints dictated by the peptide sequence. (F) The binding site for the complete peptide is predicted and scored.
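Step (C) compares atom-type distributions in binding sites against background surface sites. One common way to express such a comparison is a log-odds matrix, sketched below. This is an illustration only, not the paper's implementation: the spatial binning scheme (`bin_id`), the pseudocount, the base-2 logarithm, and the function names `log_odds_matrix` and `score_site` are all assumptions.

```python
import math


def log_odds_matrix(observed, background, pseudocount=1.0):
    """Build a log-odds score for each (bin_id, atom_type) key.

    observed / background: dicts mapping (bin_id, atom_type) -> count,
    collected from binding sites and from background surface sites.
    A pseudocount keeps unseen combinations from producing log(0).
    """
    keys = set(observed) | set(background)
    n_obs = sum(observed.values()) + pseudocount * len(keys)
    n_bg = sum(background.values()) + pseudocount * len(keys)
    scores = {}
    for key in keys:
        p_obs = (observed.get(key, 0) + pseudocount) / n_obs
        p_bg = (background.get(key, 0) + pseudocount) / n_bg
        scores[key] = math.log2(p_obs / p_bg)
    return scores


def score_site(matrix, environment):
    """Sum matrix entries over the (bin_id, atom_type) pairs found at a site.

    Pairs absent from the matrix contribute 0 (an assumed neutral score).
    """
    return sum(matrix.get(key, 0.0) for key in environment)
```

Positive entries mark atom types enriched around a bound residue relative to background, so a surface site whose environment sums to a high score is a candidate site for that residue.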

**Figure 2. ROC curve showing performance in the large benchmark.**
False positive rate (X axis) plotted against true positive rate (Y) for different p-value cut-offs. False positive predictions are defined as those that either have predicted the wrong binding site or have predicted a binding site for a peptide that is not known to bind. The figure shows the result for our approach (pepsite) at two distance thresholds defining accuracy (6 Å & 10 Å), and for 10 Å with a subset of proteins smaller than 100 amino acids. Equivalent values for rate4site on the same datasets are also shown as well as the ROC curve for pepsite using a stricter cross-validation (i.e., excluding similarities/homologies between proteins as given in the SCOP database).

**Figure 3. Examples of applying the method.**
Predicted peptides are depicted as spheres on the protein surface colored by amino acid type (prolines in pink, alanines and glycines in white, serines in orange, asparagines and glutamines in teal, and aspartic/glutamic acid in red). (A) Binding of a collagen peptide (GPAGPPGA) on a human matrix metalloproteinase 2 (1eak). The peptide bound in the solved X-ray structure is colored in red. Note that the predicted binding site differs; however, it is likely correct (see text). (B) Binding of the Ago hook peptide (PDNGTSAWGEPNESSPGWGEMD) on the PIWI domain of the Argonaute protein (PDB IDs: 1ytu; 1w9h [39]): i) the best, though incorrect, binding site; ii) the location of the other top-scoring predictions (correct). (C) Prediction for the binding of an RGRGRGRG peptide to the human SMN tudor domain (PDB ID: 1mhn [40]), which agrees with NMR data. (D) Prediction of the binding site of the DBC1 leucine zipper (helical region 243–264) on the catalytic domain of SIRT1 (PDB ID: 1m2g [42]). (E) Prediction for the binding of the peptide DDPHGPVQLS from the Epstein-Barr virus LMP1 protein to the TRADD protein (PDB ID: 1f2h [45]).

**Figure 4. Using the method to scan for regions in Sec31 likely to bind Sec23.**
(A) Predictions for the most conserved region (GPQNGWNDPPAL) of the disordered 40-residue Sec31 peptide segment on the Sec23/Sar1 complex. In red is the region of the peptide from the solved structure (PDB IDs: 2qtv, 1m2o [48]). (B) P-values (Y-axis) for each 12-residue peptide from residues 770 to 1100 of the Sec31 protein (X-axis), used to identify the binding region. The lowest p-values, in the region 965–1010, are very close to the known binding site (981–1021). The black line under the graph shows the actual 40-residue binding peptide, and the region colored red-brown corresponds to the peptide predicted to bind, shown in (A).
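The scan in panel (B) evaluates 12-residue peptides across a stretch of the Sec31 sequence. A minimal sketch of such a window scan follows. The step size of one residue, the function names, and the `p_value` callable (a stand-in for whatever scoring routine produces a p-value for a peptide) are assumptions for illustration, not details from the paper.

```python
def sliding_windows(sequence, start, end, width=12):
    """Yield (first_residue_number, peptide) for each window of `width`
    residues lying fully inside residues start..end (1-based, inclusive)."""
    for first in range(start, end - width + 2):
        yield first, sequence[first - 1:first - 1 + width]


def best_window(sequence, start, end, p_value, width=12):
    """Return (p, first_residue_number, peptide) for the lowest p-value.

    p_value: caller-supplied callable mapping a peptide string to a
    p-value; it is a hypothetical placeholder for the real predictor.
    Ties are broken by the earliest window.
    """
    return min(
        (p_value(peptide), first, peptide)
        for first, peptide in sliding_windows(sequence, start, end, width)
    )
```

Plotting the p-value of every window against its starting residue number would give a profile of the kind shown in panel (B).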

Cited by

Leveraging Machine Learning Models for Peptide-Protein Interaction Prediction.
Yin S, Mi X, Shukla D. ArXiv [Preprint]. 2024 Feb 7:arXiv:2310.18249v2. Update in: RSC Chem Biol. 2024 Mar 13;5(5):401-417. doi: 10.1039/d3cb00208j. PMID: 37961736
Methods for Molecular Modelling of Protein Complexes.
Kanitkar TR, Sen N, Nair S, Soni N, Amritkar K, Ramtirtha Y, Madhusudhan MS. Methods Mol Biol. 2021;2305:53-80. doi: 10.1007/978-1-0716-1406-8_3. PMID: 33950384 Review.
Modelling binding between CCR5 and CXCR4 receptors and their ligands suggests the surface electrostatic potential of the co-receptor to be a key player in the HIV-1 tropism.
Kalinina OV, Pfeifer N, Lengauer T. Retrovirology. 2013 Nov 11;10:130. doi: 10.1186/1742-4690-10-130. PMID: 24215935
Modular binder technology by NGS-aided, high-resolution selection in yeast of designed armadillo modules.
Stark Y, Menard F, Jeliazkov JR, Ernst P, Chembath A, Ashraf M, Hine AV, Plückthun A. Proc Natl Acad Sci U S A. 2024 Jul 2;121(27):e2318198121. doi: 10.1073/pnas.2318198121. Epub 2024 Jun 25. PMID: 38917007
The basic keratin 10-binding domain of the virulence-associated pneumococcal serine-rich protein PsrP adopts a novel MSCRAMM fold.
Schulte T, Löfling J, Mikaelsson C, Kikhney A, Hentrich K, Diamante A, Ebel C, Normark S, Svergun D, Henriques-Normark B, Achour A. Open Biol. 2014 Jan 15;4(1):130090. doi: 10.1098/rsob.130090. PMID: 24430336

References

1. Diella F, Haslam N, Chica C, Budd A, Michael S, et al. Understanding eukaryotic linear motifs and their role in cell signaling and regulation. Front Biosci. 2008;13:6580–6603.
2. Dunker AK, Cortese MS, Romero P, Iakoucheva LM, Uversky VN. Flexible nets. The roles of intrinsic disorder in protein interaction networks. FEBS J. 2005;272:5129–5148.
3. Haynes C, Oldfield CJ, Ji F, Klitgord N, Cusick ME, et al. Intrinsic disorder is a common feature of hub proteins from four eukaryotic interactomes. PLoS Comput Biol. 2006;2:e100. doi:10.1371/journal.pcbi.0020100.
4. Puntervoll P, Linding R, Gemünd C, Chabanis-Davidson S, Mattingsdal M, et al. ELM server: a new resource for investigating short functional sites in modular eukaryotic proteins. Nucleic Acids Res. 2003;31:3625–3630.
5. Neduva V, Russell RB. Peptides mediating interaction networks: new leads at last. Curr Opin Biotechnol. 2006;17:465–471.
