PLoS Comput Biol. 2009 Mar;5(3):e1000335.
doi: 10.1371/journal.pcbi.1000335. Epub 2009 Mar 27.

Accurate prediction of peptide binding sites on protein surfaces

Evangelia Petsalaki et al. PLoS Comput Biol. 2009 Mar.

Abstract

Many important protein-protein interactions are mediated by the binding of a short peptide stretch in one protein to a large globular segment in another. Recent efforts have provided hundreds of examples of new peptides binding to proteins for which a three-dimensional structure is available (either known experimentally or readily modeled) but where no structure of the protein-peptide complex is known. To address this gap, we present an approach that can accurately predict peptide binding sites on protein surfaces. For peptides known to bind a particular protein, the method predicts binding sites with great accuracy, and the specificity of the approach means that it can also be used to predict whether or not a putative or predicted peptide partner will bind. We used known protein-peptide complexes to derive preferences, in the form of spatial position specific scoring matrices, which describe the binding-site environment in globular proteins for each type of amino acid in bound peptides. We then scan the surface of a putative binding protein for sites for each of the amino acids present in a peptide partner and search for combinations of high-scoring amino acid sites that satisfy constraints deduced from the peptide sequence. The method performed well in a benchmark and largely agreed with experimental data mapping binding sites for several recently discovered interactions mediated by peptides, including RG-rich proteins with SMN domains, Epstein-Barr virus LMP1 with TRADD domains, DBC1 with Sir2, and the Ago hook with Argonaute PIWI domain. The method, and associated statistics, is an excellent tool for predicting and studying binding sites for newly discovered peptides mediating critical events in biology.
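The final step described in the abstract, searching for combinations of high-scoring amino acid sites that satisfy constraints deduced from the peptide sequence, can be illustrated with a small sketch. This is not the authors' implementation: the function name `best_site_chain`, the additive scoring, the dynamic-programming search, and the 4.0 Å limit between sites of consecutive residues are all assumptions chosen for illustration.

```python
import math

def best_site_chain(candidates, max_step=4.0):
    """Pick one candidate site per peptide residue, maximizing the summed score.

    candidates: one list per peptide residue, each holding (xyz, score) tuples.
    max_step:   assumed upper bound (in Å) on the distance between the sites
                chosen for consecutive residues; a hypothetical value.
    Returns (total_score, [chosen index per residue]) or None if no chain fits.
    """
    if not candidates:
        return None
    # best[i][j] = (best total for a chain ending at candidate j of residue i, back-pointer)
    best = [[(score, None) for _, score in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for xyz, score in candidates[i]:
            options = [
                (prev_total + score, k)
                for k, ((prev_xyz, _), (prev_total, _)) in enumerate(
                    zip(candidates[i - 1], best[i - 1])
                )
                if prev_total > -math.inf and math.dist(xyz, prev_xyz) <= max_step
            ]
            row.append(max(options) if options else (-math.inf, None))
        best.append(row)

    last = best[-1]
    if not last:
        return None
    j = max(range(len(last)), key=lambda idx: last[idx][0])
    total = last[j][0]
    if total == -math.inf:
        return None
    # Walk the back-pointers to recover the chosen site for every residue.
    path = [j]
    for i in range(len(best) - 1, 0, -1):
        j = best[i][j][1]
        path.append(j)
    path.reverse()
    return total, path

# Toy input: three residues with made-up coordinates and scores.
toy = [
    [((0.0, 0.0, 0.0), 1.0), ((10.0, 0.0, 0.0), 5.0)],
    [((3.0, 0.0, 0.0), 2.0)],
    [((6.0, 0.0, 0.0), 1.5), ((20.0, 0.0, 0.0), 9.0)],
]
result = best_site_chain(toy)
```

In the toy input the highest-scoring individual sites are excluded because they lie too far from their sequence neighbors, which is the point of applying the constraint. A real implementation would need the distance bounds and scoring actually used in the paper.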

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Overview of the method.
(A) A training dataset of protein–peptide complexes is extracted from the Protein Data Bank. (B) The peptide residues are superimposed along with their associated binding environments. (C) Spatial Position Specific Scoring Matrices (S-PSSMs) are created based on the spatial distribution of 14 defined atom types (Table S3) in the binding site of each residue, compared to background protein surface sites. (D) S-PSSMs corresponding to residues in a query peptide (FxPRD) are then scanned over the surface of the protein. (E) Potential binding sites for each residue of the query peptide are identified, which are then combined using the distance constraints dictated by the peptide sequence. (F) The binding site for the complete peptide is predicted and scored.
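One common way to turn "distribution in binding sites compared to background" into a scoring matrix is a log-odds ratio, and the sketch below shows that idea. It is an assumption that the S-PSSMs take this form; the spatial binning, the pseudocount, and the names `log_odds_matrix` and `score_environment` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter

def log_odds_matrix(site_counts, background_counts, pseudocount=1.0):
    """Log-odds score per (spatial_bin, atom_type) key.

    Both inputs map (spatial_bin, atom_type) -> observed count; the pseudocount
    keeps unseen combinations from producing log(0).
    """
    keys = set(site_counts) | set(background_counts)
    site_total = sum(site_counts.values()) + pseudocount * len(keys)
    bg_total = sum(background_counts.values()) + pseudocount * len(keys)
    return {
        key: math.log(
            ((site_counts.get(key, 0) + pseudocount) / site_total)
            / ((background_counts.get(key, 0) + pseudocount) / bg_total)
        )
        for key in keys
    }

def score_environment(matrix, observed_keys):
    """Sum the matrix entries for the (bin, atom_type) pairs seen at a surface site."""
    return sum(matrix.get(key, 0.0) for key in observed_keys)

# Toy counts for a single spatial bin and two atom types (made-up numbers).
site = Counter({(0, "C"): 8, (0, "O"): 2})
background = Counter({(0, "C"): 5, (0, "O"): 5})
matrix = log_odds_matrix(site, background)
```

Positive entries mark atom types enriched around the residue relative to background surface, and negative entries mark depleted ones.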
Figure 2. ROC curve showing performance in the large benchmark.
False positive rate (X axis) plotted against true positive rate (Y) for different p-value cut-offs. False positive predictions are defined as those that either have predicted the wrong binding site or have predicted a binding site for a peptide that is not known to bind. The figure shows the result for our approach (pepsite) at two distance thresholds defining accuracy (6 Å & 10 Å), and for 10 Å with a subset of proteins smaller than 100 amino acids. Equivalent values for rate4site on the same datasets are also shown as well as the ROC curve for pepsite using a stricter cross-validation (i.e., excluding similarities/homologies between proteins as given in the SCOP database).
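The ROC construction in this caption can be sketched as follows, assuming each benchmark prediction has already been labeled as correct (right site for a known binder) or incorrect (wrong site, or a site predicted for a peptide not known to bind). The function name `roc_points` and the toy p-values are illustrative assumptions, not benchmark data.

```python
def roc_points(predictions, cutoffs):
    """Return (cutoff, false_positive_rate, true_positive_rate) per p-value cutoff.

    predictions: iterable of (p_value, is_correct) pairs; a prediction is
    accepted at a cutoff when its p-value is less than or equal to it.
    """
    predictions = list(predictions)
    positives = sum(1 for _, ok in predictions if ok)
    negatives = len(predictions) - positives
    points = []
    for cutoff in cutoffs:
        tp = sum(1 for p, ok in predictions if ok and p <= cutoff)
        fp = sum(1 for p, ok in predictions if not ok and p <= cutoff)
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        points.append((cutoff, fpr, tpr))
    return points

# Made-up predictions: (p-value, whether the prediction counts as correct).
toy_predictions = [(0.001, True), (0.02, True), (0.03, False), (0.2, False)]
curve = roc_points(toy_predictions, [0.01, 0.05, 1.0])
```

Changing the distance threshold that defines a correct site (6 Å or 10 Å in the figure) changes the labels and therefore the curve, while the computation stays the same.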
Figure 3. Examples of applying the method.
Predicted peptides are depicted as spheres on the protein surface colored by amino acid type (prolines, pink; alanines and glycines, white; serines, orange; asparagines and glutamines, teal; aspartic and glutamic acids, red). (A) Binding of a collagen peptide (GPAGPPGA) on human matrix metalloproteinase 2 (1eak). The peptide bound in the solved X-ray structure is colored in red. Note that the predicted binding site differs; however, it is likely correct (see text). (B) Binding of the Ago hook peptide (PDNGTSAWGEPNESSPGWGEMD) on the PIWI domain of the Argonaute protein (PDB IDs: 1ytu; 1w9h [39]): i) the best, though incorrect, binding site; ii) the location of the other top-scoring predictions (correct). (C) Prediction for the binding of an RGRGRGRG peptide to the human SMN tudor domain (PDB ID: 1mhn [40]), which agrees with NMR data. (D) Predicted binding site of the DBC1 leucine zipper (helical region 243–264) on the catalytic domain of SIRT1 (PDB ID: 1m2g [42]). (E) Prediction for the binding of the peptide DDPHGPVQLS, from the LMP1 protein of the Epstein-Barr virus, on the TRADD protein (PDB ID: 1f2h [45]).
Figure 4. Using the method to scan for regions in Sec31 likely to bind Sec23.
(A) Predictions for the most conserved region (GPQNGWNDPPAL) of the disordered 40-residue Sec31 peptide segment on the Sec23/Sar1 complex. The region of the peptide present in the solved structure (PDB IDs: 2qtv, 1m2o [48]) is shown in red. (B) P-values (Y-axis) for each 12-residue peptide from residues 770 to 1100 of the Sec31 protein (X-axis), used to identify the binding region. The lowest p-values, in the region 965–1010, are very close to the known binding site (981–1021). The black line under the graph shows the actual 40-residue binding peptide, and the region colored in red-brown corresponds to the peptide predicted to bind, shown in (A).
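The scan in panel (B) amounts to sliding a 12-residue window along the sequence and scoring each peptide. A minimal sketch is shown below; the scoring function is passed in as the hypothetical parameter `p_value_fn`, since the actual p-value calculation is not given here, and treating the residue range as inclusive with windows lying fully inside it is also an assumption.

```python
def window_scan(sequence, start, end, width=12, first_residue=1):
    """Yield (first_residue_number, peptide) for each window inside start..end.

    Residue numbers are 1-based by default; both ends of the range are inclusive.
    """
    for pos in range(start, end - width + 2):
        offset = pos - first_residue
        yield pos, sequence[offset:offset + width]

def lowest_p_window(sequence, start, end, p_value_fn, width=12, first_residue=1):
    """Return (p_value, first_residue_number, peptide) for the best-scoring window."""
    scored = (
        (p_value_fn(peptide), pos, peptide)
        for pos, peptide in window_scan(sequence, start, end, width, first_residue)
    )
    return min(scored, default=None)

# Toy sequence and a placeholder scoring rule, purely for illustration.
toy_sequence = "ACDEFGHIKLMNPQ"
windows = list(window_scan(toy_sequence, 1, 14))
best = lowest_p_window(
    toy_sequence, 1, 14, lambda pep: 0.01 if pep.startswith("C") else 0.5
)
```

With these conventions, scanning residues 770 to 1100 in 12-residue windows gives 320 peptides, the last one starting at residue 1089.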
