Commun Biol. 2022 May 26;5(1):503.
doi: 10.1038/s42003-022-03445-2.

PepNN: a deep attention model for the identification of peptide binding sites

Osama Abdin et al. Commun Biol. 2022.

Abstract

Protein-peptide interactions play a fundamental role in many cellular processes, but remain underexplored experimentally and difficult to model computationally. Here, we present PepNN-Struct and PepNN-Seq, structure- and sequence-based approaches for the prediction of peptide binding sites on a protein. A main difficulty for the prediction of peptide-protein interactions is the flexibility of peptides and their tendency to undergo conformational changes upon binding. Motivated by this, we developed reciprocal attention to simultaneously update the encodings of peptide and protein residues while enforcing symmetry, allowing for information flow between the two inputs. PepNN integrates this module with modern graph neural network layers, and a series of transfer learning steps is used during training to compensate for the scarcity of peptide-protein complex information. We show that PepNN-Struct achieves consistently high performance across different benchmark datasets. We also show that PepNN makes reasonable peptide-agnostic predictions, allowing for the identification of novel peptide-binding proteins.
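The reciprocal attention idea in the abstract can be illustrated with a minimal sketch. It assumes, as a detail not stated in the abstract, that a single protein-by-peptide score matrix is shared by both directions: normalized over peptide residues to update the protein encoding, and over protein residues to update the peptide encoding. The function names, the toy dimensions, and the omission of learned projections are illustrative assumptions, not the PepNN implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(mat):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*mat)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def reciprocal_attention(protein, peptide):
    """Update both encodings from one shared score matrix (hypothetical sketch).

    protein: n_prot x d list of rows; peptide: n_pep x d list of rows.
    """
    d = len(protein[0])
    # Shared scores: scores[i][j] relates protein residue i to peptide residue j.
    raw = matmul(protein, transpose(peptide))
    scores = [[s / math.sqrt(d) for s in row] for row in raw]
    # Protein residues attend over peptide residues (softmax along each row).
    prot_weights = [softmax(row) for row in scores]
    # Peptide residues attend over protein residues (softmax along each column).
    pep_weights = [softmax(row) for row in transpose(scores)]
    new_protein = matmul(prot_weights, peptide)
    new_peptide = matmul(pep_weights, protein)
    return new_protein, new_peptide, prot_weights, pep_weights
```

Because both updates are derived from the same score matrix, information flows between the two inputs in a single step. A real model would add learned query, key, and value projections and multiple heads.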

Conflict of interest statement

P.M.K. is a co-founder of and has been a consultant to several biotechnology ventures, including Resolute Bio, Oracle Therapeutics and Navega Therapeutics, and serves on the scientific advisory board of ProteinQure. He also holds several patents in the area of protein and peptide engineering. O.A., S.N. and H.W. declare no competing interests.

Figures

Fig. 1. Model architecture and training procedure.
a Attention layers are indicated with orange; normalization layers are indicated with blue and simple transformation layers are indicated with green. b Input layers for PepNN-Seq. c Transfer learning pipeline used for model training.
Fig. 2. Impact of transfer learning on model performance on the peptide complex validation dataset.
a ROC curves on all residues in the dataset using predictions from PepNN-Struct trained on different datasets with different sequence embeddings. Solid lines indicate models that use ProtBert embeddings. b Comparison of the distribution of ROC AUCs on different input proteins using predictions from PepNN-Struct trained on different datasets with different sequence embeddings (Wilcoxon signed-rank test, n = 311 protein-peptide complex structures). c ROC curves on all residues in the dataset using predictions from PepNN-Seq trained on different datasets with different sequence embeddings. Solid lines indicate models that use ProtBert embeddings. d Comparison of the distribution of ROC AUCs on different input proteins using predictions from PepNN-Seq trained on different datasets (Wilcoxon signed-rank test, n = 311 protein-peptide complex structures). e Predictions of the binding site of the SPOC domain of PHF3 (PDB code 6IC9) using PepNN-Struct trained on different datasets. f Relationship between the change in ROC AUC when PepNN-Struct is pretrained and the maximum TMalign score of chains in the test dataset with chains in the pre-training dataset. Boxplot centerlines show medians, box limits show upper and lower quartiles, whiskers extend to 1.5 times the interquartile range and points show outliers.
Fig. 3. Comparison of PepNN-Struct and a Graph Transformer.
a ROC curves on all residues in the TS092 dataset. b Comparison of the distribution of ROC AUCs on different input proteins (Wilcoxon signed-rank test, n = 92 protein-peptide complexes). c Comparison of model performance on examples where bound peptides undergo conformational changes. d Prediction of the binding site of a P53 N-terminal peptide to RPA70N (PDB code 2B3G). Unbound peptide conformation is shown in magenta (PDB code 2LY4). Boxplot centerlines show medians, box limits show upper and lower quartiles, whiskers extend to 1.5 times the interquartile range and points show outliers.
Fig. 4. Peptide-agnostic prediction using PepNN.
a ROC curves on the validation dataset using PepNN-Struct with different input peptide sequences. b ROC curves on the validation dataset using PepNN-Seq with different input peptide sequences. c Scores assigned by PepNN-Struct to different domains in the PDB (Wilcoxon rank-sum test, 56,756 total protein chains). d Scores assigned by PepNN-Seq to different domains in the reference human proteome (Wilcoxon rank-sum test, 92,141 total proteins). e ORF7a peptide binding site prediction and key residues at the predicted binding site and an alternate binding site. f Co-immunoprecipitation of wild-type and mutant ORF7a with BST-2. g Energies and relative energy contributions of different fragments calculated using Peptiderive on ORF7a/BST-2 docking poses. h Binding site prediction on ORF7a using PepNN and BST-2.
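The per-protein ROC AUC values compared in the figures can be illustrated with a rank-based computation. This is a minimal sketch, assuming one binary binding label and one predicted score per residue; the function name and the example values are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a binding residue outscores a
    non-binding residue, with ties counted as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical residue labels (1 = binding) and predicted scores for one protein.
labels = [0, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2]
auc = roc_auc(labels, scores)  # 5 of 6 positive/negative pairs are ordered correctly
```

The pairwise form costs one comparison per positive/negative pair, which is adequate for a single protein chain. Computing one value per protein yields the distributions that a paired test such as the Wilcoxon signed-rank test would then compare.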
