Nat Commun. 2021 Sep 15;12(1):5465.
doi: 10.1038/s41467-021-25772-4.

A deep-learning framework for multi-level peptide-protein interaction prediction

Yipin Lei et al. Nat Commun.

Abstract

Peptide-protein interactions are involved in various fundamental cellular functions and their identification is crucial for designing efficacious peptide therapeutics. Recently, a number of computational methods have been developed to predict peptide-protein interactions. However, most of the existing prediction approaches heavily depend on high-resolution structure data. Here, we present a deep learning framework for multi-level peptide-protein interaction prediction, called CAMP, including binary peptide-protein interaction prediction and corresponding peptide binding residue identification. Comprehensive evaluation demonstrated that CAMP can successfully capture the binary interactions between peptides and proteins and identify the binding residues along the peptides involved in the interactions. In addition, CAMP outperformed other state-of-the-art methods on binary peptide-protein interaction prediction. CAMP can serve as a useful tool in peptide-protein interaction prediction and identification of important binding residues in the peptides, which can thus facilitate the peptide drug discovery process.

Conflict of interest statement

J.Z. is the founder of Silexon AI Technology Co., Ltd. and has an equity interest. All other authors declare no competing interests.

Figures

Fig. 1. The workflow and architecture of CAMP.
a Workflow of data curation and label extraction. We first extracted all PDB complexes containing peptides as ligands from the RCSB PDB, and all peptide drugs with corresponding targets from DrugBank. For the peptide–protein pairs from the PDB, we then used PLIP to identify the interacting pairs by detecting whether non-covalent interactions existed between them. Next, we generated sequence-based feature profiles for peptides and proteins, including residue-level structural and physicochemical properties, intrinsic disorder tendencies of peptides and proteins, and protein evolutionary information. We also downloaded the corresponding labels of peptide-binding residues from PepBDB. These residue-level labels and pairwise binary interactions were regarded as the multi-level supervised information for CAMP. b Network architecture of CAMP. Given the peptide feature profiles and the protein profiles of an input pair, the numerical features, i.e., the evolutionary protein PSSM and the intrinsic disorder tendency of each residue in the peptide or protein sequence, are processed by the numerical channels of the feature extractors. The categorical features, i.e., the raw amino acids, secondary structures, polarity, and hydropathy properties of the peptide or protein, are processed by three categorical channels. Next, the outputs of these channels are concatenated and fed into CNN modules, and the amino-acid representations of the peptide and the protein are also fed into self-attention modules to learn the importance of individual residues (i.e., the contributions of individual residues to the final prediction). Finally, the outputs of the self-attention modules and CNN modules are concatenated and passed through three fully connected layers to predict a binding score for each peptide–protein pair, while the output of the peptide CNN module is used to predict a binding score for each residue of the peptide sequence.
Fig. 2. AUC and AUPR of CAMP and baseline models through cross-validation under three settings.
a, b show the AUC and AUPR of CAMP and other baseline methods under the “novel protein setting”, respectively. c, d show the AUC and AUPR of CAMP and other baseline methods under the “novel peptide setting”, respectively. e, f show the AUC and AUPR of CAMP and other baseline methods under the “novel pair setting”, respectively. The error bars under the “novel protein setting” and the “novel peptide setting” represent the mean ± standard deviation over five folds (n = 5). The error bars under the “novel pair setting” represent the mean ± standard deviation over nine folds (n = 9). “NA” stands for random cross-validation, i.e., randomly splitting the data set and using 80% of it to train the model and the remaining 20% to evaluate the performance.
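To make the idea of a held-out-protein evaluation concrete, here is a minimal sketch of a grouped fivefold split. It assumes that the “novel protein setting” means no protein in a test fold also appears in the corresponding training folds; the caption does not spell out the exact procedure (for example, whether similar proteins are grouped first), so the function names `novel_protein_folds` and `train_test_splits` and the toy identifiers are illustrative only.

```python
import random
from collections import defaultdict


def novel_protein_folds(pairs, n_folds=5, seed=0):
    """Assign (peptide, protein) pairs to folds so each protein sits in one fold.

    Assumption: a "novel protein" split only requires that test proteins are
    absent from training. Any similarity-based grouping is not modeled here.
    """
    proteins = sorted({protein for _, protein in pairs})
    random.Random(seed).shuffle(proteins)
    # Round-robin assignment of shuffled proteins to folds.
    fold_of = {protein: i % n_folds for i, protein in enumerate(proteins)}
    folds = defaultdict(list)
    for pair in pairs:
        folds[fold_of[pair[1]]].append(pair)
    return [folds[i] for i in range(n_folds)]


def train_test_splits(folds):
    """Yield (train, test) lists, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test


# Toy data: 40 hypothetical pairs over 10 hypothetical proteins.
toy_pairs = [(f"pep{i}", f"prot{i % 10}") for i in range(40)]
for train, test in train_test_splits(novel_protein_folds(toy_pairs)):
    shared = {p for _, p in train} & {p for _, p in test}
    print(len(train), len(test), "shared proteins:", len(shared))
```

A random 80/20 split (the “NA” case) would instead shuffle the pairs themselves, so the same protein could appear on both sides of the split.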
Fig. 3. Performance evaluation of CAMP on peptide-binding residue identification on the benchmark data set through fivefold cross-validation.
a, b show the distributions of AUC and MCC for peptide-binding residue prediction, respectively. The mean values of average AUC and MCC are plotted in dotted lines. c–f show four examples of peptide-binding residue identifications by CAMP that ranked ~1%, 35%, 50%, and 85% in terms of average AUC, respectively. The PDB complexes were retrieved from the RCSB PDB, and the images were generated by PyMOL. The protein chains in the complexes are colored in light blue while the peptide chains are colored in light purple and pink. For each peptide, the true binding residues are colored in pink while the predicted binding residues generated by CAMP are colored in wheat.
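For readers unfamiliar with the two residue-level metrics, the following sketch computes AUC and MCC for a single peptide from per-residue scores. It uses the standard definitions of both metrics. The 0.5 decision threshold, the function names, and the toy labels and scores are assumptions for illustration, not values taken from the paper.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy peptide of eight residues: 1 marks a binding residue (hypothetical data).
labels = [0, 1, 1, 0, 0, 1, 0, 0]
scores = [0.10, 0.90, 0.60, 0.40, 0.20, 0.30, 0.70, 0.05]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print("AUC:", round(auc(labels, scores), 3))
print("MCC:", round(mcc(labels, predicted), 3))
```

Computing both values per peptide and then collecting them over all peptides gives distributions of the kind shown in panels a and b.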
Fig. 4. CAMP yielded robust performance and outperformed the baseline models on an independent test set.
a, b show the evaluation results with different positive-negative ratios of the test data set in terms of AUC and AUPR, respectively. c, d show the distributions of AUC and MCC for peptide-binding residue prediction, respectively. The mean values of average AUC and MCC are plotted with dotted lines.
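As a rough illustration of evaluating under different positive-negative ratios, the sketch below subsamples negatives to a chosen ratio and estimates AUPR by average precision. The caption does not state which ratios or which AUPR estimator were used, so the ratios in the loop, the helper names `resample_ratio` and `average_precision`, and the toy data are all assumptions.

```python
import random


def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


def resample_ratio(examples, neg_per_pos, seed=0):
    """Keep all positives and draw `neg_per_pos` negatives per positive.

    `examples` is a list of (label, score) tuples. If too few negatives
    exist, all of them are kept.
    """
    pos = [e for e in examples if e[0] == 1]
    neg = [e for e in examples if e[0] == 0]
    k = min(len(neg), neg_per_pos * len(pos))
    return pos + random.Random(seed).sample(neg, k)


# Hypothetical test set: 20 positives with higher scores on average.
rng = random.Random(42)
examples = [(1, rng.uniform(0.4, 1.0)) for _ in range(20)]
examples += [(0, rng.uniform(0.0, 0.7)) for _ in range(200)]

for ratio in (1, 5, 10):  # assumed ratios, for illustration only
    subset = resample_ratio(examples, ratio)
    y = [label for label, _ in subset]
    s = [score for _, score in subset]
    print(f"1:{ratio}", round(average_precision(y, s), 3))
```

AUPR generally falls as negatives become more numerous, which is why reporting it at several ratios gives a fuller picture than a single balanced test set.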
Fig. 5. Model performance of CAMP, HSM-ID, and HSM-D across eight families.
CAMP achieved relatively stable performance over all families, whereas the performance of the HSM models was easily influenced by the sample size of the training set (marked with gray numbers). CAMP outperformed the HSM models, with an increase in AUC of 3–7%. All the evaluation metrics of the HSM models were obtained from the original paper.
