Integrative Modeling of Protein-Polypeptide Complexes by Bayesian Model Selection using AlphaFold and NMR Chemical Shift Perturbation Data

doi:10.1101/2024.09.19.613999

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2024 Sep 22:2024.09.19.613999.

Tiburon L Benavides¹, Gaetano T Montelione²

Affiliations

¹ Department of Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA.
² Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA.

PMID: 39345459
PMCID: PMC11430059

Tiburon L Benavides et al. bioRxiv. 2024.

Abstract

Protein-polypeptide interactions, including those involving intrinsically disordered peptides and intrinsically disordered regions of protein binding partners, are crucial for many biological functions. However, experimental structure determination of protein-peptide complexes can be challenging. Computational methods, while promising, generally require experimental data for validation and refinement. Here we present CSP_Rank, an integrative modeling approach for determining the structures of protein-peptide complexes. This method combines AlphaFold2 (AF2) enhanced sampling methods with a Bayesian conformational selection process based on experimental Nuclear Magnetic Resonance (NMR) Chemical Shift Perturbation (CSP) data and AF2 confidence metrics. Using a curated dataset of 108 protein-peptide complexes from the Biological Magnetic Resonance Data Bank (BMRB), we observe the following. AF2 typically yields models with excellent consistency with experimental CSP data. Even so, applying enhanced sampling followed by data-guided conformational selection routinely produces ensembles of structures with improved agreement with NMR observables. For two systems, we cross-validate the CSP-selected models using independently acquired nuclear Overhauser effect (NOE) NMR data and demonstrate how CSP and NOE data can be combined using our Bayesian framework for model selection. CSP_Rank is a novel method for integrative modeling of protein-peptide complexes, with broad implications for studying protein-peptide interactions and understanding their biological functions.

Conflict of interest statement

Declaration of Interests. GTM is a founder of Nexomics Biosciences, Inc. This does not represent a conflict of interest in this study.

Figures

**Fig. 1. Data flowchart for generating CSP_Rank_Score.**
Structure models are either selected from the PDB archive (medoid model of the NMR ensemble) or generated by AF2. They are then provided to a chemical shift predictor (e.g., UCBshift) to generate a list of residues with predicted significant CSPs (bottom two rows). These predicted significant CSPs are compared with the list of protein residues with experimentally observed (real) significant CSPs (top two rows). The comparison yields a confusion matrix and the resulting statistical scores used to compute the CSP_Rank_Score (see Eqns. 2–6).
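The comparison in Figure 1 reduces to set arithmetic over residue numbers. The sketch below is a minimal illustration, assuming the observed and predicted significant-CSP residues are already available as sets of residue numbers. It computes a confusion matrix and a few standard statistics from it. The function names are hypothetical. The way these statistics are combined into the CSP_Rank_Score (Eqns. 2–6) is not reproduced here.

```python
from math import sqrt


def confusion_matrix(observed: set[int], predicted: set[int],
                     residues: set[int]) -> dict[str, int]:
    """Count TP/FP/FN/TN for significant-CSP calls (hypothetical helper).

    observed  - residues with experimentally significant CSPs
    predicted - residues whose predicted CSPs are significant
    residues  - all receptor residues with CSP data (universe for TN)
    """
    tp = len(observed & predicted)             # significant in both
    fp = len(predicted - observed)             # predicted only
    fn = len(observed - predicted)             # observed only
    tn = len(residues - observed - predicted)  # insignificant in both
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def summary_scores(cm: dict[str, int]) -> dict[str, float]:
    """Standard confusion-matrix statistics; zero denominators give 0.0."""
    tp, fp, fn, tn = cm["TP"], cm["FP"], cm["FN"], cm["TN"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Matthews correlation coefficient
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "F1": f1, "MCC": mcc}


if __name__ == "__main__":
    # Toy example: 10 residues, observed {2,3,4,5}, predicted {3,4,5,6}
    cm = confusion_matrix({2, 3, 4, 5}, {3, 4, 5, 6}, set(range(1, 11)))
    print(cm)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 5}
    print(summary_scores(cm))
```

Counting true negatives against an explicit residue universe keeps residues without significant shifts in the statistics. Metrics such as MCC depend on those residues.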

**Fig. 2. Performance summary of baseline AF2 on the protein-peptide dataset.**
(A) Histogram of TM-scores of baseline AF2 models relative to the medoid model selected from the NMR ensemble deposited in the PDB. (B) Scatter plot of CSP_Rank_Scores for the 108 complexes, with the y = x line plotted in red. Points above the y = x line denote systems where the baseline AF2 model fits the CSP data better than the medoid PDB model. (C) Histogram of residuals from the y = x line; a residual > 0 indicates that the baseline AF2 model fits the CSP data better than the medoid PDB model. Paired-sample t-test p-value = 0.033. AlphaFold 3 (Abramson et al., 2024) was released during this work. Performance across our dataset with the AlphaFold 3 server as the structure prediction engine is evaluated in Supplementary Figure 8.
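The residuals in panel C and the paired-sample t-test can be illustrated with the Python standard library. This is a sketch that assumes a higher CSP_Rank_Score means a better fit, so a positive AF2-minus-PDB residual favors AF2, as in the legend. The helper name is hypothetical. It returns only the t statistic and degrees of freedom, not a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(af2_scores: list[float],
                       pdb_scores: list[float]) -> tuple[float, int]:
    """Return (t, degrees_of_freedom) for a paired-sample t-test.

    Residuals are AF2 minus PDB CSP_Rank_Score for the same system, i.e.
    the signed offset of each point from the y = x line (assumed sign
    convention: positive means the AF2 model fits the CSP data better).
    """
    if len(af2_scores) != len(pdb_scores) or len(af2_scores) < 2:
        raise ValueError("need two equal-length score lists with n >= 2")
    residuals = [a - p for a, p in zip(af2_scores, pdb_scores)]
    n = len(residuals)
    sd = stdev(residuals)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("residuals have zero variance; t is undefined")
    t = mean(residuals) / (sd / sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Illustrative toy scores, not data from the study
    t, df = paired_t_statistic([0.8, 0.7, 0.9, 0.6], [0.7, 0.7, 0.8, 0.5])
    print(round(t, 6), df)  # 3.0 3
```

If SciPy is installed, `scipy.stats.ttest_rel(af2_scores, pdb_scores)` returns the same t statistic together with a two-sided p-value. The legend does not state whether the reported p-value is one- or two-sided.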

**Fig. 3. Comparison of predicted and observed CSPs for representative complexes modeled with AF2.**
In each panel, a CSP histogram is provided for each residue in the sequence. Bars are colored in varying shades of red to reflect the size of the indicated CSP: red bars indicate large “significant” CSPs, and grey bars indicate small “insignificant” CSPs. Colored “blips” at the bottom of the histogram denote the pattern of agreement between the observed CSPs and those predicted using the PDB model or the AF2 model:
- purple: the AF2 model fits the CSP data but the PDB model does not;
- green: the AF2 and PDB models fit the CSP data about equally well;
- yellow: the PDB model fits the CSP data but the AF2 model does not;
- red: neither the AF2 nor the PDB model fits the CSP data.

Below the histogram is a structural view of the medoid PDB model (left) and the rank 1 AF2 model (right). The backbone cartoon of the protein is colored by the significance pattern of CSPs. **(A)** Data for PDB ID 2kpz, a case where the AF2 model has a much better CSP_Rank_Score. **(B)** Data for PDB ID 2kfh, another case where AF2 achieves a better CSP_Rank_Score by docking the peptide closer to residues with significant CSPs. **(C)** Data for PDB ID 2n7k, a case where AF2 misplaces the docked peptide ligand, resulting in a better CSP_Rank_Score for the PDB model. **(D)** Data for PDB ID 7jq8, a case with similar CSP_Rank_Scores for the two models and many allosteric CSPs throughout the receptor due to peptide binding. In all cases, the top-ranked baseline AF2 model and the medoid conformer from the PDB NMR ensemble are used to predict CSPs. These predictions are then compared with the experimental CSP data to calculate a CSP_Rank_Score.
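One way to read the colored blips is as a per-residue comparison of three boolean calls: observed, AF2-predicted, and PDB-predicted significance. The sketch below assumes that a model "fits" a residue when its significant/insignificant call matches the experiment. That criterion and the function name are illustrative assumptions, not the authors' code.

```python
def agreement_pattern(observed: set[int], af2_pred: set[int],
                      pdb_pred: set[int],
                      residues: list[int]) -> dict[int, str]:
    """Map each residue to a Figure 3-style agreement color (hypothetical).

    A model is taken to fit residue r if it makes the same significant /
    not-significant call as the experiment (an assumed criterion).
    """
    colors: dict[int, str] = {}
    for r in residues:
        obs = r in observed
        af2_ok = (r in af2_pred) == obs
        pdb_ok = (r in pdb_pred) == obs
        if af2_ok and pdb_ok:
            colors[r] = "green"   # both models agree with the data
        elif af2_ok:
            colors[r] = "purple"  # only the AF2 model agrees
        elif pdb_ok:
            colors[r] = "yellow"  # only the PDB model agrees
        else:
            colors[r] = "red"     # neither model agrees
    return colors


if __name__ == "__main__":
    print(agreement_pattern({2, 3, 5, 6}, {2, 3}, {3, 4, 5}, [1, 2, 3, 4, 5, 6]))
```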

**Fig. 4. Enhanced sampling for the IN TP:ET complex.**
**(A)** Plots of the first two TSNE PCA dimensions, showing the conformational space explored by the ES methods AFSample and AFAlt. Different ES protocols provide different models of the complex. **(B)** Clusters extracted by K-means hierarchical clustering. **(C)** Heatmap of P(model|data) overlaid on the TSNE analysis. **(D)** Boxplots of CSP_Rank_Scores for each TSNE cluster. **(E)** Ensemble of the models with the highest CSP_Rank_Score from each TSNE or UMAP cluster, colored by chain: the receptor protein is green and the ligand peptide is blue. **(F)** The ensemble shown in panel E, colored by residue-specific pLDDT score averaged across the AF2 models of the ensemble (red, high; blue, low).
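Panels C and E combine a posterior over models with per-cluster selection. The sketch below shows the general shape of such a calculation: P(model | data) ∝ P(data | model) P(model), normalized over the ensemble in log space.

Several pieces are hypothetical placeholders:
- the likelihood form (the exponential of beta times the CSP_Rank_Score);
- the pLDDT-based prior;
- the value of beta;
- the function names.

The published framework may define both terms differently. Cluster labels (e.g., from K-means) are assumed to be precomputed.

```python
from math import exp, log


def posterior(csp_scores: list[float], plddts: list[float],
              beta: float = 10.0) -> list[float]:
    """Normalized P(model | data) for each model (hypothetical form).

    Assumed, not the published equations:
      likelihood P(data | model) proportional to exp(beta * CSP_Rank_Score)
      prior      P(model)        proportional to mean pLDDT / 100
    pLDDT values must lie in (0, 100].
    """
    log_post = [beta * s + log(p / 100.0) for s, p in zip(csp_scores, plddts)]
    m = max(log_post)  # subtract the max before exp() for numerical stability
    weights = [exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def best_per_cluster(labels: list[int], values: list[float]) -> dict[int, int]:
    """Index of the model with the largest value (e.g. CSP_Rank_Score or
    posterior probability) within each cluster label."""
    best: dict[int, int] = {}
    for i, (lab, v) in enumerate(zip(labels, values)):
        if lab not in best or v > values[best[lab]]:
            best[lab] = i
    return best


if __name__ == "__main__":
    scores = [0.5, 0.6, 0.4]   # illustrative CSP_Rank_Scores
    plddt = [80.0, 80.0, 80.0]  # illustrative mean pLDDT per model
    print(posterior(scores, plddt))
    print(best_per_cluster([0, 0, 1], scores))  # {0: 1, 1: 2}
```

Working in log space avoids overflow when beta times the score is large. Subtracting the maximum log-posterior leaves the normalized probabilities unchanged.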

**Fig. 5. Improvement in CSP_Rank_Scores with enhanced sampling.**
Following the convention of Figure 3, each panel shows a CSP bar plot for each residue in the sequence. Bars are colored in varying shades of red to reflect the size of the indicated CSP. Colored blips at the bottom of the bar plot denote the pattern of agreement between the observed CSPs and those predicted using either the medoid PDB model or the medoid model from the enhanced sampling ensemble (as defined in the Figure 4 legend). Below the bar plot is a structural view of the top-ranked baseline AF2 model (left) and the medoid AF-NMR model (right). The backbone cartoon of the protein is colored by the significance pattern of CSPs. In each of these cases, the ES protocol makes small adjustments to the orientation of the bound peptide ligand, enabling a better fit to the NMR CSP data. Across the 17 systems tested, the average improvement in CSP_Rank_Score from the baseline AF2 model to the medoid AF-NMR model is 0.08. For each of these 17 systems, an ensemble was generated whose medoid model has a better CSP_Rank_Score than the baseline AF2 model (Supplementary Figures 11–19).
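The medoid model and the average improvement quoted above can be sketched in a few lines. This assumes a precomputed, symmetric pairwise distance matrix for the ensemble, for example RMSD or 1 − TM-score; the legend does not state which measure is used. The helper names are hypothetical.

```python
def medoid_index(dist: list[list[float]]) -> int:
    """Index of the ensemble member with the smallest summed distance to
    all other members. `dist` is a symmetric pairwise distance matrix
    (the distance measure itself is an assumption)."""
    sums = [sum(row) for row in dist]
    return min(range(len(sums)), key=sums.__getitem__)


def mean_improvement(baseline: list[float], medoid: list[float]) -> float:
    """Average per-system gain in CSP_Rank_Score (medoid minus baseline)."""
    if not baseline or len(baseline) != len(medoid):
        raise ValueError("need two non-empty, equal-length score lists")
    return sum(m - b for b, m in zip(baseline, medoid)) / len(baseline)


if __name__ == "__main__":
    # Toy 3-model ensemble: model 1 is closest to the others overall
    d = [[0.0, 2.0, 3.0],
         [2.0, 0.0, 1.0],
         [3.0, 1.0, 0.0]]
    print(medoid_index(d))  # 1
    print(mean_improvement([0.5, 0.6], [0.6, 0.66]))  # about 0.08
```

Unlike a centroid, a medoid is always an actual member of the ensemble, so it remains a physically valid structure model that can be passed to a chemical shift predictor.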

References

1. Abramson J., Adler J., Dunger J., Evans R., Green T., Pritzel A., Ronneberger O., Willmore L., Ballard A.J., Bambrick J. and Bodenstein S.W., 2024. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630, pp.493–500. doi:10.1038/s41586-024-07487-w
2. Aiyer S., Swapna G.V., Ma L.C., Liu G., Hao J., Chalmers G., Jacobs B.C., Montelione G.T. and Roth M.J., 2021. A common binding motif in the ET domain of BRD3 forms polymorphic structural interfaces with host and viral proteins. Structure, 29(8), pp.886–898. doi:10.1016/j.str.2021.01.010
3. Baek M., DiMaio F., Anishchenko I., Dauparas J., Ovchinnikov S., Lee G.R., Wang J., Cong Q., Kinch L.N., Schaeffer R.D. and Millán C., 2021. Accurate prediction of protein structures and interactions using a three-track neural network. Science, 373(6557), pp.871–876. doi:10.1126/science.abj8754
4. Basu S. and Wallner B., 2016. DockQ: a quality measure for protein-protein docking models. PLOS ONE, 11(8), p.e0161879. doi:10.1371/journal.pone.0161879
5. Bhattacharya A., Tejero R. and Montelione G.T., 2007. Evaluating protein structures determined by structural genomics consortia. Proteins: Structure, Function, and Bioinformatics, 66(4), pp.778–795. doi:10.1002/prot.21165
