PLoS One. 2017 Feb 6;12(2):e0169356.

doi: 10.1371/journal.pone.0169356. eCollection 2017.

Improving protein-protein interaction prediction using evolutionary information from low-quality MSAs

Csilla Várnai¹, Nikolas S Burkoff¹, David L Wild¹

PMID: 28166227
PMCID: PMC5293240
DOI: 10.1371/journal.pone.0169356

Csilla Várnai et al. PLoS One. 2017.

Affiliation

¹ Systems Biology Centre, University of Warwick, Coventry, CV4 7AL, United Kingdom.

Abstract

Evolutionary information stored in multiple sequence alignments (MSAs) has been used to identify the interaction interface of protein complexes by measuring either co-conservation or co-mutation of amino acid residues across the interface. Recently, maximum-entropy-based correlated mutation measures (CMMs), such as direct information, which decouple direct from indirect interactions, have been developed to identify residue pairs interacting across the protein complex interface. These studies have focussed on carefully selected protein complexes with large, good-quality MSAs. In this work, we study protein complexes with a more typical MSA of fewer than 400 sequences, using a set of 79 intramolecular protein complexes. Using a maximum-entropy-based CMM at the residue level, we develop an interface-level CMM score for re-ranking docking decoys. We demonstrate that, when combined with the number of interface residues, a knowledge-based potential and the variability score of individual amino acid sites, our interface-level CMM score compares favourably to the complementarity trace score, an evolutionary information-based score measuring co-conservation. We also demonstrate that, since co-mutation and co-conservation in the MSA contain orthogonal information, the best prediction performance using evolutionary information can be achieved by combining the co-mutation information of the CMM with the co-conservation information of a complementarity trace score, which predicts a near-native structure as the top prediction for 41% of the dataset. The method presented is not restricted to small MSAs and is likely to improve interface prediction for complexes with large, good-quality MSAs as well.
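The abstract describes turning residue-level CMM scores into an interface-level score used to re-rank docking decoys. The abstract does not give the aggregation formula, so the sketch below assumes the simplest one: the interface score of a decoy is the mean standardised residue-level score $Z(i, j)$ over the contacts $(i, j)$ in that decoy's interface set $I$. The function names, the toy $Z$ matrix and the decoy names are hypothetical illustrations, not the authors' code or data.

```python
"""Hypothetical sketch: re-rank docking decoys by an interface-level CMM score.

Assumption (not the paper's formula): the interface score of a decoy is the
mean residue-level Z(i, j) over the contacts (i, j) in its interface set I.
"""
from typing import Dict, List, Sequence, Tuple

Contact = Tuple[int, int]  # (residue index in protein A, residue index in protein B)


def interface_cmm_score(z: Sequence[Sequence[float]], contacts: List[Contact]) -> float:
    """Average the standardised residue-level scores over interface contacts."""
    if not contacts:
        return float("-inf")  # a decoy without interface contacts cannot be scored
    return sum(z[i][j] for i, j in contacts) / len(contacts)


def rerank_decoys(z: Sequence[Sequence[float]], decoys: Dict[str, List[Contact]]) -> List[str]:
    """Return decoy names sorted from best (highest score) to worst."""
    return sorted(decoys, key=lambda name: interface_cmm_score(z, decoys[name]), reverse=True)


# Toy inter-protein Z matrix (3 residues in A x 3 residues in B); values are made up.
z = [
    [2.0, -0.5, 0.1],
    [-1.0, 1.5, 0.0],
    [0.3, 0.2, -2.0],
]
decoys = {
    "decoy_1": [(0, 0), (1, 1)],  # mean Z = 1.75
    "decoy_2": [(2, 2), (1, 0)],  # mean Z = -1.5
    "decoy_3": [(0, 2), (2, 1)],  # mean Z = 0.15
}
print(rerank_decoys(z, decoys))  # ['decoy_1', 'decoy_3', 'decoy_2']
```

Averaging rather than summing keeps decoys with large interfaces from being favoured simply for having more contacts; the abstract notes that interface size enters separately, as the number of interface residues, when the scores are combined.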

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. MSAs in the dataset.**
Left: The cumulative distribution function of protein complexes in the dataset as a function of the number of sequences in their MSA; 95% of protein complexes have fewer than 400 sequences. Right: The effective number of sequences as a function of the number of amino acids in the protein complexes studied.

**Fig 2. Schematic representation of the protein complex.**
Solid lines border both proteins, and the surface regions are coloured grey. Interacting amino acids are connected by lines, forming a network. $S_a$ denotes the set of surface residues of the protein to which residue $a$ belongs, $\bar{S}_i$ denotes the set of surface residues of the protein to which residue $i$ does not belong (i.e. the partner protein), and $I$ denotes the set of interactions across the interface.

**Fig 3. Residue-level CMM scores are noisy interface predictors.**
Left: The proportion of true contacts among the top $N/10$ $Z(i, j)$ scores. Right: The top 10 contacts predicted by the residue-level CMM scores for D1FJGE1_D1FJGE2, the protein complex with the highest contact ratio (0.21). True contacts are coloured yellow and false contacts red.
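The left panel of Fig 3 measures how many of the highest-scoring residue pairs are real interface contacts. A minimal sketch of that precision calculation follows; it assumes that $N$ is the total number of residues in the complex (the caption does not define $N$), and the function name and toy data are hypothetical.

```python
"""Hypothetical sketch: precision of the top-ranked residue-level contacts.

Assumption: N is the total number of residues in the complex, so the
top N/10 pairs (at least one) are kept. The paper may define N differently.
"""
from typing import Dict, Set, Tuple

Contact = Tuple[int, int]  # (residue in protein A, residue in protein B)


def top_fraction_precision(
    z: Dict[Contact, float], true_contacts: Set[Contact], n_residues: int
) -> float:
    """Fraction of true contacts among the N/10 highest-scoring residue pairs."""
    if not z:
        return 0.0
    k = max(1, n_residues // 10)
    ranked = sorted(z, key=z.get, reverse=True)[:k]  # pairs ordered by descending Z
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# Toy data: 30 residues in total, so the top 3 pairs are inspected.
z = {(0, 0): 3.1, (0, 1): 2.5, (1, 2): 2.2, (2, 0): 1.0, (1, 1): 0.4}
true_contacts = {(0, 0), (1, 2), (1, 1)}
print(round(top_fraction_precision(z, true_contacts, n_residues=30), 3))  # 0.667
```

Because only a small fraction of all residue pairs are true contacts, even a modest precision here can be well above random, which is why the caption calls these scores noisy rather than uninformative.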

**Fig 4. Probability distribution of the residue-level CMM scores.**
The distribution of the standardised $Z(i, j)$ scores for all residues (solid line) and for the interface residues of the native structure (dashed line). Left: Probability distribution function. Right: Cumulative distribution function. The dash-dotted line marks 0, the mean of the standardised scores.
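Fig 4 plots standardised scores whose mean is 0. The caption does not spell out the standardisation, so the sketch below assumes an ordinary z-score per complex: subtract the mean of all raw residue-pair scores and divide by their population standard deviation. The function name and the raw values are hypothetical.

```python
"""Hypothetical sketch: standardise raw residue-pair scores into Z(i, j).

Assumption: each raw score is shifted by the mean and divided by the
population standard deviation of all scored pairs of the same complex,
so the standardised scores have mean 0 and standard deviation 1.
"""
from statistics import mean, pstdev
from typing import Dict, Tuple

Contact = Tuple[int, int]  # (residue in protein A, residue in protein B)


def standardise(raw: Dict[Contact, float]) -> Dict[Contact, float]:
    mu = mean(raw.values())
    sigma = pstdev(raw.values())
    if sigma == 0:
        # All pairs scored identically: no pair stands out from the rest.
        return {pair: 0.0 for pair in raw}
    return {pair: (value - mu) / sigma for pair, value in raw.items()}


raw = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}
z = standardise(raw)
print({pair: round(value, 2) for pair, value in z.items()})
```

Standardising per complex makes scores comparable across complexes whose MSAs differ in depth, which is what allows the pooled distributions in Fig 4 to be drawn on one axis.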

**Fig 5. Comparison of the interface-level scoring functions using CMM.**
The fraction of proteins for which there is at least one near-native complex in the top predictions, for the scoring functions $S^{CMM}$ (black dash-dotted line), $S^{CMM}_{raw}$ (light grey dash-dotted line), $S(S^{RP}, S^{N}, S^{ent})$ (grey solid line), $S(S^{RP}, S^{N}, S^{ent}, S^{CMM})$ (black solid line) and $S(S^{RP}, S^{N}, S^{ent}, S^{CMM}_{raw})$ (light grey solid line).
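The curves in Fig 5 (and the left panel of Fig 7) report, for each cut-off, the fraction of proteins with at least one near-native decoy among the top predictions. A minimal sketch, assuming the 1-based rank of the best near-native decoy is already known for each complex under a given scoring function (`None` when no near-native decoy exists); the complex names and ranks are hypothetical.

```python
"""Sketch: success rate of a scoring function over a dataset of complexes.

Assumption: best_ranks maps each complex to the 1-based rank of its best
near-native decoy under one scoring function, or None if there is none.
"""
from typing import Dict, List, Optional


def success_curve(best_ranks: Dict[str, Optional[int]], max_k: int) -> List[float]:
    """Fraction of complexes with a near-native decoy in the top k, for k = 1..max_k."""
    n = len(best_ranks)
    return [
        sum(1 for r in best_ranks.values() if r is not None and r <= k) / n
        for k in range(1, max_k + 1)
    ]


# Hypothetical ranks for four complexes.
best_ranks = {"complex_1": 1, "complex_2": 3, "complex_3": None, "complex_4": 10}
print(success_curve(best_ranks, max_k=5))  # [0.25, 0.25, 0.5, 0.5, 0.5]
```

Multiplying the values at k = 1, 5 and 10 by the number of complexes gives counts of the kind shown in the right panel of Fig 7.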

**Fig 6. Comparison of co-conservation and co-evolution scores.**
Left: Scatter plot of the rank of the best near-native prediction for the CMM score (horizontal axis) against that for the CT score (vertical axis) ($R_{Pearson} = -0.02$), coloured by the rank of the best near-native prediction of the entropy score. Right: Scatter plot of the rank of the best near-native prediction for the entropy score (horizontal axis) against that for the CT score (vertical axis) ($R_{Pearson} = 0.60$), coloured by the rank of the best near-native prediction of the CMM score.
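Fig 6 summarises each scatter plot with a Pearson correlation between per-complex ranks. A correlation near zero, as between CMM and CT, means the two scores tend to succeed on different complexes, which is the sense in which the abstract calls their information orthogonal. The sketch below computes Pearson's $r$ directly from its definition; the rank values are hypothetical, and whether the paper used raw or transformed ranks is not stated in the caption.

```python
"""Sketch: Pearson correlation between best near-native ranks of two scores."""
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))  # unnormalised covariance
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical ranks of the best near-native decoy for five complexes.
cmm_ranks = [1, 7, 3, 20, 2]
ct_ranks = [12, 2, 5, 1, 30]
print(round(pearson(cmm_ranks, ct_ranks), 2))
```

The normalising constants cancel between numerator and denominator, so the unnormalised sums give the same $r$ as the textbook formula with $1/n$ factors.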

**Fig 7. The effect of co-conservation and co-evolution on the interface prediction.**
Left: The fraction of proteins for which a near-native decoy is in the top scored predictions, as a function of the number of decoys considered, for the $S(S^{RP}, S^{N}, S^{ent})$ (grey solid line), $S(S^{RP}, S^{N}, S^{ent}, S^{CMM})$ (black solid line), $S(S^{RP}, S^{N}, S^{ent}, S^{CT})$ (grey dashed line), $S(S^{RP}, S^{N}, S^{ent}, S^{CT}, S^{CMM})$ (black dashed line) and $S^{CMM}_{raw}$ (light grey dash-dotted line) scoring functions. Right: The number of proteins for which the rank of the top near-native prediction is within the top 1, 5 or 10 predictions, for the $S(S^{RP}, S^{N}, S^{ent})$ (solid black bars), $S(S^{RP}, S^{N}, S^{ent}, S^{CMM})$ (solid grey bars), $S(S^{RP}, S^{N}, S^{ent}, S^{CT})$ (dark checked bars) and $S(S^{RP}, S^{N}, S^{ent}, S^{CT}, S^{CMM})$ (light checked bars) scoring functions.

Cited by

Combining cysteine scanning with chemical labeling to map protein-protein interactions and infer bound structure in an intrinsically disordered region.
Ahmed S, Chattopadhyay G, Manjunath K, Bhasin M, Singh N, Rasool M, Das S, Rana V, Khan N, Mitra D, Asok A, Singh R, Varadarajan R. Front Mol Biosci. 2022 Oct 7;9:997653. doi: 10.3389/fmolb.2022.997653. eCollection 2022. PMID: 36275627. Free PMC article.
Coevolutive, evolutive and stochastic information in protein-protein interactions.
Andrade M, Pontes C, Treptow W. Comput Struct Biotechnol J. 2019 Nov 20;17:1429-1435. doi: 10.1016/j.csbj.2019.10.005. eCollection 2019. PMID: 31871588. Free PMC article.
An Ensemble Classifier to Predict Protein-Protein Interactions by Combining PSSM-based Evolutionary Information with Local Binary Pattern Model.
Li Y, Li LP, Wang L, Yu CQ, Wang Z, You ZH. Int J Mol Sci. 2019 Jul 17;20(14):3511. doi: 10.3390/ijms20143511. PMID: 31319578. Free PMC article.
