PLoS One. 2017 Feb 6;12(2):e0169356.
doi: 10.1371/journal.pone.0169356. eCollection 2017.

Improving protein-protein interaction prediction using evolutionary information from low-quality MSAs


Csilla Várnai et al. PLoS One. 2017.

Abstract

Evolutionary information stored in multiple sequence alignments (MSAs) has been used to identify the interaction interface of protein complexes by measuring either co-conservation or co-mutation of amino acid residues across the interface. Recently, maximum-entropy-based correlated mutation measures (CMMs), such as direct information, which decouple direct from indirect interactions, have been developed to identify residue pairs interacting across the protein complex interface. These studies have focussed on carefully selected protein complexes with large, good-quality MSAs. In this work, we study protein complexes with a more typical MSA of fewer than 400 sequences, using a set of 79 intramolecular protein complexes. Using a maximum-entropy-based CMM at the residue level, we develop an interface-level CMM score for re-ranking docking decoys. We demonstrate that our interface-level CMM score compares favourably with the complementarity trace score, an evolutionary information-based score that measures co-conservation, when each is combined with the number of interface residues, a knowledge-based potential and the variability score of individual amino acid sites. We also demonstrate that, since co-mutation and co-complementarity in the MSA contain orthogonal information, the best prediction performance using evolutionary information can be achieved by combining the co-mutation information of the CMM with the co-conservation information of the complementarity trace score; this combination predicts a near-native structure as the top prediction for 41% of the dataset. The method presented is not restricted to small MSAs and is likely to improve interface prediction for complexes with large, good-quality MSAs as well.
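To illustrate the decoy re-ranking idea, the sketch below combines several per-decoy scores into a single ranking. It assumes a simple weighted sum of scores that are z-normalised across the decoys of one complex; the score names and weights are hypothetical, and the abstract does not state that the authors combine their scores in exactly this way.

```python
import numpy as np


def combined_score(score_table, weights):
    """Weighted sum of per-decoy scores, each z-normalised across decoys.

    score_table: dict mapping a (hypothetical) score name to a 1-D sequence
                 with one value per docking decoy.
    weights:     dict mapping the same names to weights; a negative weight
                 flips a score for which lower is better (e.g. an energy-like
                 knowledge-based potential).
    """
    total = None
    for name, w in weights.items():
        x = np.asarray(score_table[name], dtype=float)
        sd = x.std()
        # A constant score carries no ranking information, so it contributes zero.
        z = (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)
        total = w * z if total is None else total + w * z
    return total


def rerank(score_table, weights):
    """Decoy indices ordered best-first (highest combined score first)."""
    return np.argsort(-combined_score(score_table, weights), kind="stable")


# Toy example with three decoys and two hypothetical scores.
scores = {"n_interface": [10, 20, 30], "cmm": [0.0, 1.0, 0.2]}
order = rerank(scores, {"n_interface": 1.0, "cmm": 1.0})
```

Z-normalising each score first puts terms with very different units on a comparable scale before weighting; in practice the weights would typically be fitted on a set of training complexes.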


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. MSAs in the dataset.
Left: The cumulative distribution function of the protein complexes in the dataset as a function of the number of sequences in their MSA; 95% of the complexes have fewer than 400 sequences. Right: The effective number of sequences as a function of the number of amino acids in the protein complexes studied.
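The "effective number of sequences" is commonly computed in coevolution studies by down-weighting near-duplicate sequences. The sketch below follows that common convention, assuming an 80% identity threshold and treating gap characters like any other symbol; the paper's exact definition and threshold may differ.

```python
import numpy as np


def effective_sequences(msa, identity_threshold=0.8):
    """Effective number of sequences in an alignment.

    Each sequence gets weight 1 / (number of sequences, itself included,
    whose fractional identity to it is >= identity_threshold); the effective
    number of sequences is the sum of these weights.

    msa: list of aligned sequences of equal length (gaps such as '-' are
         compared like ordinary characters in this sketch).
    """
    # Encode characters as integer codes so comparisons are plain numeric ones.
    arr = np.array([[ord(c) for c in seq] for seq in msa])
    # Fractional identity for every pair of sequences: shape (n, n).
    identity = (arr[:, None, :] == arr[None, :, :]).mean(axis=2)
    neighbours = (identity >= identity_threshold).sum(axis=1)
    return float((1.0 / neighbours).sum())


msa = ["ACDE", "ACDE", "WYWY"]
n_eff = effective_sequences(msa)  # the two identical sequences count as one
```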
Fig 2. Schematic representation of the protein complex.
Solid lines border both proteins, and the surface regions are coloured grey. Interacting amino acids are connected with lines, forming a network. $S_a$ denotes the set of surface residues of the protein to which residue $a$ belongs, $\bar{S}_i$ denotes the set of surface residues of the protein to which residue $i$ does not belong (i.e. its binding partner), and $I$ denotes the set of interactions across the interface.
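Using the notation of Fig 2, an interface-level score can be built by aggregating residue-level scores over the contact set $I$ of a decoy. The sketch below simply averages the pair scores over $I$; this averaging rule and the function name are illustrative assumptions, not the paper's definition of its interface-level CMM score.

```python
def interface_score(pair_scores, interface_pairs):
    """Mean residue-level score over the inter-protein contacts I of one decoy.

    pair_scores:     dict mapping (i, j) -> residue-level score, where i is a
                     residue of one protein and j a residue of its partner.
    interface_pairs: iterable of (i, j) contacts forming the decoy's interface.
    Pairs without a score (e.g. alignment columns that were filtered out)
    are skipped; a decoy with no scored contacts gets 0.0.
    """
    values = [pair_scores[p] for p in interface_pairs if p in pair_scores]
    return sum(values) / len(values) if values else 0.0


scores = {(1, 10): 2.0, (2, 11): -1.0, (3, 12): 0.5}
decoy_a = [(1, 10), (3, 12)]
decoy_b = [(2, 11), (3, 12)]
```

One reason to average rather than sum is that a sum would favour decoys with large interfaces simply for having more contacts; the abstract accounts for interface size separately through the number of interface residues.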
Fig 3. Residue-level CMM scores are noisy interface predictors.
Left: The proportion of true contacts among the top N/10 Z(i, j) scores. Right: The top 10 contacts predicted by the residue-level CMM scores for D1FJGE1_D1FJGE2, the protein complex with the highest contact ratio (0.21). True contacts are coloured yellow and false contacts red.
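The measure in the left panel, the proportion of true contacts among the top N/10 scores, is a precision at a fixed cutoff. The sketch below assumes N is the number of residues in the complex, which the caption does not define.

```python
def top_n10_precision(pair_scores, true_contacts, n_residues):
    """Fraction of true contacts among the N/10 highest-scoring residue pairs.

    pair_scores:   dict mapping (i, j) -> residue-level score Z(i, j).
    true_contacts: set of (i, j) pairs in contact in the native structure.
    n_residues:    N, assumed here to be the number of residues in the complex.
    """
    k = max(1, n_residues // 10)
    top = sorted(pair_scores, key=pair_scores.get, reverse=True)[:k]
    return sum(1 for pair in top if pair in true_contacts) / k


z_scores = {(1, 10): 3.0, (2, 11): 2.0, (3, 12): 1.0}
native = {(1, 10), (3, 12)}
```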
Fig 4. Probability distribution of the residue-level CMM scores.
The distribution of the standardised Z(i, j) scores for all residues (solid line) and for the interface residues of the native structure (dashed line). Left: probability density function. Right: cumulative distribution function. The dash-dotted line marks 0, the mean of the standardised scores.
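The standardisation behind these curves, and one point of the cumulative curve in the right panel, can be sketched as follows. It assumes each raw score is standardised against the mean and standard deviation of all pair scores of the same complex; the paper's exact reference set for standardisation may differ.

```python
import numpy as np


def standardise(raw_scores):
    """Z-scores relative to the mean and (population) standard deviation."""
    x = np.asarray(raw_scores, dtype=float)
    return (x - x.mean()) / x.std()


def empirical_cdf(values, t):
    """Fraction of values that are <= t (one point of the cumulative curve)."""
    return float(np.mean(np.asarray(values, dtype=float) <= t))


raw = [0.1, 0.4, 0.2, 0.9, 0.3, 0.8]
z = standardise(raw)
interface_idx = [3, 5]  # hypothetical positions of native interface pairs
```

If interface pairs carry signal, `empirical_cdf(z[interface_idx], 0.0)` should be well below `empirical_cdf(z, 0.0)`, which corresponds to the dashed curve lying to the right of the solid one.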
Fig 5. Comparison of the interface-level scoring functions using CMM.
The fraction of proteins for which there is at least one near-native complex among the top predictions, for the scoring functions $S_{\mathrm{CMM}}$ (black dash-dotted line), $S_{\mathrm{rawCMM}}$ (light grey dash-dotted line), $S(S_{\mathrm{RP}}, S_N, S_{\mathrm{ent}})$ (grey solid line), $S(S_{\mathrm{RP}}, S_N, S_{\mathrm{ent}}, S_{\mathrm{CMM}})$ (black solid line) and $S(S_{\mathrm{RP}}, S_N, S_{\mathrm{ent}}, S_{\mathrm{rawCMM}})$ (light grey solid line).
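Curves of this kind report a success rate: the fraction of complexes with at least one near-native decoy among the top k predictions. A minimal sketch, assuming each complex's decoys are already sorted best-first by the scoring function and flagged as near-native or not (the near-native criterion itself is not specified here):

```python
def success_rate(ranked_flags, k):
    """Fraction of complexes with a near-native decoy in the top k.

    ranked_flags: one list per complex, holding a boolean per decoy in the
                  order produced by the scoring function (best first);
                  True marks a near-native decoy.
    """
    hits = sum(1 for flags in ranked_flags if any(flags[:k]))
    return hits / len(ranked_flags)


flags = [
    [False, True, False],   # first near-native decoy at rank 2
    [True, False],          # first near-native decoy at rank 1
    [False, False, False],  # no near-native decoy
]
curve = [success_rate(flags, k) for k in range(1, 4)]
```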
Fig 6. Comparison of co-conservation and co-evolution scores.
Left: Scatter plot showing the rank of the best near-native prediction for the CMM score (horizontal axis) against the CT score (vertical axis) ($R_{\mathrm{Pearson}} = -0.02$), coloured by the rank of the best near-native prediction of the entropy score. Right: Scatter plot showing the rank of the best near-native prediction for the entropy score (horizontal axis) against the CT score (vertical axis) ($R_{\mathrm{Pearson}} = 0.60$), coloured by the rank of the best near-native prediction of the CMM score.
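Each point in these scatter plots is the rank of the best near-native prediction for one complex under two scores, and the pair of rank lists is summarised by a Pearson coefficient. A sketch of both quantities, assuming 1-based ranks and that complexes with no near-native decoy have already been excluded:

```python
import numpy as np


def best_near_native_rank(ranked_flags):
    """1-based rank of the first near-native decoy, or None if there is none."""
    for rank, is_near_native in enumerate(ranked_flags, start=1):
        if is_near_native:
            return rank
    return None


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


# Hypothetical best ranks for four complexes under two scoring functions.
ranks_a = [best_near_native_rank(f) for f in
           ([True], [False, True], [False, False, True], [False, False, False, True])]
ranks_b = [2, 4, 6, 8]
```

A value near 0, as for the CMM and CT scores, means the two scores succeed on largely different complexes, which is what makes combining them worthwhile.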
Fig 7. The effect of co-conservation and co-evolution on the interface prediction.
Left: The fraction of proteins for which a near-native decoy is among the top-scored predictions, as a function of the number of decoys considered, for the $S(S_{\mathrm{RP}}, S_N, S_{\mathrm{ent}})$ (grey solid line), $S(S_{\mathrm{RP}}, S_N, S_{\mathrm{ent}}, S_{\mathrm{CMM}})$ (black solid line), $S(S_{\mathrm{RP}}, S_N, S_{\mathrm{ent}}, S_{\mathrm{CT}})$ (grey dashed line), $S(S_{\mathrm{RP}}, S_N, S_{\mathrm{ent}}, S_{\mathrm{CT}}, S_{\mathrm{CMM}})$ (black dashed line) and $S_{\mathrm{rawCMM}}$ (light grey dash-dotted line) scoring functions. Right: The number of proteins for which the top near-native prediction is ranked within the top 1, 5 or 10 predictions, for the $S(S_{\mathrm{RP}}, S_N, S_{\mathrm{ent}})$ (solid black bars), $S(S_{\mathrm{RP}}, S_N, S_{\mathrm{ent}}, S_{\mathrm{CMM}})$ (solid grey bars), $S(S_{\mathrm{RP}}, S_N, S_{\mathrm{ent}}, S_{\mathrm{CT}})$ (dark checked bars) and $S(S_{\mathrm{RP}}, S_N, S_{\mathrm{ent}}, S_{\mathrm{CT}}, S_{\mathrm{CMM}})$ (light checked bars) scoring functions.
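The bar chart in the right panel counts complexes whose best near-native decoy ranks within fixed cutoffs. Given the best rank per complex (None where a complex has no near-native decoy), a minimal sketch is:

```python
def count_within(best_ranks, cutoffs=(1, 5, 10)):
    """Number of complexes whose best near-native rank is <= each cutoff.

    best_ranks: 1-based rank of the best near-native decoy per complex,
                or None when a complex has no near-native decoy at all.
    """
    return {c: sum(1 for r in best_ranks if r is not None and r <= c)
            for c in cutoffs}


counts = count_within([1, 3, 7, None, 12])
```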
