Enhancing coevolution-based contact prediction by imposing structural self-consistency of the contacts

Maher M Kassem¹, Lars B Christoffersen¹, Andrea Cavalli², Kresten Lindorff-Larsen³

Affiliations

¹ Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen, DK, 2200, Denmark.
² Institute for Research in Biomedicine, Università della Svizzera italiana (USI), Via Vincenzo Vela 6, 6500, Bellinzona, Switzerland. andrea.cavalli@irb.usi.ch.
³ Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen, DK, 2200, Denmark. lindorff@bio.ku.dk.

PMID: 30042380
PMCID: PMC6057941
DOI: 10.1038/s41598-018-29357-y

Sci Rep. 2018 Jul 24;8(1):11112.

Abstract

Owing to the development of new algorithms and the growth of sequence databases, it has recently become possible to build robust higher-order sequence models from sets of aligned protein sequences. Such models have proven useful in de novo structure prediction, where they are used to find pairs of residues that co-vary during evolution and hence are likely to be in spatial proximity in the native protein. The accuracy of these algorithms, however, drops dramatically when the number of sequences in the alignment is small. We have developed a method, termed CE-YAPP (CoEvolution-YAPP), that is based on YAPP (Yet Another Peak Processor), which has been shown to solve a similar problem in NMR spectroscopy. By simultaneously performing structure prediction and contact assignment, CE-YAPP uses structural self-consistency as a filter to remove false positive contacts. Furthermore, CE-YAPP solves another problem, namely how many contacts to choose from the ordered list of covarying amino acid pairs. We show that CE-YAPP consistently improves contact prediction from multiple sequence alignments, in particular for proteins that are difficult targets. We further show that the structures determined from CE-YAPP are also in better agreement with those determined using traditional methods in structural biology.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
Workflow diagram of the CE-YAPP method. Predicted coevolution contacts and predicted secondary structure are used in combination to filter out false positive contacts. The red ‘x’ represents a false positive contact.

**Figure 2**
CE-YAPP protocol and results for the ribosome hibernation promoting factor HPF. (a) CE-YAPP uses as input 114 coevolution-based long-range contacts predicted using Gremlin. These contacts are then used as input to the protocol depicted in Fig. 1, which is repeated 64 times, producing 64 similar contact lists. The final list of predicted consensus contacts consists of those that are turned on in more than 30% of the simulations. (b) The precision of the consensus contacts produced by CE-YAPP is compared to the precision of the input set of contacts.
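
The consensus step described in panel (a) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `consensus_contacts`, the representation of a contact as a pair of residue indices, and the toy data are assumptions made here. Only the idea of repeated runs and the 30% threshold come from the caption.

```python
from collections import Counter


def consensus_contacts(runs, threshold=0.30):
    """Return contacts that are turned on in more than `threshold` of the runs.

    `runs` is a list of contact lists, one per simulation. Each contact is an
    (i, j) pair of residue indices (a hypothetical representation).
    """
    if not runs:
        return []
    counts = Counter()
    for contacts in runs:
        # Treat (i, j) and (j, i) as the same contact, counted once per run.
        counts.update({tuple(sorted(c)) for c in contacts})
    n_runs = len(runs)
    return sorted(c for c, n in counts.items() if n / n_runs > threshold)


# Toy example with 4 runs instead of the 64 used in the caption.
runs = [
    [(3, 40), (10, 55)],
    [(40, 3), (12, 70)],
    [(3, 40)],
    [(3, 40), (10, 55)],
]
consensus = consensus_contacts(runs)
print(consensus)
```

In this toy input, contact (12, 70) appears in only one of four runs (25%) and is therefore dropped, while the other two contacts exceed the threshold and are kept.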

**Figure 3**
CE-YAPP performance on the NOUMENON dataset. (a) The number of effective sequences divided by the number of amino acids, N_Eff/N_AA, is plotted for each protein and sorted from low to high. The data in the remaining panels are sorted accordingly. The grey vertical bars represent the proteins with N_Eff/N_AA closest to 1 and 5, respectively. (b) Number of contacts. (c) Recall (TP/(TP + FN)) of the CE-YAPP contacts. (d) Precision (TP/(TP + FP)). (e) Precision of CE-YAPP contacts minus precision of the input contacts (Δ*Precision*). The black dashed lines in panels (e and f) denote zero. (f) Restraint violation energy (Eq. 4). (g) Drop in restraint violation after CE-YAPP (Δ*Energy*). (h) Accuracy of the predicted secondary structures using the NOUMENON multiple sequence alignments.
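
The recall and precision formulas in panels (c) to (e) can be illustrated with a short sketch. The function name `precision_recall`, the contact representation, and the toy contact lists are assumptions made here, as is the choice of reference set for counting false negatives; the caption itself only gives the formulas TP/(TP + FN) and TP/(TP + FP).

```python
def precision_recall(predicted, native):
    """Compute (precision, recall) of predicted contacts against native ones.

    Both arguments are iterables of (i, j) residue-index pairs
    (a hypothetical representation); pair order is ignored.
    """
    predicted = {tuple(sorted(c)) for c in predicted}
    native = {tuple(sorted(c)) for c in native}
    tp = len(predicted & native)   # predicted and present in the native structure
    fp = len(predicted - native)   # predicted but absent from the native structure
    fn = len(native - predicted)   # native contacts that were not predicted
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if native else 0.0
    return precision, recall


# Toy data: contacts before and after a filtering step.
native = [(1, 30), (60, 5), (7, 77)]
input_contacts = [(1, 30), (5, 60), (8, 90), (2, 45)]
filtered_contacts = [(1, 30), (5, 60), (8, 90)]

p_in, r_in = precision_recall(input_contacts, native)
p_out, r_out = precision_recall(filtered_contacts, native)
delta_precision = p_out - p_in  # corresponds to the quantity in panel (e)
print(p_in, p_out, delta_precision)
```

A positive `delta_precision` means the filtered list contains a higher fraction of true contacts than the input list, which is the quantity plotted relative to the zero line.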

**Figure 4**
Structural performance on the NOUMENON dataset. (a) The number of effective sequences divided by the number of amino acids, N_Eff/N_AA, is plotted for each protein and sorted from low to high. The data in the remaining panels are sorted accordingly. The grey vertical bars represent the proteins with N_Eff/N_AA closest to 1 and 5, respectively. (b) Difference in GDT(5) (Δ*GDT*(5)). (c) Difference in GDT-TS (Δ*GDT-TS*). The black dashed line denotes zero.
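
As a rough illustration of the quantity in panel (c), the sketch below computes a simplified GDT-TS from per-residue Cα distances and takes the difference between two models. It assumes the commonly used definition of GDT-TS as the mean percentage of residues within 1, 2, 4 and 8 Å of the reference, and it assumes the distances come from a single, already computed superposition, whereas the full GDT procedure searches over superpositions. The function name and the distance values are hypothetical. GDT(5) is not sketched because the caption does not define it.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS (0 to 100) from per-residue C-alpha distances in angstroms.

    Assumes the model is already superimposed on the reference structure;
    no search over alternative superpositions is performed.
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    n = len(distances)
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical distances for two models of the same five-residue protein.
model_with_filtering = [0.5, 1.5, 3.0, 6.0, 10.0]
model_without_filtering = [0.5, 3.0, 5.0, 9.0, 12.0]

delta_gdt_ts = gdt_ts(model_with_filtering) - gdt_ts(model_without_filtering)
print(delta_gdt_ts)
```

A positive difference indicates that the first model is closer to the reference structure under this simplified score.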

Grants and funding

R126-2012-12589/Lundbeckfonden (Lundbeck Foundation)/International
