Cell Syst. 2018 Jan 24;6(1):65-74.e3.

doi: 10.1016/j.cels.2017.11.014. Epub 2017 Dec 20.

Enhancing Evolutionary Couplings with Deep Convolutional Neural Networks

Yang Liu¹, Perry Palmedo², Qing Ye¹, Bonnie Berger³, Jian Peng⁴

Affiliations

¹ Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA.
² Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Department of Mathematics, MIT, Cambridge, MA 02139, USA; Division of Medical Sciences, Harvard University, Cambridge, MA 02138, USA.
³ Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Department of Mathematics, MIT, Cambridge, MA 02139, USA. Electronic address: bab@mit.edu.
⁴ Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA. Electronic address: jianpeng@illinois.edu.

PMID: 29275173
PMCID: PMC5808454
DOI: 10.1016/j.cels.2017.11.014

Yang Liu et al. Cell Syst. 2018.

Abstract

While genes are defined by sequence, in biological systems a protein's function is largely determined by its three-dimensional structure. Evolutionary information embedded within multiple sequence alignments provides a rich source of data for inferring structural constraints on macromolecules. Still, many proteins of interest lack sufficient numbers of related sequences, leading to noisy, error-prone residue-residue contact predictions. Here we introduce DeepContact, a convolutional neural network (CNN)-based approach that discovers co-evolutionary motifs and leverages these patterns to enable accurate inference of contact probabilities, particularly when few related sequences are available. DeepContact significantly improves performance over previous methods, including in the CASP12 blind contact prediction task where we achieved top performance with another CNN-based approach. Moreover, our tool converts hard-to-interpret coupling scores into probabilities, moving the field toward a consistent metric to assess contact prediction across diverse proteins. Through substantially improving the precision-recall behavior of contact prediction, DeepContact suggests we are near a paradigm shift in template-free modeling for protein structure prediction.

Keywords: co-evolution; contact prediction; convolutional neural networks; deep learning; evolutionary couplings; protein structure prediction; structure prediction.

Figures

**Figure 1. Overview of DeepContact**
(A) Structure of the full-feature DeepContact model. DeepContact takes in global, 1D, and 2D features calculated from the amino acid sequence, including evolutionary couplings, and uses a CNN to predict contacts. (B) DeepContact trains using a set of solved structures, taking in the distance matrix (left) and, as a preprocessing step, producing a contact matrix using an 8 Å threshold (right). Intrinsically, these contact maps have patterns, and clearly some matrices cannot be contact maps. By learning the structure of contact matrices and the relationship between couplings and contacts, DeepContact is able to vastly improve evolutionary-based contact prediction.
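
The preprocessing step in panel (B) can be illustrated with a minimal sketch. It assumes one representative atom per residue (which atom is used is not stated in the caption) and treats a pair as a contact when its distance is strictly below 8 Å. The function names and the toy coordinates are hypothetical and are not taken from the DeepContact code.

```python
import math

CONTACT_THRESHOLD = 8.0  # angstroms, the threshold named in the caption


def distance_matrix(coords):
    """Pairwise Euclidean distances between per-residue coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_matrix(dist, threshold=CONTACT_THRESHOLD):
    """Binary matrix: 1 where two residues are closer than the threshold."""
    return [[1 if d < threshold else 0 for d in row] for row in dist]


# Hypothetical toy chain: four residues spaced 3.8 angstroms apart on a line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
contacts = contact_matrix(distance_matrix(coords))
```

In this toy chain, residues 0 and 2 (7.6 Å apart) count as a contact, while residues 0 and 3 (11.4 Å apart) do not.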

**Figure 2. Improved Performance of DeepContact on Benchmark Datasets**
(A) DeepContact outperforms CCMPred on the ASTRAL validation set using only CCMPred as features. Including other features further improves the precision-recall performance. (B) Precision-recall performance of contact-prediction methods on the CAMEO dataset. DeepContact further outperforms metaPSICOV on the CAMEO dataset. (C) Precision-recall performance of contact-prediction methods on the CASP228 dataset. On all three validation sets, using our novel probability cutoff enables enhancement of the precision/recall characteristics of DeepContact. Effectively, we exclude sequences or contacts with little confidence, and include contacts in which we have more confidence, leading to improved performance. (D) The improved contact-prediction performance of DeepContact over CCMPred leads to improved contact-assisted folding across the CASP12 free-modeling target set. Targets where folding failed using CCMPred contacts are plotted at a TM-score of 0.
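
As a rough illustration of how a probability cutoff trades recall for precision, the sketch below scores a list of predicted contact probabilities against known contacts. The function name, the toy values, and the convention of returning a precision of 0.0 when nothing passes the cutoff are assumptions made for this example, not details from the paper.

```python
def precision_recall(probs, is_contact, cutoff):
    """Precision and recall when pairs with probability >= cutoff are called contacts."""
    predicted = [p >= cutoff for p in probs]
    true_pos = sum(1 for pred, truth in zip(predicted, is_contact) if pred and truth)
    n_predicted = sum(predicted)
    n_true = sum(1 for truth in is_contact if truth)
    # Assumption: report 0.0 when a denominator is empty.
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_true if n_true else 0.0
    return precision, recall


# Hypothetical predictions for five residue pairs.
probs = [0.95, 0.80, 0.60, 0.30, 0.10]
is_contact = [True, True, False, True, False]

loose = precision_recall(probs, is_contact, cutoff=0.5)
strict = precision_recall(probs, is_contact, cutoff=0.9)
```

Raising the cutoff from 0.5 to 0.9 on these toy values keeps only the most confident call, so precision rises while recall falls.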

**Figure 3. DeepContact Integrates Local Information to Improve Contact Prediction**
(A) Example of how DeepContact (upper right triangle) improves contact prediction over CCMPred (lower left triangle) input for PDB: 3LRT_A (Cherney et al., 2011). Most of the DeepContact “false-positives” border regions of true positives, and the two that do not (black circles) are true homodimer contacts between chain A (green) and chain B (cyan) separated by 3.6 and 4.3 Å, respectively (right panel). See also Figure S1. (B) DeepContact integrates local information. Using CCMPred as the only input feature (top left), DeepContact is able to identify patterns indicative of secondary structure elements (bottom left). Using additional features sharpens the predicted contact map (top right). These matrices resemble the experimentally determined distance matrix (bottom right).

**Figure 4. DeepContact Reranks Full Contact Distribution**
(A) Contacts were ranked across the entire ASTRAL validation set based on distance (black), DeepContact probability (blue), and CCMPred score (red). To make CCMPred comparable across examples we normalized to the SD of the medium- and long-range scores within each protein. The x axis is the rank-ordered list of DeepContact probabilities, and the y axis the average distance of the higher-ranked contacts. DeepContact (blue) significantly improves the rank order of contacts across the distribution compared with CCMPred, being much closer to the true rank order of contacts (black). (B) The average distance of the false positives for each of the ASTRAL validation set structures as called by CCMPred versus as called by DeepContact. The “false positives” called by DeepContact are significantly closer in the experimental structure, with many of them lying just beyond the 8 Å cutoff.
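
The per-protein normalization mentioned in panel (A) might look like the sketch below. It assumes that "medium- and long-range" means a sequence separation of at least 12 residues (a common convention, not stated in the caption), that scores are divided by the population standard deviation without centering, and that scores are stored in a dictionary keyed by residue index pairs. All names are hypothetical.

```python
import statistics


def normalize_by_range_sd(scores, min_separation=12):
    """Divide every score by the SD of the medium- and long-range scores.

    scores: dict mapping (i, j) residue index pairs to raw coupling scores.
    min_separation: assumed cutoff for "medium- and long-range" pairs.
    """
    ranged = [s for (i, j), s in scores.items() if abs(i - j) >= min_separation]
    sd = statistics.pstdev(ranged)  # population SD; an assumption
    if sd == 0:
        raise ValueError("medium- and long-range scores have zero spread")
    return {pair: s / sd for pair, s in scores.items()}


# Hypothetical raw scores for one protein.
raw = {(0, 12): 2.0, (0, 20): 6.0, (0, 3): 5.0}
normalized = normalize_by_range_sd(raw)
```

Only the two pairs with separation of 12 or more contribute to the SD (2.0 here), but every score in the protein is rescaled by it.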

**Figure 5. DeepContact Converts Evolutionary Coupling Scores to Coupling Probabilities**
Boxplot of precision of DeepContact with respect to the ASTRAL validation set (y axis) with DeepContact predictions binned by 0.01 probability. Mean (red) and median (blue) precision are shown for each bin; whiskers represent 5th to 95th percentiles. We trained DeepContact using a cross-entropy loss function, which effectively maximizes the ability to distinguish residue pairs less than 8 Å apart from residues more than 8 Å apart. While the probabilities are better calibrated at the ends of the distribution, those in the middle enable an objective understanding of the likely probability of contacts using the output probabilities.
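
The binning behind this kind of calibration plot can be sketched as follows: predictions are grouped into bins 0.01 wide, and the fraction of true contacts is computed per bin. This sketch pools all predictions rather than computing per-protein precision, so it gives one value per bin instead of a boxplot. The function name, the small epsilon guarding against floating-point rounding, and the toy data are assumptions.

```python
import math
from collections import defaultdict


def binned_precision(probs, is_contact, n_bins=100):
    """Fraction of true contacts among predictions in each probability bin.

    With n_bins=100, each bin is 0.01 wide; bin b covers [b/100, (b+1)/100).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for p, truth in zip(probs, is_contact):
        # Epsilon keeps values such as 0.29 from falling into bin 28.
        b = min(int(math.floor(p * n_bins + 1e-9)), n_bins - 1)
        totals[b] += 1
        hits[b] += 1 if truth else 0
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Hypothetical predictions: three near 0.95, two near 0.05.
probs = [0.951, 0.955, 0.958, 0.052, 0.057]
is_contact = [True, True, False, False, False]
calibration = binned_precision(probs, is_contact)
```

For a well-calibrated predictor, the precision in bin b would be close to the bin's midpoint probability, which is what the figure assesses.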

**Figure 6. Visualization of First-Layer Filters**
Visualization of features picked up by the first layer of a DeepContact model trained with CCMPred as the only feature (STAR Methods). We averaged the top 100 activations of each filter across the ASTRAL validation set and used t-distributed stochastic neighbor embedding to reduce the dimensionality (center gray-shaded matrix). Insets (A–F) show the activation patterns of selected filters, as well as the top 5 structural alignments and sequence similarity of the proteins with the top 100 activations. Filters (center) cluster by secondary structure element, spanning from β segments (red, top) to helical/β to helical segments (blue, bottom). The β patterns fit with the alternating contacts of β sheets (A, B, and F) and distinguish between parallel (A and B) and antiparallel (F) sheets. Helical filter (E) shows a grid-pattern separated by three to four residues, matching the rise of a helix.
