Cell Syst. 2018 Jan 24;6(1):65-74.e3.
doi: 10.1016/j.cels.2017.11.014. Epub 2017 Dec 20.

Enhancing Evolutionary Couplings with Deep Convolutional Neural Networks

Yang Liu et al. Cell Syst.

Abstract

While genes are defined by sequence, in biological systems a protein's function is largely determined by its three-dimensional structure. Evolutionary information embedded within multiple sequence alignments provides a rich source of data for inferring structural constraints on macromolecules. Still, many proteins of interest lack sufficient numbers of related sequences, leading to noisy, error-prone residue-residue contact predictions. Here we introduce DeepContact, a convolutional neural network (CNN)-based approach that discovers co-evolutionary motifs and leverages these patterns to enable accurate inference of contact probabilities, particularly when few related sequences are available. DeepContact significantly improves performance over previous methods, including in the CASP12 blind contact prediction task where we achieved top performance with another CNN-based approach. Moreover, our tool converts hard-to-interpret coupling scores into probabilities, moving the field toward a consistent metric to assess contact prediction across diverse proteins. Through substantially improving the precision-recall behavior of contact prediction, DeepContact suggests we are near a paradigm shift in template-free modeling for protein structure prediction.

Keywords: co-evolution; contact prediction; convolutional neural networks; deep learning; evolutionary couplings; protein structure prediction; structure prediction.

Figures

Figure 1. Overview of DeepContact
(A) Structure of the full-feature DeepContact model. DeepContact takes in global, 1D, and 2D features calculated from the amino acid sequence, including evolutionary couplings, and uses a CNN to predict contacts. (B) DeepContact trains using a set of solved structures, taking in the distance matrix (left) and, as a preprocessing step, producing a contact matrix using an 8 Å threshold (right). Intrinsically, these contact maps have patterns, and clearly some matrices cannot be contact maps. By learning the structure of contact matrices and the relationship between couplings and contacts, DeepContact is able to vastly improve evolutionary-based contact prediction.
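The preprocessing step in (B) can be illustrated with a minimal sketch. It assumes one representative 3D coordinate per residue (which atom is used is not stated in this caption) and a strict "less than 8 Å" comparison; the function names and toy coordinates are hypothetical.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances (in Å) between residue coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def contact_matrix(dist, threshold=8.0):
    """Binary contact map: 1 where two residues lie closer than `threshold` Å.

    The strict '<' comparison is an assumption; the caption only names the 8 Å cutoff.
    """
    return [[1 if d < threshold else 0 for d in row] for row in dist]

# Toy example: three hypothetical residue positions on a line (Å).
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
contacts = contact_matrix(distance_matrix(coords))
```

In this toy case residues 1 and 3 are 12 Å apart and therefore not in contact, while both neighboring pairs (5 Å and 7 Å) are.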
Figure 2. Improved Performance of DeepContact on Benchmark Datasets
(A) DeepContact outperforms CCMPred on the ASTRAL validation set using only CCMPred as features. Including other features further improves the precision-recall performance. (B) Precision-recall performance of contact-prediction methods on the CAMEO dataset, where DeepContact also outperforms metaPSICOV. (C) Precision-recall performance of contact-prediction methods on the CASP228 dataset. On all three validation sets, applying our novel probability cutoff improves the precision-recall characteristics of DeepContact. Effectively, we exclude sequences or contacts with little confidence and keep contacts in which we have more confidence, leading to improved performance. (D) The improved contact-prediction performance of DeepContact over CCMPred leads to improved contact-assisted folding across the CASP12 free-modeling target set. Targets where folding failed using CCMPred contacts are plotted at a TM-score of 0.
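As a rough illustration of how a probability cutoff trades recall for precision, the sketch below scores a set of predicted contact probabilities against true contact labels. The function name, the toy values, and the convention of returning 0.0 when nothing is predicted are assumptions for illustration, not the authors' evaluation code.

```python
def precision_recall(probs, labels, cutoff):
    """Precision and recall when pairs with probability >= cutoff are called contacts."""
    predicted = [p >= cutoff for p in probs]
    tp = sum(1 for pred, y in zip(predicted, labels) if pred and y)
    fp = sum(1 for pred, y in zip(predicted, labels) if pred and not y)
    fn = sum(1 for pred, y in zip(predicted, labels) if not pred and y)
    # Assumed convention: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical predictions for five residue pairs (1 = true contact).
probs = [0.9, 0.8, 0.6, 0.3, 0.1]
labels = [1, 1, 0, 1, 0]
loose = precision_recall(probs, labels, cutoff=0.5)
strict = precision_recall(probs, labels, cutoff=0.7)
```

Raising the cutoff from 0.5 to 0.7 drops the one low-confidence false positive here, so precision rises while recall is unchanged.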
Figure 3. DeepContact Integrates Local Information to Improve Contact Prediction
(A) Example of how DeepContact (upper right triangle) improves contact prediction over CCMPred (lower left triangle) input for PDB: 3LRT_A (Cherney et al., 2011). Most of the DeepContact “false positives” border regions of true positives, and the two that do not (black circles) are true homodimer contacts between chain A (green) and chain B (cyan) separated by 3.6 and 4.3 Å, respectively (right panel). See also Figure S1. (B) DeepContact integrates local information. Using CCMPred as the only input feature (top left), DeepContact is able to identify patterns indicative of secondary structure elements (bottom left). Using additional features sharpens the predicted contact map (top right). These matrices resemble the experimentally determined distance matrix (bottom right).
Figure 4. DeepContact Reranks Full Contact Distribution
(A) Contacts were ranked across the entire ASTRAL validation set based on distance (black), DeepContact probability (blue), and CCMPred score (red). To make CCMPred comparable across examples we normalized to the SD of the medium- and long-range scores within each protein. The x axis is the rank-ordered list of DeepContact probabilities, and the y axis is the average distance of the higher-ranked contacts. DeepContact (blue) significantly improves the rank order of contacts across the distribution compared with CCMPred, being much closer to the true rank order of contacts (black). (B) The average distance of the false positives for each of the ASTRAL validation set structures as called by CCMPred versus as called by DeepContact. The “false positives” called by DeepContact are significantly closer in the experimental structure, with many of them lying just beyond the 8 Å cutoff.
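The per-protein normalization in (A) might look like the sketch below. It assumes that "normalized to the SD" means dividing every score by the standard deviation of the medium- and long-range scores, and that medium- and long-range means a sequence separation of at least 12 residues (a common convention, not stated in this caption). The function name and toy matrix are hypothetical.

```python
import statistics

def normalize_couplings(scores, min_separation=12):
    """Divide a protein's coupling scores by the SD of its medium/long-range scores.

    `min_separation=12` is an assumed definition of medium- and long-range pairs.
    """
    n = len(scores)
    far = [scores[i][j] for i in range(n) for j in range(i + min_separation, n)]
    if len(far) < 2:
        raise ValueError("protein too short for medium/long-range pairs")
    sd = statistics.pstdev(far)
    if sd == 0:
        raise ValueError("medium/long-range scores have zero spread")
    return [[s / sd for s in row] for row in scores]

# Toy 14-residue score matrix (hypothetical values).
toy = [[0.1 * abs(i - j) for j in range(14)] for i in range(14)]
normalized = normalize_couplings(toy)
```

After this step the medium- and long-range scores of every protein have unit spread, which is what makes scores from different proteins rankable in one list.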
Figure 5. DeepContact Converts Evolutionary Coupling Scores to Coupling Probabilities
Boxplot of precision of DeepContact with respect to the ASTRAL validation set (y axis) with DeepContact predictions binned by 0.01 probability. Mean (red) and median (blue) precision are shown for each bin; whiskers represent 5th to 95th percentiles. We trained DeepContact using a cross-entropy loss function, which effectively maximizes the ability to distinguish residue pairs less than 8 Å apart from residue pairs more than 8 Å apart. Although the probabilities are best calibrated at the ends of the distribution, those in the middle still allow an objective estimate of how likely a contact is.
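Both ideas in this caption, the cross-entropy loss and the binned precision check, can be sketched in a few lines. This is a simplified illustration: it pools all residue pairs into one list rather than computing a per-protein distribution as the boxplot does, and the function names and toy values are hypothetical.

```python
import math
from collections import defaultdict

def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)

def precision_by_bin(probs, labels, width=0.01):
    """Fraction of true contacts among predictions falling in each probability bin.

    Values sitting exactly on a bin edge may land in the neighboring bin
    because of floating-point rounding.
    """
    n_bins = round(1 / width)
    counts = defaultdict(lambda: [0, 0])  # bin index -> [true contacts, total]
    for p, y in zip(probs, labels):
        b = min(int(p / width), n_bins - 1)
        counts[b][0] += y
        counts[b][1] += 1
    return {b: hits / total for b, (hits, total) in sorted(counts.items())}

# Hypothetical predictions: three near 0.90, two near 0.10.
probs = [0.905, 0.902, 0.908, 0.105, 0.108]
labels = [1, 1, 0, 0, 0]
calibration = precision_by_bin(probs, labels)
loss = binary_cross_entropy([0.5], [1])
```

For a well-calibrated predictor, the precision in each bin should be close to the bin's probability, for example about 0.90 in the bin starting at 0.90.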
Figure 6. Visualization of First-Layer Filters
Visualization of features picked up by the first layer of a DeepContact model trained with CCMPred as the only feature (STAR Methods). We averaged the top 100 activations of each filter across the ASTRAL validation set and used t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimensionality (center gray-shaded matrix). Insets (A–F) show the activation patterns of selected filters, as well as the top 5 structural alignments and sequence similarity of the proteins with the top 100 activations. Filters (center) cluster by secondary structure element, spanning from β segments (red, top) to helical/β to helical segments (blue, bottom). The β patterns fit with the alternating contacts of β sheets (A, B, and F) and distinguish between parallel (A and B) and antiparallel (F) sheets. Helical filter (E) shows a grid pattern separated by three to four residues, matching the rise of a helix.
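One possible reading of "averaged the top 100 activations" is sketched below: for a single filter, take the input patches that produced its strongest responses and average them element by element. Whether the authors averaged input patches or something else is not specified in this caption, so this is an assumption; the function name and toy data are hypothetical, and the t-SNE step is omitted.

```python
import heapq

def mean_top_patches(activations, patches, k=100):
    """Element-wise mean of the patches with the k largest activations.

    Assumes `patches[i]` is the 2D input window that produced `activations[i]`.
    """
    if not activations:
        raise ValueError("no activations given")
    top = heapq.nlargest(k, range(len(activations)), key=activations.__getitem__)
    rows, cols = len(patches[0]), len(patches[0][0])
    return [
        [sum(patches[t][r][c] for t in top) / len(top) for c in range(cols)]
        for r in range(rows)
    ]

# Toy example: three 2x2 patches, keep the two strongest responses.
activations = [0.1, 0.9, 0.5]
patches = [
    [[0, 0], [0, 0]],
    [[1, 0], [0, 1]],
    [[3, 0], [0, 1]],
]
averaged = mean_top_patches(activations, patches, k=2)
```

The averaged patch gives one summary image per filter, which could then be flattened and passed to a dimensionality-reduction method.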
