Proteins. 2019 Dec;87(12):1092-1099.
doi: 10.1002/prot.25779. Epub 2019 Jul 27.

Prediction of interresidue contacts with DeepMetaPSICOV in CASP13

Shaun M Kandathil et al. Proteins. 2019 Dec.

Abstract

In this article, we describe our efforts in contact prediction in the CASP13 experiment. We employed a new deep learning-based contact prediction tool, DeepMetaPSICOV (or DMP for short), together with new methods and data sources for alignment generation. DMP evolved from MetaPSICOV and DeepCov and combines the input feature sets used by these methods as input to a deep, fully convolutional residual neural network. We also improved our method for multiple sequence alignment generation and included metagenomic sequences in the search. We discuss successes and failures of our approach and identify areas where further improvements may be possible. DMP is freely available at: https://github.com/psipred/DeepMetaPSICOV.

Keywords: deep learning; machine learning; metagenomics; neural networks; protein contact prediction; protein structure prediction.

Figures

Figure 1
Architecture of the DeepMetaPSICOV residual neural network model. On the left, the overall organization of the model is shown, beginning with the inputs and ending in the final sigmoid output layer. The numbers in parentheses represent the dimensionality of the output from each layer in the format (number of feature channels, width, height). The network takes in input features for a protein of length L and produces correspondingly sized output. Most of the model consists of 18 residual blocks (denoted ResBlock; only a few are shown), and the structure of each block is shown on the right. The convolutional layers (Conv2D) in a residual block have 5 × 5 filters with a dilation rate d. The values of d for each residual block in the model are given in Supplementary Table S2.
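The caption states that the 5 × 5 filters are dilated and that the output stays the same size as the L × L input. The arithmetic behind that can be sketched as follows. This is a minimal illustration, assuming stride-1 convolutions with symmetric zero-padding; the dilation schedule and protein length below are hypothetical, since the real values of d are only given in Supplementary Table S2.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span along one axis covered by a single dilated filter."""
    return dilation * (kernel_size - 1) + 1


def same_padding(kernel_size, dilation):
    """Zero-padding per side that keeps the map size unchanged (stride 1, odd kernel)."""
    return dilation * (kernel_size - 1) // 2


def output_size(length, kernel_size, dilation, padding):
    """Output width along one axis of a stride-1 dilated convolution."""
    return length + 2 * padding - dilation * (kernel_size - 1)


def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of a stack of stride-1 dilated convolutions."""
    field = 1
    for d in dilations:
        field += d * (kernel_size - 1)
    return field


# Hypothetical values for illustration only; not taken from the paper.
example_dilations = [1, 2, 4, 8]
protein_length = 100

for d in example_dilations:
    pad = same_padding(5, d)
    print(d, effective_kernel_size(5, d), pad, output_size(protein_length, 5, d, pad))

print(receptive_field(5, example_dilations))  # 61
```

With padding of 2d per side, every layer maps an L × L input to an L × L output, which is consistent with the "correspondingly sized output" described in the caption. Larger dilation rates widen the receptive field without adding parameters.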
Figure 2
The data augmentation procedures used during the training of DeepMetaPSICOV. (A) Deletions in loops can be simulated by probabilistically removing rows and columns in the input tensors and contact maps corresponding to residues classified as loops by DSSP. The DSSP assignment for an example protein is shown above its contact map, with blue rectangles representing alpha helices and line segments representing loops. (B) Input tensors generated using different alignments can be linearly interpolated to produce new training examples, simulating inputs generated from alignments of varying quality. Inputs thus generated for a given protein are mapped to the same contact map. (C) New examples are generated by rotating the input feature tensors and contact maps by 180°, corresponding to a reversal of the chain direction.
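The three augmentations in this caption can be sketched on a single L × L map stored as nested lists. This is an illustrative sketch, not the authors' implementation: the function names, the deletion probability, and the mixing weight are assumptions, and a real input tensor would apply the same operation to every feature channel.

```python
import random


def delete_loop_residues(matrix, is_loop, drop_prob, rng):
    """(A) Drop the row and column of each loop residue with probability drop_prob."""
    keep = [i for i in range(len(matrix))
            if not (is_loop[i] and rng.random() < drop_prob)]
    return [[matrix[i][j] for j in keep] for i in keep], keep


def interpolate(matrix_a, matrix_b, weight):
    """(B) Element-wise blend of two feature maps: weight * a + (1 - weight) * b."""
    return [[weight * a + (1.0 - weight) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(matrix_a, matrix_b)]


def reverse_chain(matrix):
    """(C) Rotate by 180 degrees: entry (i, j) moves to (L-1-i, L-1-j)."""
    return [row[::-1] for row in matrix[::-1]]


# Toy 4-residue contact map with contacts (0, 2) and (0, 3); hypothetical data.
contacts = [
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
loop_mask = [False, True, False, False]  # residue 1 is treated as a loop

rng = random.Random(0)
shrunk, kept = delete_loop_residues(contacts, loop_mask, drop_prob=1.0, rng=rng)
print(kept)    # indices of the residues that survived the deletion
print(shrunk)  # contact map with the deleted rows and columns removed

print(interpolate([[0.0, 1.0]], [[1.0, 0.0]], weight=0.25))
print(reverse_chain(contacts))
```

The same index list `kept` would be used to shrink both the input features and the target contact map so that they stay aligned. Applying `reverse_chain` twice returns the original map.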
Figure 3
(A) Comparison of effective sequence count (M_eff) between alignments generated using only HHblits and those generated using both HHblits and jackHMMER. In the latter case, the jackHMMER search makes use of UniRef100 and EBI MGnify metagenomic protein sequences. (B) Plot of top‐L/5 long‐range precision values obtained using the deeper alignments vs those obtained using HHblits only. Using the deeper alignments was beneficial overall, although there are a few domains for which the HHblits alignment alone would have provided much higher precision; these are marked.
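The two quantities plotted in this figure can be sketched as below. The abstract does not define either one, so this uses commonly used definitions as assumptions: the effective sequence count is taken as the sum of inverse cluster sizes at a sequence identity threshold (0.62 here is an assumed default), and long-range pairs are taken as those at least 24 residues apart, following the usual CASP convention. All names and the toy data are hypothetical.

```python
def sequence_identity(seq_a, seq_b):
    """Fraction of alignment columns at which two aligned sequences agree."""
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches / len(seq_a)


def effective_sequence_count(alignment, identity_threshold=0.62):
    """Sum of 1/n_i, where n_i counts the sequences at least identity_threshold
    identical to sequence i (itself included). The threshold is an assumption."""
    total = 0.0
    for seq in alignment:
        cluster_size = sum(
            1 for other in alignment
            if sequence_identity(seq, other) >= identity_threshold
        )
        total += 1.0 / cluster_size
    return total


def top_l5_long_range_precision(scores, native_contacts, length, min_separation=24):
    """Precision of the length // 5 highest-scoring pairs (i, j), i < j,
    with j - i >= min_separation. Divides by the number of pairs selected."""
    candidates = [(score, pair) for pair, score in scores.items()
                  if pair[1] - pair[0] >= min_separation]
    candidates.sort(reverse=True)
    top = candidates[:max(1, length // 5)]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in native_contacts)
    return hits / len(top)


# Toy data; a separation of 4 is used only because the toy protein is short.
toy_msa = ["ACDE", "ACDE", "ACDF", "WYWY"]
toy_scores = {(0, 5): 0.9, (1, 8): 0.8, (2, 9): 0.7, (0, 1): 0.99}
toy_native = {(0, 5), (2, 9)}

print(effective_sequence_count(toy_msa))
print(top_l5_long_range_precision(toy_scores, toy_native, length=10, min_separation=4))
```

In the toy alignment, the first three sequences fall into one cluster and the fourth stands alone, so four raw sequences count as two effective ones. The short-range pair (0, 1) is excluded from the precision calculation despite having the highest score.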
Figure 4
Gap fraction per column in the MSA generated for target T1021s3 (3112 raw sequences, M_eff = 979). Official domain boundaries are shaded in light blue and brown, and the precision obtained by DMP on these domains (long‐range, top‐L/5) is shown. The region of the MSA covering the C‐terminal domain D2 consists mostly of gaps and thus has little to no information content. Consequently, the contact precision obtained on this domain is much lower than that obtained for D1.
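The per-column gap fraction plotted here is straightforward to compute from an alignment. The sketch below assumes the MSA is held as a list of equal-length strings with `-` or `.` as gap characters; the toy alignment and the region boundaries are hypothetical and are not the T1021s3 data.

```python
def gap_fraction_per_column(alignment, gap_chars="-."):
    """Fraction of sequences that have a gap character in each column."""
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    return [
        sum(1 for seq in alignment if seq[col] in gap_chars) / n_seqs
        for col in range(n_cols)
    ]


def mean_gap_fraction(fractions, start, end):
    """Average gap fraction over columns start (inclusive) to end (exclusive)."""
    region = fractions[start:end]
    return sum(region) / len(region)


# Toy alignment in which the C-terminal half is mostly gaps.
toy_msa = [
    "ACDEFG",
    "ACD---",
    "AC----",
    "ACDE--",
]

fractions = gap_fraction_per_column(toy_msa)
print(fractions)
print(mean_gap_fraction(fractions, 0, 3))  # N-terminal region
print(mean_gap_fraction(fractions, 3, 6))  # C-terminal region
```

A region with a high mean gap fraction contributes few aligned residues per column, so covariation statistics for that region rest on much less data than the overall sequence count suggests.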
Figure 5
Impact of incorrect mutual information (MI) calculations on top‐L/5 long‐range contact precision. Values are expressed as percentage point differences, with positive values indicating a gain in precision upon using the correct MI calculation.
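For reference, the textbook mutual information between two alignment columns, and the percentage point difference used on the axis of this figure, can be sketched as follows. This is the plain unweighted definition with no pseudocounts or sequence weighting, which is an assumption; the caption does not say how the DMP feature is computed or what the error in the calculation was.

```python
import math
from collections import Counter


def mutual_information(alignment, col_i, col_j):
    """MI in nats between two columns: sum over (a, b) of
    p(a, b) * log(p(a, b) / (p(a) * p(b))), using raw frequencies."""
    n = len(alignment)
    pair_counts = Counter((seq[col_i], seq[col_j]) for seq in alignment)
    counts_i = Counter(seq[col_i] for seq in alignment)
    counts_j = Counter(seq[col_j] for seq in alignment)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def percentage_point_gain(precision_correct, precision_incorrect):
    """Difference between two precisions (each in [0, 1]) in percentage points."""
    return 100.0 * (precision_correct - precision_incorrect)


# Toy alignments (hypothetical): perfectly coupled vs independent columns.
coupled = ["AK", "AK", "DE", "DE"]
independent = ["AK", "AE", "DK", "DE"]

print(mutual_information(coupled, 0, 1))      # equals log(2)
print(mutual_information(independent, 0, 1))  # 0.0
print(percentage_point_gain(0.75, 0.5))       # 25.0
```

Columns that always vary together give the maximum MI for their entropy, while columns that vary independently give zero, which is why MI is a useful covariation signal for contact prediction.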

