Bioinformatics. 2021 Oct 11;37(19):3190-3196.
doi: 10.1093/bioinformatics/btab355.

Improving deep learning-based protein distance prediction in CASP14

Zhiye Guo et al. Bioinformatics. 2021.

Abstract

Motivation: Accurate prediction of residue-residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The prediction method uses deep residual neural networks with the channel-wise attention mechanism to classify the distance between every two residues into multiple distance intervals. The input features for the deep learning method include co-evolutionary features as well as other sequence-based features derived from multiple sequence alignments (MSAs). Three alignment methods are used with multiple protein sequence/profile databases to generate MSAs for input feature generation. Based on different configurations and training strategies of the deep learning method, five MULTICOM distance predictors were created to participate in the CASP14 experiment.
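As a rough illustration of what classifying a residue-residue distance into intervals means, the sketch below maps a distance to an interval index and collapses a per-interval probability vector into a single contact probability. The bin edges in `BIN_EDGES` and the helper names are assumptions made for this example; the abstract does not list the intervals the predictors actually use.

```python
from bisect import bisect_right

# Hypothetical interval boundaries in Angstrom (an assumption, not taken
# from the abstract): <4, 4-6, 6-8, ..., 18-20, >=20 gives 10 intervals.
BIN_EDGES = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]


def distance_to_bin(distance, edges=BIN_EDGES):
    """Return the index of the interval that contains `distance`."""
    # bisect_right counts how many edges are <= distance, which is the
    # index of the half-open interval [edge_k-1, edge_k) holding it.
    return bisect_right(edges, distance)


def contact_probability(bin_probs, edges=BIN_EDGES, threshold=8.0):
    """Sum the probabilities of all intervals that lie below `threshold`."""
    # `threshold` must be one of the edges so that no interval straddles it.
    n_contact_bins = edges.index(threshold) + 1
    return sum(bin_probs[:n_contact_bins])
```

For instance, `distance_to_bin(7.9)` falls in the third interval (index 2), while `distance_to_bin(8.0)` moves to index 3, so a distance of exactly 8 Angstrom is treated as not in contact.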

Results: Benchmarked on 37 hard CASP14 domains, the best performing MULTICOM predictor is ranked 5th out of 30 automated CASP14 distance prediction servers in terms of precision of top L/5 long-range contact predictions [i.e. classifying distances between two residues into two categories: in contact (<8 Angstrom) and not in contact otherwise] and performs better than the best CASP13 distance prediction method. The best performing MULTICOM predictor is also ranked 6th among automated server predictors in classifying inter-residue distances into 10 distance intervals defined by CASP14 according to the precision of distance classification. The results show that the quality and depth of MSAs depend on alignment methods and sequence databases and have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy. However, the number of effective sequences in MSAs is only a weak indicator of the quality of MSAs and the accuracy of predicted distance maps. In contrast, there is a strong correlation between the accuracy of contact/distance predictions and the average probability of the predicted contacts, which can therefore be more effectively used to estimate the confidence of distance predictions and select predicted distance maps.
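The two quantities compared in the Results, the precision of the top L/5 long-range contact predictions and the average predicted probability of those same contacts, can be sketched as follows. This is a minimal illustration rather than the CASP14 evaluation code: the function name is hypothetical, and the long-range cutoff of a sequence separation of at least 24 residues is an assumption based on common practice, since the abstract does not define it.

```python
def top_long_range_precision(contact_prob, true_dist, fraction=5,
                             min_sep=24, threshold=8.0):
    """Return (precision, mean probability) of the top L/fraction contacts.

    contact_prob: L x L nested list of predicted contact probabilities.
    true_dist:    L x L nested list of true distances in Angstrom.
    min_sep:      minimum sequence separation for a "long-range" pair
                  (assumed value, not stated in the abstract).
    """
    length = len(contact_prob)
    # Collect every long-range residue pair once (upper triangle only).
    pairs = [(contact_prob[i][j], i, j)
             for i in range(length)
             for j in range(i + min_sep, length)]
    if not pairs:
        raise ValueError("sequence too short for any long-range pair")
    # Rank pairs by predicted probability, highest first.
    pairs.sort(key=lambda item: item[0], reverse=True)
    top = pairs[:max(1, length // fraction)]
    # A prediction counts as correct if the true distance is below threshold.
    hits = sum(1 for _, i, j in top if true_dist[i][j] < threshold)
    precision = hits / len(top)
    mean_prob = sum(p for p, _, _ in top) / len(top)
    return precision, mean_prob
```

Because `mean_prob` needs no knowledge of the true structure, it is the kind of quantity that could be used to rank candidate distance maps before the native structure is known, which is the use the abstract suggests.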

Availability and implementation: The software package, source code and data of DeepDist2 are freely available at https://github.com/multicom-toolbox/deepdist and https://zenodo.org/record/4712084#.YIIM13VKhQM.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
The overall pipeline of the MULTICOM distance predictors based on DeepDist2. The two data flows (branches) applied to all the targets are connected by the black solid line, while the optional flow (branch) is connected by the red dotted line and is invoked only when the MSAs produced by DeepMSA and DeepAln are not sufficiently deep. Each flow (branch) produces four sets of features (COV_Set, PRE_Set, PLM_Set and OTHER_Set; see details in Section 2.2), each of which is used as input for a deep network to predict a distance map. The four distance maps predicted from the four sets of features of each branch are averaged to give the predicted distance map of the branch. The final prediction is the average of the predicted distance maps of the first two branches or of all three branches.
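The two-level averaging described in the caption of Fig. 1 can be sketched as below. This is a simplified illustration under stated assumptions: each map is a plain 2D nested list of values, all maps have the same shape, and the function names are hypothetical. The actual predictors presumably average per-interval probabilities rather than single values.

```python
def average_maps(maps):
    """Element-wise mean of equally shaped 2D maps (nested lists)."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / n for c in range(cols)]
            for r in range(rows)]


def predict_final_map(branches):
    """Average within each branch, then across branches.

    branches: list of branches; each branch is a list of the maps
    predicted from its feature sets (four per branch in Fig. 1).
    """
    branch_maps = [average_maps(feature_maps) for feature_maps in branches]
    return average_maps(branch_maps)
```

Passing two branches reproduces the default case in the caption, and passing three corresponds to the case where the optional branch is invoked.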
Fig. 2.
A plot of the precision of the top L/2 long-range contact predictions against the average probability of the top L/2 predicted contacts. MULTICOM-CONSTRUCT with HHblits_BFD alignments was used to predict the distance maps.
Fig. 3.
Comparison of the domain-based distance prediction and the full-length distance prediction with the true distance map of T1052-D3. In the subfigure on the left, the upper triangle denotes the domain-based distance prediction and the lower triangle the true distance map. In the subfigure on the right, the upper triangle denotes the full-length distance prediction and the lower triangle the true distance map. The patterns in the domain-based distance prediction map are much clearer and closer to the true distance map than those in the full-length distance prediction map.
Fig. 4.
(A) The distance map predicted from MSAs generated by DeepAln and DeepMSA with predicted domain information (upper triangle) versus the true distance map (lower triangle). (B) The distance map predicted from the HHblits_BFD MSA without domain information (upper triangle) versus the true distance map (lower triangle). (C) The distance map predicted from the HHblits_BFD MSA with predicted domain information (upper triangle) versus the true distance map (lower triangle).
