Bioinformatics. 2020 Feb 15;36(4):1091-1098.

doi: 10.1093/bioinformatics/btz679.

Analysis of several key factors influencing deep learning-based inter-residue contact prediction

Tianqi Wu¹, Jie Hou¹, Badri Adhikari², Jianlin Cheng¹

Affiliations

¹ Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA.
² Department of Mathematics and Computer Science, University of Missouri, St. Louis, MO 63121, USA.

PMID: 31504181
PMCID: PMC7703788
DOI: 10.1093/bioinformatics/btz679

Tianqi Wu et al. Bioinformatics. 2020.

Abstract

Motivation: Deep learning has become the dominant technology for protein contact prediction. However, the factors that affect the performance of deep learning in contact prediction have not been systematically investigated.

Results: We analyzed the results of our three deep learning-based contact prediction methods (MULTICOM-CLUSTER, MULTICOM-CONSTRUCT and MULTICOM-NOVEL) in the CASP13 experiment and identified several key factors [i.e. deep learning technique, multiple sequence alignment (MSA), distance distribution prediction and domain-based contact integration] that influenced contact prediction accuracy. We compared our convolutional neural network (CNN)-based contact prediction methods with three coevolution-based methods on 75 CASP13 targets consisting of 108 domains. We demonstrated that the CNN-based multi-distance approach was able to leverage global coevolutionary coupling patterns composed of multiple correlated contacts, predicting contacts more accurately than the local coevolution-based methods and increasing precision by 19.2 percentage points. We also tested different alignment methods and domain-based contact prediction with the deep learning contact predictors. The comparison of the three methods showed that deeper sequence alignments and the integration of domain-based contact prediction with full-length contact prediction both improved contact prediction performance. Moreover, we demonstrated that domain-based contact prediction using a novel ab initio approach, which parses domains from MSAs alone without using known protein structures, was a simple, fast way to improve contact prediction. Finally, we showed that predicting the distribution of inter-residue distances in multiple distance intervals could capture more structural information and improve binary contact prediction.

Availability and implementation: https://github.com/multicom-toolbox/DNCON2/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
Contact prediction performance of MULTICOM-NOVEL and CCMpred. (a) ROC curves for the long-range contacts predicted by CCMpred (green) and MULTICOM-NOVEL (red) on 43 CASP13 FM and FM/TBM targets. The deep learning-based method, MULTICOM-NOVEL, raises the AUC from 0.61 to 0.84. (b) Average distance of false positive contact predictions made by MULTICOM-NOVEL versus CCMpred for each CASP13 FM and FM/TBM target (each dot denotes one target). Averaged over all targets, the distance of false positive contacts is 14.1 Å for MULTICOM-NOVEL, smaller than the 17.8 Å for CCMpred.
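The quantity plotted in Fig. 1b can be illustrated with a minimal Python sketch. It assumes a contact is a residue pair whose true distance is below 8 Å (the usual CASP convention, not stated in this record); the function name `mean_false_positive_distance` and its arguments are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def mean_false_positive_distance(pred_pairs, true_dist, threshold=8.0):
    """Average true distance (in Å) of predicted pairs that are not native contacts.

    pred_pairs: iterable of (i, j) residue index pairs predicted to be in contact.
    true_dist:  square array of native inter-residue distances in Å.
    threshold:  assumed contact cutoff; pairs at or beyond it count as false positives.
    """
    dists = np.array([true_dist[i, j] for i, j in pred_pairs], dtype=float)
    false_pos = dists[dists >= threshold]
    # Return NaN when there are no false positives, so callers can skip the target.
    return float(false_pos.mean()) if false_pos.size else float("nan")
```

A smaller value means that wrong predictions still tend to fall near true contacts, which is the sense in which 14.1 Å compares favorably with 17.8 Å.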

**Fig. 2.**
Contact prediction results of CCMpred and MULTICOM-NOVEL for target T0953s2. (a) Top 2L long-range contacts predicted by the two methods (red) versus true contacts (blue). (b) ROC curves of the two methods (red: MULTICOM-NOVEL, AUC = 0.95; green: CCMpred, AUC = 0.81). (c) Coverage (i.e. 100 * TP/N, where TP is the number of true positive contacts and N is the number of native contacts) of the top 5 to top 2L long-range contacts predicted by the two methods. (d) Precision of the top 5, top L/10, top L/5, top L/2, top L and top 2L long-range contacts predicted by the two methods. (Color version of this figure is available at *Bioinformatics* online.)
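The coverage formula in the caption (100 * TP/N) and the top-k precision can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code used in the paper: long-range is taken to mean a sequence separation of at least 24 residues and a contact to mean a true distance below 8 Å (both common CASP conventions that this record does not spell out), and the function name `topk_precision_coverage` is hypothetical.

```python
import numpy as np

def topk_precision_coverage(pred_prob, true_dist, k, threshold=8.0, min_sep=24):
    """Precision and coverage (both in percent) of the top-k long-range predictions.

    pred_prob: square array of predicted contact scores (upper triangle is used).
    true_dist: square array of native inter-residue distances in Å.
    k:         number of top-scoring pairs to evaluate (e.g. L // 5, L // 2, L, 2 * L).
    """
    length = pred_prob.shape[0]
    # Residue pairs (i, j) with j - i >= min_sep, i.e. the assumed long-range pairs.
    i, j = np.triu_indices(length, k=min_sep)
    scores = pred_prob[i, j]
    native = true_dist[i, j] < threshold

    top = np.argsort(-scores, kind="stable")[:k]
    tp = int(native[top].sum())          # true positives among the top-k pairs
    n_native = int(native.sum())         # N: all native long-range contacts

    precision = 100.0 * tp / len(top) if len(top) else 0.0
    coverage = 100.0 * tp / n_native if n_native else 0.0
    return precision, coverage
```

Calling the function with `k` set to 5, `L // 10`, `L // 5`, `L // 2`, `L` and `2 * L` would give the series of points plotted in panels (c) and (d).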

**Fig. 3.**
(a) Plot of contact prediction precision against Neff of multiple sequence alignments for 108 CASP13 domains for MULTICOM-NOVEL. Dots with different colors represent precisions of different numbers of long-range contact predictions (top L/5, top L/2 and top L). The curve is the LOESS line fitting the dots. The plot in the Neff range [1, 2500] is zoomed in. (b) Scatterplot of the precision of top L long-range contact predictions versus log(Neff), with the marginal histograms of the precision and log(Neff) shown on the top and on the right, respectively. The curve is the LOESS line fitting the dots. (Color version of this figure is available at *Bioinformatics* online.)
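Neff, the number of effective sequences in an alignment, has several definitions in the literature, and this record does not say which one the authors used. The sketch below shows one common variant as an assumption: each sequence is down-weighted by the number of sequences (itself included) that share at least 80% identical columns with it, and the weights are summed. The function name `neff` and the 0.8 cutoff are illustrative choices.

```python
import numpy as np

def neff(msa, identity_cutoff=0.8):
    """Number of effective sequences of an alignment, under the assumed definition.

    msa: list of equal-length aligned sequences (strings).
    identity_cutoff: sequences at or above this fractional identity are treated
                     as redundant with each other (assumed value).
    """
    arr = np.array([list(seq) for seq in msa])            # shape (n_seq, n_col)
    # Pairwise fraction of identical columns; O(n_seq^2 * n_col) memory,
    # so this form is only suitable for small alignments.
    identity = (arr[:, None, :] == arr[None, :, :]).mean(axis=2)
    neighbors = (identity >= identity_cutoff).sum(axis=1)  # always >= 1 (self)
    return float((1.0 / neighbors).sum())
```

For example, two identical sequences plus one unrelated sequence give a Neff of 2.0 rather than 3, which is why Neff reflects alignment depth better than a raw sequence count.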

**Fig. 4.**
Domain parsing and domain-based contact prediction of target T0989. (a) Number of sequences in the MSA of T0989 plotted against residue position, together with the true domain boundaries and the boundaries predicted by the *ab initio* domain parsing method. (b) The contact prediction precision for the second domain of T0989 by MULTICOM-CLUSTER with and without domain parsing and the integration of domain-based contact prediction.
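The per-position sequence count plotted in Fig. 4a can be computed from an alignment with a few lines of Python. This sketch covers only the counting step; how the authors turn the profile into domain boundaries is not described in this record and is not reproduced here. The function name `column_coverage` and the choice of gap characters are assumptions.

```python
def column_coverage(msa, gap_chars="-."):
    """Count, for each alignment column, the sequences that have a residue there.

    msa: list of equal-length aligned sequences (strings), query first.
    gap_chars: characters treated as gaps (assumed: '-' and '.').
    """
    n_cols = len(msa[0])
    return [sum(seq[col] not in gap_chars for seq in msa) for col in range(n_cols)]
```

Sharp changes in this profile along the sequence suggest regions aligned by different sets of homologs, which is the kind of signal an MSA-only domain parser could exploit.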

**Fig. 5.**
Top L/2 long-range predicted contacts for T0963-D3 at Stage 1 without the inter-residue distance distribution as input and at Stage 2 with the inter-residue distance distribution as input. (a) Top L/2 long-range predicted contacts (red) versus true contacts (blue) for T0963-D3 at Stage 1 at distance thresholds of 6, 7.5, 8.5 and 10 Å. (b) Top L/2 long-range contacts versus true contacts at the distance threshold of 8.0 Å at Stage 1 and Stage 2. (c) The predicted top L/5 long-range contacts at the distance threshold of 8.0 Å at Stage 1 and Stage 2 are visualized on the native structure of target T0963-D3. The red lines in the structure are the false positive contacts and the black lines are true positive contacts. (Color version of this figure is available at *Bioinformatics* online.)
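The multiple distance thresholds in panel (a) correspond to several binary contact maps derived from one distance matrix. A minimal sketch of that labeling step is shown below; the thresholds 6, 7.5, 8.5 and 10 Å come from the caption, while the function name `multi_threshold_maps` and the strict "less than" comparison are assumptions.

```python
import numpy as np

THRESHOLDS = (6.0, 7.5, 8.5, 10.0)  # Å, as listed in the caption

def multi_threshold_maps(dist, thresholds=THRESHOLDS):
    """Return one boolean contact map per distance threshold.

    dist: square array of inter-residue distances in Å.
    A pair is labeled a contact at threshold t when its distance is below t.
    """
    dist = np.asarray(dist, dtype=float)
    return {t: dist < t for t in thresholds}
```

Because every contact at a tighter threshold is also a contact at a looser one, the maps are nested, so a model trained on all of them sees a coarse picture of the distance distribution rather than a single 8 Å cutoff.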

Cited by

How much metagenome data is needed for protein structure prediction: The advantages of targeted approach from the ecological and evolutionary perspectives.
Yang P, Ning K. Imeta. 2022 Mar 6;1(1):e9. doi: 10.1002/imt2.9. eCollection 2022 Mar. PMID: 38867727. Free PMC article. Review.

Improving Protein Secondary Structure Prediction by Deep Language Models and Transformer Networks.
Wu T, Cheng W, Cheng J. Methods Mol Biol. 2025;2867:43-53. doi: 10.1007/978-1-0716-4196-5_3. PMID: 39576574.

Deep learning methods in protein structure prediction.
Torrisi M, Pollastri G, Le Q. Comput Struct Biotechnol J. 2020 Jan 22;18:1301-1310. doi: 10.1016/j.csbj.2019.12.011. eCollection 2020. PMID: 32612753. Free PMC article. Review.

Tertiary structure assessment at CASP15.
Simpkin AJ, Mesdaghi S, Sánchez Rodríguez F, Elliott L, Murphy DL, Kryshtafovych A, Keegan RM, Rigden DJ. Proteins. 2023 Dec;91(12):1616-1635. doi: 10.1002/prot.26593. Epub 2023 Sep 25. PMID: 37746927. Free PMC article.

Accurate structure prediction of biomolecular interactions with AlphaFold 3.
Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, Evans DA, Hung CC, O'Neill M, Reiman D, Tunyasuvunakool K, Wu Z, Žemgulytė A, Arvaniti E, Beattie C, Bertolli O, Bridgland A, Cherepanov A, Congreve M, Cowen-Rivers AI, Cowie A, Figurnov M, Fuchs FB, Gladman H, Jain R, Khan YA, Low CMR, Perlin K, Potapenko A, Savy P, Singh S, Stecula A, Thillaisundaram A, Tong C, Yakneen S, Zhong ED, Zielinski M, Žídek A, Bapst V, Kohli P, Jaderberg M, Hassabis D, Jumper JM. Nature. 2024 Jun;630(8016):493-500. doi: 10.1038/s41586-024-07487-w. Epub 2024 May 8. PMID: 38718835. Free PMC article.

Grants and funding

R01 GM093123/GM/NIGMS NIH HHS/United States
