MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction

doi:10.1038/s41598-021-92395-6

Sci Rep. 2021 Jun 23;11(1):13155.

Tianqi Wu¹, Jian Liu¹, Zhiye Guo¹, Jie Hou², Jianlin Cheng³

Affiliations

¹ Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA.
² Department of Computer Science, Saint Louis University, St. Louis, MO, 63103, USA.
³ Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA. chengji@missouri.edu.

PMID: 34162922
PMCID: PMC8222248
DOI: 10.1038/s41598-021-92395-6

Tianqi Wu et al. Sci Rep. 2021.

Abstract

Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, there are still few comprehensive open-source protein structure prediction packages publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system, MULTICOM2, an integration of template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, and is much faster and more accurate than MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as a template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 is capable of predicting good tertiary structures across the board. It can predict the correct fold for 76 CASP14 domains (95% of regular domains and 55% of hard domains) if only one prediction is made for a domain. The success rate increases by 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to that of the combination of template-based and template-free modeling methods. This demonstrates that the distance-based template-free modeling method powered by deep learning can largely replace the traditional template-based modeling method, even on TBM targets that TBM methods used to dominate, and therefore provides a uniform structure modeling approach for any protein. Finally, on the 38 CASP14 FM and FM/TBM hard domains, the MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available at https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
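The TM-score figures quoted in the abstract can be made concrete with a short sketch. The code below is only an illustration, not part of MULTICOM2: it evaluates the standard TM-score formula for one fixed superposition (real TM-score programs also search over superpositions), clamps the distance scale at 0.5 Å for very short chains as an assumption, and checks that the quoted percentages are consistent with 76 correctly folded domains. The function names are hypothetical.

```python
def tm_score_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (angstroms) of the TM-score."""
    # Assumption: clamp d0 at 0.5 for very short chains, where the
    # cube-root formula would otherwise go negative or be undefined.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: per-residue distances (angstroms) between aligned
    model and native residues; unaligned residues contribute nothing.
    """
    d0 = tm_score_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def correct_fold_count(n_regular, n_hard, rate_regular, rate_hard) -> int:
    """Number of domains with a correct fold implied by two success rates."""
    return round(n_regular * rate_regular) + round(n_hard * rate_hard)


if __name__ == "__main__":
    # 95% of 58 regular domains plus 55% of 38 hard domains
    print(correct_fold_count(58, 38, 0.95, 0.55))
    # A perfect model of a 100-residue domain
    print(tm_score([0.0] * 100, 100))
```

A model whose aligned residues all sit exactly d0 away from the native positions scores 0.5 under this formula, which is the value commonly used as the threshold for a correct fold.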

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
The top 20 server predictors on the 38 CASP14 FM and FM/TBM hard domains, ranked by the sum of Z-scores calculated according to the CASP14 assessor’s evaluation. The predictors are ranked based on the first models they predicted for the 38 CASP14 domains. Predictors from the same group are marked with the same color. The Y-axis denotes the sum of the Z-scores. The average TM-score of each predictor is reported on top of its bar.
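The ranking idea behind Figure 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the assessors' procedure: it standardizes a single per-domain score across predictors and sums the resulting Z-scores, whereas the CASP14 assessors' formula combines several measures that are not reproduced here. The `floor` parameter, which clips strongly negative Z-scores, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standardize one domain's scores across all predictors."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]


def rank_by_z_sum(scores_by_predictor, floor=0.0):
    """Rank predictors by the sum of per-domain Z-scores.

    scores_by_predictor: dict mapping predictor name to a list of
    per-domain scores, with the same domain order for every predictor.
    floor: assumed lower bound applied to each Z-score before summing.
    """
    names = list(scores_by_predictor)
    n_domains = len(next(iter(scores_by_predictor.values())))
    totals = {name: 0.0 for name in names}
    for j in range(n_domains):
        column = [scores_by_predictor[name][j] for name in names]
        for name, z in zip(names, z_scores(column)):
            totals[name] += max(z, floor)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy scores for three made-up predictors on two domains
    toy = {"A": [0.8, 0.7], "B": [0.5, 0.6], "C": [0.2, 0.5]}
    for name, total in rank_by_z_sum(toy):
        print(name, round(total, 3))
```

Summing Z-scores rather than raw scores rewards a predictor for standing out on domains where most predictors do poorly, which matters for the hard FM and FM/TBM domains.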

**Figure 2**
Box plots of the quality of the top-1 models from the MULTICOM2 integrated server predictors and the top-1 models from their template-based modeling branches on CASP14 TBM domains. (A) Comparison between MULTICOM-HYBRID and its template-based models (MULTICOM-HYBRID_TBM). (B) Comparison between MULTICOM-DEEP and its template-based models (MULTICOM-DEEP_TBM). The top-1 template-based models are selected with the same model selection methods of MULTICOM-HYBRID and MULTICOM-DEEP described in the “Methods” section.

**Figure 3**
Impact of Neff and the accuracy of the inter-residue distance prediction on the model quality of the MULTICOM2 system on 91 CASP14 domains whose experimental structures are available for analysis. (A) Logarithm of the Neff of the MSA vs. the quality of models built by three MULTICOM server predictors (MULTICOM-DEEP, MULTICOM-DIST and MULTICOM-HYBRID). The size of a dot is proportional to the value of Neff. (B) The precision of top L/2 long-range contact predictions vs. the quality of models.
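The contact-precision measure in Figure 3B can be illustrated with a small sketch. The definitions used here are common conventions taken as assumptions, not details confirmed by this record: a residue pair counts as long-range when its sequence separation is at least 24, and as a true contact when the native distance is below 8 Å. The function name and input layout are hypothetical.

```python
def top_long_range_precision(predictions, true_dist, seq_len,
                             min_sep=24, cutoff=8.0):
    """Precision of the top L/2 long-range predicted contacts.

    predictions: iterable of (i, j, probability) with 0-based i < j.
    true_dist: mapping (i, j) -> native distance in angstroms.
    seq_len: domain length L; the top L // 2 pairs are evaluated.
    min_sep, cutoff: assumed long-range and contact thresholds.
    """
    long_range = [(i, j, p) for i, j, p in predictions
                  if abs(i - j) >= min_sep]
    # Highest-confidence predictions first
    long_range.sort(key=lambda item: item[2], reverse=True)
    top = long_range[: max(1, seq_len // 2)]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top if true_dist[(i, j)] < cutoff)
    return hits / len(top)


if __name__ == "__main__":
    # Toy data: three long-range pairs and one short-range pair
    preds = [(0, 30, 0.9), (1, 40, 0.8), (2, 45, 0.7), (0, 5, 0.99)]
    native = {(0, 30): 6.0, (1, 40): 10.0, (2, 45): 7.5, (0, 5): 4.0}
    print(top_long_range_precision(preds, native, seq_len=50))
```

In the toy data the short-range pair is ignored despite its high probability, and two of the three remaining pairs are true contacts.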

**Figure 4**
Flowchart of the MULTICOM2 system, consisting of template-based and template-free modeling methods and model ranking (quality assessment).

Cited by

Refinement of AlphaFold2 models against experimental and hybrid cryo-EM density maps.
Alshammari M, Wriggers W, Sun J, He J. QRB Discov. 2022;3:e16. doi: 10.1017/qrd.2022.13. Epub 2022 Sep 20. PMID: 37485023. Free PMC article.
Computational epitope-based vaccine design with bioinformatics approach; a review.
Basmenj ER, Pajhouh SR, Ebrahimi Fallah A, Naijian R, Rahimi E, Atighy H, Ghiabi S, Ghiabi S. Heliyon. 2025 Jan 4;11(1):e41714. doi: 10.1016/j.heliyon.2025.e41714. eCollection 2025 Jan 15. PMID: 39866399. Free PMC article. Review.
