Toward the solution of the protein structure prediction problem
- PMID: 34119522
- PMCID: PMC8254035
- DOI: 10.1016/j.jbc.2021.100870
Abstract
Since Anfinsen demonstrated in 1973 that the information encoded in a protein's amino acid sequence determines its structure, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction is to use computational modeling to determine the spatial location of every atom in a protein molecule, starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have historically been categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM had been the most reliable approach to predicting protein structures, and in the absence of reliable templates, modeling accuracy declines sharply. However, the results of the most recent community-wide protein structure prediction experiment (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through end-to-end deep learning techniques, with correct folds built for nearly all single-domain proteins without using PDB templates. Critically, model quality showed little correlation with either the quality of the available template structures or the number of sequence homologs detected for a given target protein. Thus, deep-learning techniques have essentially broken through the 50-year-old modeling border between TBM and FM approaches and have made high-resolution structure prediction significantly less dependent on the availability of templates in the PDB library.
Keywords: contact map; deep learning; distance prediction; end-to-end structure prediction; free modeling; multiple sequence alignment; protein structure prediction; template-based modeling.
Copyright © 2021 The Authors. Published by Elsevier Inc. All rights reserved.
Conflict of interest statement
The authors declare that they have no conflicts of interest with the contents of this article.
References
- Anfinsen C.B. Principles that govern folding of protein chains. Science. 1973;181:223–230.
- Venter J.C., Adams M.D., Myers E.W., Li P.W., Mural R.J., Sutton G.G., Smith H.O., Yandell M., Evans C.A., Holt R.A., Gocayne J.D., Amanatides P., Ballew R.M., Huson D.H., Wortman J.R. The sequence of the human genome. Science. 2001;291:1304–1351.
- Metzker M.L. Sequencing technologies - the next generation. Nat. Rev. Genet. 2010;11:31–46.
