Proteins. 2016 Sep;84 Suppl 1(Suppl 1):247-59.

doi: 10.1002/prot.24924. Epub 2015 Sep 29.

Massive integration of diverse protein quality assessment methods to improve template based modeling in CASP11

Renzhi Cao¹, Debswapna Bhattacharya¹, Badri Adhikari¹, Jilong Li¹, Jianlin Cheng²,³

Affiliations

¹ Department of Computer Science, University of Missouri, Columbia, Missouri, 65211.
² Department of Computer Science, University of Missouri, Columbia, Missouri, 65211. chengji@missouri.edu.
³ Informatics Institute, University of Missouri, Columbia, Missouri, 65211. chengji@missouri.edu.

PMID: 26369671
PMCID: PMC4792798
DOI: 10.1002/prot.24924

Abstract

Model evaluation and selection is an important step and a big challenge in template-based protein structure prediction. Individual model quality assessment methods designed for recognizing some specific properties of protein structures often fail to consistently select good models from a model pool because of their limitations. Therefore, combining multiple complementary quality assessment methods is useful for improving model ranking and consequently tertiary structure prediction. Here, we report the performance and analysis of our human tertiary structure predictor (MULTICOM) based on the massive integration of 14 diverse complementary quality assessment methods that was successfully benchmarked in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11). The predictions of MULTICOM for 39 template-based domains were rigorously assessed by six scoring metrics covering global topology of Cα trace, local all-atom fitness, side chain quality, and physical reasonableness of the model. The results show that the massive integration of complementary, diverse single-model and multi-model quality assessment methods can effectively leverage the strength of single-model methods in distinguishing quality variation among similar good models and the advantage of multi-model quality assessment methods in identifying reasonable average-quality models. The overall excellent performance of the MULTICOM predictor demonstrates that integrating a large number of model quality assessment methods in conjunction with model clustering is a useful approach to improve the accuracy, diversity, and consequently robustness of template-based protein structure prediction. Proteins 2016; 84(Suppl 1):247-259. © 2015 Wiley Periodicals, Inc.

Keywords: CASP; integration; model quality assessment; protein structure prediction; template-based modeling.

Conflict of interest statement

The authors declare there is no conflict of interest.

Figures

**Figure 1. Workflow of MULTICOM large-scale model quality assessment method**
Predicted models are ranked by different QA methods, followed by a consensus ranking; at the same time, models are clustered into groups based on structural similarity. For diversity, the top five ranks of the consensus results are updated using the clustering information, and the corresponding models are further refined using a model combination approach.
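The ranking-plus-clustering idea in this caption can be illustrated with a minimal sketch. It is not the MULTICOM implementation: the average-rank consensus rule, the "best model of each cluster first" diversification rule, and the names `consensus_rank`, `diversify_top`, `qa_a`, `qa_b`, `qa_c`, and `m1` to `m6` are all assumptions made for illustration, and the scores are made-up numbers.

```python
from statistics import mean

def consensus_rank(scores_by_method):
    """Order models by their average rank across QA methods.

    Assumption: every method scores every model and a higher score is better.
    """
    ranks = {}
    for scores in scores_by_method.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for position, model in enumerate(ordered, start=1):
            ranks.setdefault(model, []).append(position)
    return sorted(ranks, key=lambda model: mean(ranks[model]))

def diversify_top(ranking, cluster_of, top_n=5):
    """Pick top_n models, preferring the best-ranked model of each cluster.

    Assumption: one representative per structural cluster is taken first;
    any remaining slots are filled in plain consensus order.
    """
    picked, seen_clusters = [], set()
    for model in ranking:
        if len(picked) == top_n:
            break
        if cluster_of[model] not in seen_clusters:
            seen_clusters.add(cluster_of[model])
            picked.append(model)
    for model in ranking:
        if len(picked) == top_n:
            break
        if model not in picked:
            picked.append(model)
    return picked

# Hypothetical scores from three hypothetical QA methods for six models.
example_scores = {
    "qa_a": {"m1": 0.80, "m2": 0.78, "m3": 0.60, "m4": 0.55, "m5": 0.40, "m6": 0.30},
    "qa_b": {"m1": 0.70, "m2": 0.75, "m3": 0.65, "m4": 0.50, "m5": 0.45, "m6": 0.20},
    "qa_c": {"m1": 0.90, "m2": 0.60, "m3": 0.70, "m4": 0.40, "m5": 0.50, "m6": 0.10},
}
# Hypothetical cluster labels from a structural clustering step.
example_clusters = {"m1": 0, "m2": 0, "m3": 1, "m4": 1, "m5": 2, "m6": 2}

example_ranking = consensus_rank(example_scores)
example_top3 = diversify_top(example_ranking, example_clusters, top_n=3)
```

In this toy data the consensus order is `m1, m2, m3, m4, m5, m6`, while the diversified top three are `m1, m3, m5`, one from each cluster. The refinement by model combination mentioned in the caption is not covered by the sketch.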

**Figure 2. Performance of MULTICOM and server predictors with respect to number of residues in domain**
Relationships between the number of residues in the domain and the median accuracies are shown for these metrics: (a) GDT-HA, (b) SphereGrinder, (c) RMSD, (d) lDDT, (e) MolProbity and (f) GDC. MULTICOM and server predictors are distinguished by line style and color, with the corresponding legends shown on the top right.

**Figure 3. Performance of MULTICOM and server predictors with respect to difficulty of target**
Relationships between the percentage of sequence identity between the target and the best template present in the Protein Data Bank after optimal structural superposition and the median accuracies are shown for these metrics: (a) GDT-HA, (b) SphereGrinder, (c) RMSD, (d) lDDT, (e) MolProbity and (f) GDC. MULTICOM and server predictors are distinguished by line style and color, with the corresponding legends shown on the top.

**Figure 4. Accuracy of MULTICOM compared to other server predictors**
Three comparisons are shown: first models submitted by MULTICOM versus the median of the first models submitted by server predictors; first models submitted by MULTICOM versus the best models submitted by server predictors; and the best of five models submitted by MULTICOM versus the best models submitted by server predictors. Each comparison is shown for these metrics: (a) GDT-HA, (b) SphereGrinder, (c) RMSD, (d) lDDT, (e) MolProbity and (f) GDC. The dotted gray line represents the diagonal.

**Figure 5. Case study for CASP11 targets T0853-D1 and T0830-D1**
Quartile plots of Z-scores for all the submitted models are shown for six different quality metrics for targets (a) T0853-D1 and (b) T0830-D1. The maximum and minimum Z-scores for each metric are indicated by a black down triangle and a black up triangle, respectively, while the five models submitted by MULTICOM are highlighted in red, orange, blue, green and cyan, in ascending order. For each target, the experimental structure is shown in the top left (rainbow colored from N terminal to C terminal) while the best prediction by MULTICOM (optimally superposed with the experimental structure and translated) is shown in the top right.
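For readers unfamiliar with Z-scores, the sketch below shows the standard calculation for one metric across a pool of models. The function name `z_scores` is hypothetical, and the use of the population standard deviation is an assumption; the caption does not state which convention was used. For metrics where lower is better (such as RMSD), the sign would typically be flipped before comparison, which is likewise an assumption here.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (value - mean) / standard deviation for each value.

    Assumption: population standard deviation; a constant pool yields all zeros.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]

# Hypothetical metric values for eight models of one target.
example_metric = [2, 4, 4, 4, 5, 5, 7, 9]
example_z = z_scores(example_metric)
```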

**Figure 6. Comparison of MULTICOM with individual QA methods**
Relationships between the median GDT-HA score of the server predictors and the GDT-HA of the top model selected by individual QA methods along with MULTICOM are shown. Individual QA methods are distinguished by line style and color, while the curved lines are trend lines constructed by fitting a second-degree polynomial to the data. The corresponding legends are shown on the top left.
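A second-degree polynomial fit of the kind mentioned in this caption can be sketched with ordinary least squares. The code below is a generic illustration, not the authors' plotting code; the function name `fit_quadratic` and the sample points are hypothetical, and it assumes at least three distinct x values.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c).

    Solves the 3x3 normal equations by Gauss-Jordan elimination
    with partial pivoting. Assumes at least three distinct x values.
    """
    s = [sum(x ** k for x in xs) for k in range(5)]                # sums of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # sums of y*x^k
    rows = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    for col in range(3):
        # Move the row with the largest pivot candidate into place.
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Eliminate this column from every other row.
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return tuple(rows[i][3] / rows[i][i] for i in range(3))

# Hypothetical points lying exactly on y = 2x^2 - 3x + 1.
example_xs = [0, 1, 2, 3, 4]
example_ys = [1, 0, 3, 10, 21]
example_coeffs = fit_quadratic(example_xs, example_ys)
```

With real data the fitted curve would be evaluated over the x range of the plot to draw the trend line.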

**Figure 7. Landscape of MULTICOM’s ranking**
Gaussian kernel density estimates of the GDT-HA scores of models in the server pool and their ranking by MULTICOM are shown for targets (a) T0822-D1 and (b) T0838-D1, with a lower rank indicating a model predicted to be of higher quality. The top models selected by each of the QA methods are highlighted by different styles and colors. The corresponding legends are shown on the right.
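A Gaussian kernel density estimate places a Gaussian bump on every observation and averages them. The one-dimensional sketch below illustrates the idea only; the figure may use a different dimensionality or bandwidth rule, and the function name `gaussian_kde_1d`, the sample scores, and the fixed bandwidth of 0.1 are assumptions.

```python
import math

def gaussian_kde_1d(samples, bandwidth):
    """Return a function estimating the density of `samples` at a point x.

    Assumption: a single fixed bandwidth chosen by the caller.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - sample) / bandwidth) ** 2) for sample in samples
        )

    return density

# Hypothetical GDT-HA-like scores on a 0 to 1 scale.
example_density = gaussian_kde_1d([0.2, 0.5, 0.9], bandwidth=0.1)
```

Evaluating `example_density` on a grid of x values gives the smooth curve that such a plot would display.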

Grants and funding

R01 GM093123/GM/NIGMS NIH HHS/United States
