Proteins. 2016 Sep;84 Suppl 1(Suppl 1):247-59.
doi: 10.1002/prot.24924. Epub 2015 Sep 29.

Massive integration of diverse protein quality assessment methods to improve template based modeling in CASP11

Renzhi Cao et al. Proteins. 2016 Sep.

Abstract

Model evaluation and selection is an important step and a big challenge in template-based protein structure prediction. Individual model quality assessment methods designed for recognizing some specific properties of protein structures often fail to consistently select good models from a model pool because of their limitations. Therefore, combining multiple complementary quality assessment methods is useful for improving model ranking and consequently tertiary structure prediction. Here, we report the performance and analysis of our human tertiary structure predictor (MULTICOM) based on the massive integration of 14 diverse complementary quality assessment methods that was successfully benchmarked in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11). The predictions of MULTICOM for 39 template-based domains were rigorously assessed by six scoring metrics covering global topology of Cα trace, local all-atom fitness, side chain quality, and physical reasonableness of the model. The results show that the massive integration of complementary, diverse single-model and multi-model quality assessment methods can effectively leverage the strength of single-model methods in distinguishing quality variation among similar good models and the advantage of multi-model quality assessment methods in identifying reasonable average-quality models. The overall excellent performance of the MULTICOM predictor demonstrates that integrating a large number of model quality assessment methods in conjunction with model clustering is a useful approach to improve the accuracy, diversity, and consequently robustness of template-based protein structure prediction. Proteins 2016; 84(Suppl 1):247-259. © 2015 Wiley Periodicals, Inc.

Keywords: CASP; integration; model quality assessment; protein structure prediction; template-based modeling.
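
The integration idea in the abstract (many QA methods combined into one ranking, then diversified with clustering) can be illustrated with a minimal Python sketch. It assumes an average-rank consensus and a simple "one model per cluster first" rule; the abstract does not specify how MULTICOM combines its 14 methods, so the function names, the averaging scheme, and the toy scores below are hypothetical and not the authors' implementation.

```python
from statistics import mean


def ranks_from_scores(scores):
    """Convert {model: score} to {model: rank}; rank 1 is the highest score.

    Ties are broken by model name so the result is deterministic.
    """
    ordered = sorted(scores, key=lambda m: (-scores[m], m))
    return {model: i + 1 for i, model in enumerate(ordered)}


def consensus_ranking(qa_scores):
    """Order models by their average rank across all QA methods (assumed scheme)."""
    per_method = [ranks_from_scores(s) for s in qa_scores.values()]
    models = sorted(per_method[0])
    avg_rank = {m: mean(r[m] for r in per_method) for m in models}
    return sorted(models, key=lambda m: (avg_rank[m], m))


def pick_diverse(ranking, cluster_of, k=5):
    """Take the best-ranked model of each cluster first, then fill by rank."""
    chosen, seen_clusters = [], set()
    for model in ranking:
        if cluster_of[model] not in seen_clusters:
            chosen.append(model)
            seen_clusters.add(cluster_of[model])
    for model in ranking:
        if model not in chosen:
            chosen.append(model)
    return chosen[:k]


# Toy input: three hypothetical QA methods scoring three hypothetical models.
qa_scores = {
    "qa_a": {"m1": 0.90, "m2": 0.80, "m3": 0.40},
    "qa_b": {"m1": 0.70, "m2": 0.75, "m3": 0.20},
    "qa_c": {"m1": 0.60, "m2": 0.50, "m3": 0.55},
}
cluster_of = {"m1": 0, "m2": 0, "m3": 1}  # m1 and m2 are structurally similar

ranking = consensus_ranking(qa_scores)
top_two = pick_diverse(ranking, cluster_of, k=2)
```

In this toy case m2 outranks m3 in the consensus, but m3 is promoted into the top two because it comes from a different structural cluster than m1.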

Conflict of interest statement

The authors declare there is no conflict of interest.

Figures

Figure 1. Workflow of the MULTICOM large-scale model quality assessment method
Predicted models are ranked by different QA methods, followed by a consensus ranking; at the same time, models are clustered into groups based on structural similarity. For diversity, the top five consensus ranks are updated using the clustering information, and the corresponding models are further refined using a model combination approach.
Figure 2. Performance of MULTICOM and server predictors with respect to number of residues in domain
Relationships between the number of residues in the domain and the median accuracies are shown for these metrics: (a) GDT-HA, (b) SphereGrinder, (c) RMSD, (d) lDDT, (e) MolProbity and (f) GDC. MULTICOM and the server predictors are represented by different styles and colors, with the corresponding legends shown at the top right.
Figure 3. Performance of MULTICOM and server predictors with respect to difficulty of target
Relationships between the median accuracies and the percentage of sequence identity between the target and the best template present in the Protein Data Bank after optimal structural superposition are shown for these metrics: (a) GDT-HA, (b) SphereGrinder, (c) RMSD, (d) lDDT, (e) MolProbity and (f) GDC. MULTICOM and the server predictors are represented by different styles and colors, with the corresponding legends shown at the top.
Figure 4. Accuracy of MULTICOM compared to other server predictors
Three comparisons are shown: the first models submitted by MULTICOM versus the median of the first models submitted by server predictors; the first models submitted by MULTICOM versus the best models submitted by server predictors; and the best of five models submitted by MULTICOM versus the best models submitted by server predictors. Each comparison is shown for these metrics: (a) GDT-HA, (b) SphereGrinder, (c) RMSD, (d) lDDT, (e) MolProbity and (f) GDC. The dotted gray line represents the diagonal.
Figure 5. Case study for CASP11 targets T0853-D1 and T0830-D1
Quartile plots of Z-scores for all the submitted models are shown for six different quality metrics for targets (a) T0853-D1 and (b) T0830-D1. The maximum and minimum Z-scores for each metric are indicated by a black down triangle and a black up triangle, respectively, while the five models submitted by MULTICOM are highlighted in red, orange, blue, green and cyan, in ascending order. For each target, the experimental structure is shown in the top left (rainbow colored from N terminal to C terminal), while the best prediction by MULTICOM (optimally superposed with the experimental structure and translated) is shown in the top right.
Figure 6. Comparison of MULTICOM with individual QA methods
Relationships between the median GDT-HA score of the server predictors and the GDT-HA of the top model selected by each individual QA method, along with MULTICOM, are shown. Individual QA methods are represented by different styles and colors, while the curved lines are trend lines constructed by fitting a second-degree polynomial to the data. The corresponding legends are shown at the top left.
Figure 7. Landscape of MULTICOM’s ranking
Gaussian kernel density estimates of the GDT-HA scores of models in the server pool and their ranking by MULTICOM are shown for targets (a) T0822-D1 and (b) T0838-D1, with a lower rank indicating a model predicted to be of higher quality. The top models selected by each of the QA methods are highlighted by different styles and colors. The corresponding legends are shown on the right.
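
The Z-scores plotted in Figure 5 express each model's metric value relative to all models submitted for the same target. A minimal Python sketch of that standardization follows. It assumes the population standard deviation and uses made-up GDT-HA values; the captions do not state which variant of the standard deviation or which outlier handling was used, so treat those choices as assumptions.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All models scored identically, so no model stands out.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical GDT-HA scores of three models for one target.
gdt_ha = [40.0, 50.0, 60.0]
z = z_scores(gdt_ha)
```

For metrics where lower values are better, such as RMSD, the sign of the Z-score can be flipped so that a higher Z-score always indicates a better model; whether the figure does this is not stated in the caption.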
