Bioinformatics. 2015 Jun 15;31(12):i116-23.
doi: 10.1093/bioinformatics/btv235.

Large-scale model quality assessment for improving protein tertiary structure prediction

Renzhi Cao et al. Bioinformatics. 2015.

Abstract

Motivation: Sampling structural models and ranking them are the two major challenges of protein structure prediction. Traditional protein structure prediction methods generally use one or a few quality assessment (QA) methods to select the best-predicted models. Such approaches cannot consistently select the better models or rank a large number of models well.

Results: Here, we develop a novel large-scale model QA method that works in conjunction with model clustering to rank and select protein structural models. It applies an unprecedented 14 model QA methods to generate consensus model rankings, followed by model refinement based on model combination (i.e. averaging). Our experiment demonstrates that the large-scale model QA approach is more consistent and robust in selecting models of better quality than any individual QA method. Our method was blindly tested during the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as the MULTICOM group. It was officially ranked third out of all 143 human and server predictors according to the total scores of the first models predicted for 78 CASP11 protein domains, and second according to the total scores of the best of the five models predicted for these domains. MULTICOM's performance in the highly competitive 2014 CASP11 experiment indicates that our large-scale QA approach together with model clustering is a promising solution to one of the two major problems in protein structure modeling.

Availability and implementation: The web server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/human/.
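The consensus ranking described in the Results can be illustrated with a minimal sketch. The abstract does not state how the 14 individual rankings are combined, so the average-rank rule used here is an assumption, and the function name `consensus_ranking`, the method names and the scores are all hypothetical.

```python
from statistics import mean


def consensus_ranking(scores_by_method):
    """Rank models by their average rank across several QA methods.

    scores_by_method: {method_name: {model_name: score}}, higher score = better.
    Returns model names ordered from best to worst consensus rank.
    NOTE: average rank is an assumed combination rule, not the paper's.
    """
    ranks = {}  # model -> list of ranks, one per QA method
    for scores in scores_by_method.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, model in enumerate(ordered, start=1):
            ranks.setdefault(model, []).append(rank)
    # Lower average rank is better; ties are broken by model name.
    return sorted(ranks, key=lambda m: (mean(ranks[m]), m))


# Hypothetical scores from three hypothetical QA methods.
example = {
    "qa_a": {"m1": 0.70, "m2": 0.65, "m3": 0.40},
    "qa_b": {"m1": 0.55, "m2": 0.60, "m3": 0.30},
    "qa_c": {"m1": 0.80, "m2": 0.50, "m3": 0.62},
}
print(consensus_ranking(example))
```

In this toy input no single method agrees fully with the others, yet the averaged ranks still produce one ordering, which is the kind of robustness the abstract attributes to using many QA methods instead of one.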


Figures

Fig. 1.
The workflow of the MULTICOM method, which comprises six steps. (1) A pool of tertiary structure models is predicted for a target protein. (2) Models are scored and ranked by different QA methods. (3) Models are clustered into groups based on structural similarity. (4) The consensus of individual QA rankings and other information are synthesized to generate the final ranking of all the models. (5) The final ranking and the clustering results are integrated to select the top five diverse models for submission. (6) The top five models are combined to generate five refined models to be submitted to CASP11.
Fig. 2.
Tertiary structure prediction of domain 2 of T0783 (T0783-D2). (A) The superposition of the MULTICOM human TS1 model on domain 2 with the native structure. (B) The distribution of 191 models in the model pool. (C) The plot of the true GDT-TS scores of models against their predicted ranking.
Fig. 3.
Tertiary structure prediction of domain 1 of T0767 (T0767-D1). (A) The superposition of the MULTICOM human TS1 model on domain 1 with the native structure. (B) The distribution of 195 models in the model pool. (C) The plot of the true GDT-TS scores of models against their predicted ranking.
Fig. 4.
The difference between the GDT-TS scores before and after model combination, plotted against the initial GDT-TS scores of the top-ranked models of 42 targets.
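Step (5) of the workflow in Fig. 1 integrates the final ranking with the clustering results to select five diverse models. The caption does not say how this integration is done, so the greedy rule below (take the best-ranked model from each cluster first, then fill any remaining slots by rank) is only one plausible sketch. The function name `select_diverse_top`, the model names and the cluster labels are hypothetical.

```python
def select_diverse_top(ranking, cluster_of, k=5):
    """Pick up to k models, preferring one per cluster in ranking order.

    ranking: model names ordered from best to worst.
    cluster_of: {model_name: cluster_label}.
    NOTE: this greedy rule is an assumption, not the published procedure.
    """
    selected, seen_clusters = [], set()
    # First pass: best-ranked model from each cluster not yet represented.
    for model in ranking:
        if len(selected) == k:
            break
        cluster = cluster_of[model]
        if cluster not in seen_clusters:
            selected.append(model)
            seen_clusters.add(cluster)
    # Second pass: if there are fewer than k clusters, fill by rank.
    for model in ranking:
        if len(selected) == k:
            break
        if model not in selected:
            selected.append(model)
    return selected


# Hypothetical ranking and cluster assignment.
ranking = ["m1", "m2", "m3", "m4", "m5", "m6"]
cluster_of = {"m1": 0, "m2": 0, "m3": 1, "m4": 1, "m5": 2, "m6": 0}
print(select_diverse_top(ranking, cluster_of, k=3))
```

With three clusters and `k=3`, the sketch returns one model per cluster. With `k=5`, the two extra slots are filled by the next best-ranked models, so the returned list is ordered by selection pass and not strictly by rank.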
