BMC Struct Biol. 2009 May 20;9:35.
doi: 10.1186/1472-6807-9-35.

QMEANclust: estimation of protein model quality by combining a composite scoring function with structural density information

Pascal Benkert et al. BMC Struct Biol.

Abstract

Background: Selecting the most accurate protein model from a set of alternatives is a crucial step in protein structure prediction, in both template-based and ab initio approaches. Scoring functions have been developed that either return a quality estimate for a single model or derive a score from the information contained in the ensemble of models generated for a given sequence. In the latter case, the underlying assumption is that local structural features occurring more frequently in the ensemble have a greater probability of being correct. Within the context of the CASP experiment, these so-called consensus methods have been shown to perform considerably better than single-model scoring functions at selecting good candidate models, but they tend to fail if the best models are far from the dominant structural cluster. In this paper we show that model selection can be improved by combining both approaches, using single-model scores to pre-filter the models included in the calculation of the structural consensus.
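The consensus idea can be illustrated with a minimal Python sketch in the spirit of 3D-Jury-like methods: each model is scored by its average structural similarity to all other models in the ensemble, so models that sit in the dominant cluster score highest. The sketch assumes a precomputed pairwise similarity matrix (for example GDT_TS between model pairs, scaled to [0, 1]). The function name `consensus_scores` is hypothetical, and this is not the authors' implementation.

```python
from typing import Sequence


def consensus_scores(similarity: Sequence[Sequence[float]]) -> list[float]:
    """Score each model by its mean similarity to every other model.

    similarity[i][j] is a precomputed pairwise structural similarity
    (e.g. GDT_TS between models i and j, scaled to [0, 1]); computing it
    is outside the scope of this sketch.
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("need at least two models for a consensus")
    scores = []
    for i in range(n):
        # Exclude the self-comparison on the diagonal.
        total = sum(similarity[i][j] for j in range(n) if j != i)
        scores.append(total / (n - 1))
    return scores


# Toy ensemble: models 0 and 1 agree closely, model 2 is an outlier.
sim = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
scores = consensus_scores(sim)
best = max(range(len(scores)), key=lambda i: scores[i])
```

The outlier (model 2) receives the lowest score regardless of its actual accuracy. That is exactly why pure consensus fails when the best model lies outside the dominant cluster.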

Results: Our recently published QMEAN composite scoring function has been improved by including an all-atom interaction potential term. The preliminary model ranking based on the new QMEAN score is used to select a subset of reliable models against which the structural consensus score is calculated. This scoring function, called QMEANclust, achieves a correlation coefficient of 0.9 between predicted quality score and GDT_TS, averaged over the 98 CASP7 targets. It also performs significantly better at selecting good models from the ensemble of server models than any other group participating in the quality estimation category of CASP7. Both scoring functions are also benchmarked on the MOULDER test set, which consists of 20 target proteins, each with 300 alternative models generated by MODELLER. QMEAN outperforms all other tested scoring functions operating on individual models, while the consensus method QMEANclust only works properly on decoy sets containing a certain fraction of near-native conformations. We also present a local version of QMEAN for the per-residue estimation of model quality (QMEANlocal) and compare it to a new local consensus-based approach.
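The pre-filtering step described for QMEANclust can be sketched in Python in the same style. This is an illustration under stated assumptions, not the published algorithm. Single-model QMEAN scores and pairwise similarities are taken as given, and the hypothetical `top_fraction` parameter stands in for however the reference subset is actually chosen. Each model is then scored by its mean similarity to the retained reference models only.

```python
from typing import Sequence


def qmeanclust_like_scores(
    qmean: Sequence[float],
    similarity: Sequence[Sequence[float]],
    top_fraction: float = 0.2,
) -> list[float]:
    """Consensus score computed only against the best QMEAN-ranked models.

    qmean[i] is a single-model quality score (higher is better) and
    similarity[i][j] a precomputed pairwise similarity such as GDT_TS.
    top_fraction is a hypothetical parameter, not the published setting.
    """
    n = len(qmean)
    if n < 2 or len(similarity) != n:
        raise ValueError("need at least two models with matching similarities")
    # Step 1: rank by the single-model score and keep at least two references,
    # so every model has at least one reference model other than itself.
    k = min(n, max(2, round(top_fraction * n)))
    ranked = sorted(range(n), key=lambda i: qmean[i], reverse=True)
    reference = ranked[:k]
    # Step 2: score each model by its mean similarity to the reference set,
    # skipping the self-comparison for models that are themselves references.
    scores = []
    for i in range(n):
        refs = [j for j in reference if j != i]
        scores.append(sum(similarity[i][j] for j in refs) / len(refs))
    return scores


# Toy ensemble: models 0-2 form a large cluster with low QMEAN scores,
# models 3-4 have higher QMEAN scores and agree with each other.
sim = [
    [1.0, 0.8, 0.8, 0.3, 0.3],
    [0.8, 1.0, 0.8, 0.3, 0.3],
    [0.8, 0.8, 1.0, 0.3, 0.3],
    [0.3, 0.3, 0.3, 1.0, 0.7],
    [0.3, 0.3, 0.3, 0.7, 1.0],
]
qmean_scores = [0.40, 0.45, 0.42, 0.70, 0.65]
filtered = qmeanclust_like_scores(qmean_scores, sim, top_fraction=0.4)
unfiltered = qmeanclust_like_scores(qmean_scores, sim, top_fraction=1.0)
```

In this toy ensemble, a plain consensus over all models (`top_fraction=1.0`) favours the three mutually similar models. Restricting the reference set to the two best QMEAN-ranked models instead lets models 3 and 4 rank first.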

Conclusion: Improved model selection is obtained by using a composite scoring function operating on single models to enrich higher-quality models, which are subsequently used to calculate the structural consensus. The performance of consensus-based methods such as QMEANclust depends strongly on the composition and quality of the model ensemble to be analysed. Therefore, performance estimates for consensus methods based on large meta-datasets (e.g. CASP) might overrate their applicability in more realistic modelling situations with smaller sets of models based on individual methods.


Figures

Figure 1
Analysis of statistical significance based on a one-sided paired t-test (95% confidence level). Green: the method denoted on the horizontal axis performs significantly better. Red: the method denoted on the horizontal axis performs significantly worse. a) Pearson's correlation coefficient, b) Spearman's rank correlation coefficient, c) GDT_TS values of the models selected by each scoring function.
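The one-sided paired t-test used in Figure 1 compares two methods on the same targets. Below is a stdlib-only Python sketch of the test statistic. Rather than computing a p-value, it compares the statistic against a one-sided critical value supplied by the caller. For example, if one value per CASP7 target is paired, there are 97 degrees of freedom and the 95% critical value is roughly 1.66. The function names and the toy data are illustrative assumptions.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired t statistic for the one-sided alternative mean(a - b) > 0."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("the paired differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def significantly_better(
    a: Sequence[float], b: Sequence[float], t_critical: float
) -> bool:
    """True if method a outperforms method b in the one-sided test.

    t_critical is the one-sided critical value of Student's t with
    len(a) - 1 degrees of freedom; it is supplied by the caller.
    """
    return paired_t_statistic(a, b) > t_critical


# Hypothetical per-target scores of two methods on the same four targets.
method_a = [0.90, 0.80, 0.85, 0.70]
method_b = [0.80, 0.75, 0.80, 0.60]
t = paired_t_statistic(method_a, method_b)
better = significantly_better(method_a, method_b, t_critical=2.353)  # df = 3
```

Passing the critical value in explicitly keeps the sketch dependency-free. A statistics library could compute an exact p-value instead.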
Figure 2
Comparison of QMEAN, a 3D-Jury-like approach and QMEANclust on 3 selected CASP7 targets. The table shows the GDT_TS difference between the best models selected by QMEANclust and by the 3D-Jury-like approach. Correlations between predicted score and GDT_TS for the three targets are shown for QMEAN, 3D-Jury and QMEANclust (from left to right). The dashed areas mark the models selected by QMEAN as the basis for QMEANclust. The arrow on the right of each plot denotes the best selected model.
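GDT_TS serves as the reference quality measure in these figures, so a simplified Python sketch may help. GDT_TS averages the fractions of Cα atoms within 1, 2, 4 and 8 Å of the reference structure and is reported on a 0-100 scale. The real score searches over many superpositions for each cutoff. This sketch instead assumes a single fixed superposition with precomputed per-residue distances, so it will generally report a value no higher than the full search. The function name and input data are hypothetical.

```python
from typing import Sequence

GDT_CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # Å, the standard GDT_TS distance thresholds


def gdt_ts_fixed_superposition(distances: Sequence[float]) -> float:
    """GDT_TS-style score (0-100) for one fixed superposition.

    distances[k] is the Cα deviation in Å of target residue k after
    superposition; pass float("inf") for residues missing from the model.
    Unlike the real GDT_TS, no search over superpositions is performed.
    """
    n = len(distances)
    if n == 0:
        raise ValueError("no residues given")
    # Fraction of residues within each cutoff (inclusive comparison assumed).
    fractions = [sum(d <= c for d in distances) / n for c in GDT_CUTOFFS]
    return 100.0 * sum(fractions) / len(GDT_CUTOFFS)


# Hypothetical per-residue deviations for the models picked by two methods.
model_a = [0.5, 1.5, 3.0, 6.0, 10.0]
model_b = [1.5, 2.5, 5.0, 9.0, 12.0]
delta = gdt_ts_fixed_superposition(model_a) - gdt_ts_fixed_superposition(model_b)
```

Subject to the caveat above, a difference such as `delta` corresponds to the kind of GDT_TS difference between two selected models that Figure 2 tabulates.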
Figure 3
Receiver operating characteristic (ROC) curves for the different local QMEAN versions and ProQres. A Cα distance cut-off of 2.5 Å has been used. Two alternative QMEANclust approaches have been tested, which combine the local Cα distances using either the median or a weighted mean.
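The local evaluation in Figure 3 can be sketched in Python in two steps. First, a residue's Cα distances to several reference models are combined into one predicted deviation, using either the median or a weighted mean. Second, these predictions are scored by the area under the ROC curve. The following are assumptions, not taken from the paper: the distances are precomputed after superposition, the weights are arbitrary placeholders, and a residue counts as correct when its true Cα deviation from the native structure is at most 2.5 Å. All function names are hypothetical.

```python
from statistics import median
from typing import Optional, Sequence

CA_CUTOFF = 2.5  # Å, the cut-off quoted in the Figure 3 caption


def local_consensus_distance(
    distances: Sequence[float], weights: Optional[Sequence[float]] = None
) -> float:
    """Combine one residue's Cα distances to several reference models.

    distances[r] is the Cα distance (Å) between this residue in the model
    and in reference model r after superposition (precomputed). Without
    weights the median is used; with weights, a weighted mean. The choice
    of weights is a hypothetical placeholder.
    """
    if not distances:
        raise ValueError("no reference distances")
    if weights is None:
        return median(distances)
    if len(weights) != len(distances) or sum(weights) <= 0:
        raise ValueError("weights must match distances and sum to > 0")
    return sum(w * d for w, d in zip(weights, distances)) / sum(weights)


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve via the Mann-Whitney pair-counting identity.

    A higher score should mean 'more likely correct'; ties count as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both correct and incorrect residues")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: predicted consensus distances for four residues.
predicted = [
    local_consensus_distance([0.8, 1.0, 5.0]),                # median -> 1.0
    local_consensus_distance([1.5, 2.0, 2.5]),                # median -> 2.0
    local_consensus_distance([4.0, 6.0], weights=[3.0, 1.0]), # weighted mean -> 4.5
    local_consensus_distance([7.0, 9.0, 8.0]),                # median -> 8.0
]
true_deviation = [0.9, 3.1, 1.2, 9.5]  # hypothetical deviations from native
labels = [d <= CA_CUTOFF for d in true_deviation]  # residue counted as correct
# A smaller predicted distance means more reliable, so negate it to get a score.
auc = roc_auc([-d for d in predicted], labels)
```

In the toy data, three of the four correct/incorrect residue pairs are ordered correctly, which gives an AUC of 0.75.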


