BMC Bioinformatics. 2014 Apr 28;15:120.
doi: 10.1186/1471-2105-15-120.

SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines

Renzhi Cao et al. BMC Bioinformatics. 2014.

Abstract

Background: It is important to predict the quality of a protein structural model before its native structure is known. Methods that can predict the absolute local quality of individual residues in a single protein model are rare, yet they are particularly needed for using, ranking and refining protein models.

Results: We developed a machine learning tool (SMOQ) that predicts the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. the basic feature set), including amino acid sequence, secondary structure, solvent accessibility and residue-residue contacts, to make its predictions. We also trained an SVM model with two additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them improves performance only when the real deviation between the native structure and the model is greater than 5 Å. The released version of SMOQ uses the basic feature set trained on 85 CASP8 targets. SMOQ also implements a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. On this test data, the average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation is 2.637 Å. The global quality prediction accuracy of the tool is comparable to that of other well-performing tools on the same benchmark.

Conclusion: SMOQ is a useful tool for single-model protein quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/.
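The Results paragraph mentions two computations: an average difference between predicted and actual per-residue deviations, and a conversion of local scores into a global score. The Python sketch below shows one plausible form of each. It assumes that the reported average is a mean absolute difference, and it uses the S-score transform 1/(1 + (d/d0)^2) with d0 = 3.8 Å, a common choice in model quality assessment. Neither the transform nor the value of d0 is confirmed by this abstract, and the function names are hypothetical.

```python
"""Sketch: scoring per-residue deviation predictions (not SMOQ's actual code).

Assumptions: the "average difference" is a mean absolute difference, and the
local-to-global conversion is the S-score form 1 / (1 + (d / d0)**2), d0 = 3.8 Å.
"""
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], real: Sequence[float]) -> float:
    """Average |predicted - real| distance deviation over residues, in Å."""
    if len(predicted) != len(real) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, real)) / len(predicted)


def global_score(deviations: Sequence[float], d0: float = 3.8) -> float:
    """Fold per-residue deviations into one score in (0, 1] (hypothetical S-score form).

    A residue with deviation 0 contributes 1.0, a residue deviating by d0
    contributes 0.5, and contributions approach 0 as the deviation grows.
    The mean over all residues is returned.
    """
    if not deviations:
        raise ValueError("need at least one residue")
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in deviations) / len(deviations)


if __name__ == "__main__":
    predicted = [1.0, 2.5, 4.0, 9.0]
    real = [1.5, 2.0, 6.0, 12.0]
    print(mean_absolute_error(predicted, real))  # 1.5
    print(global_score([0.0, 3.8]))              # 0.75
```

Because each residue's contribution is bounded, a few badly placed residues lower the global score without dominating it, which is one reason this form is popular for summarizing local predictions.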

Figures

Figure 1
Evaluation of the residue-specific local quality predictions of the single-model local quality assessment tool (SMOQ) on CASP9 single-domain proteins. Basic (20 targets) denotes the SVM model trained with the basic feature set on 20 CASP8 single-domain targets. Basic (85 targets) denotes the SVM model trained with the basic feature set on 85 CASP8 single-domain targets. Basic (20 targets, no homologue) denotes the basic model trained on 20 CASP8 single-domain targets but tested only on the CASP9 single-domain targets that are not homologues of CASP8 targets. Profile and profile+SOV denote the two SVM models using the profile and profile+SOV feature sets; both were trained on 20 CASP8 single-domain targets and tested on CASP9 targets without homologue removal. The absolute difference errors of the predictions are plotted against the real distance deviations.
Figure 2
The predicted deviation plotted against the real deviation for our basic SVM model and two other local quality prediction methods (ProQ2 and QMEAN) on 84 CASP9 targets.
Figure 3
The absolute difference between the real and predicted deviations, plotted against the real deviation, for our basic SVM model, ProQ2 and QMEAN.
Figure 4
An example illustrating the real and predicted distances between a model and the native structure. The model is the first model produced by the MULTICOM-CLUSTER tertiary structure predictor for CASP9 target T0563. (A) The real and predicted distances between the native structure and the model at each amino acid position. (B) Superimposition of the model (green and red) on the native structure (grey). Red highlights the two regions where the model deviates relatively strongly from the native structure.
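Figure 4 shows the real per-residue distance between a model and its native structure, and Figures 1 and 3 plot absolute prediction error against that real deviation. The Python sketch below shows how such quantities could be computed. It assumes the model and native structure are already superimposed and represented by one coordinate per residue (for example Cα atoms), and it groups errors into hypothetical 1 Å bins of real deviation; the superposition method and binning used for the figures are not stated here, and the function names are illustrative only.

```python
"""Sketch: per-residue deviations and error binned by real deviation.

Assumptions: coordinates are already superimposed, one point per residue
(e.g. C-alpha), and errors are grouped into bins of width `bin_width` Å.
"""
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def residue_deviations(model: Sequence[Coord], native: Sequence[Coord]) -> List[float]:
    """Euclidean distance (Å) between paired residues of a superimposed model and native."""
    if len(model) != len(native):
        raise ValueError("model and native must have the same number of residues")
    return [math.dist(m, n) for m, n in zip(model, native)]


def binned_abs_error(
    predicted: Sequence[float], real: Sequence[float], bin_width: float = 1.0
) -> Dict[float, float]:
    """Mean |predicted - real| for each bin of real deviation, keyed by lower bin edge."""
    sums: Dict[float, float] = {}
    counts: Dict[float, int] = {}
    for p, r in zip(predicted, real):
        edge = math.floor(r / bin_width) * bin_width  # lower edge of this residue's bin
        sums[edge] = sums.get(edge, 0.0) + abs(p - r)
        counts[edge] = counts.get(edge, 0) + 1
    return {edge: sums[edge] / counts[edge] for edge in sorted(sums)}


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
    native = [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)]
    print(residue_deviations(model, native))                     # [5.0, 0.0]
    print(binned_abs_error([1.0, 2.0, 7.0], [0.5, 0.75, 6.0]))   # {0.0: 0.875, 6.0: 1.0}
```

Binning by real deviation makes it easy to check claims such as the abstract's observation that the extra features help only when the real deviation exceeds 5 Å: compare the per-bin errors of two models for bins at or above 5.0.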
