PLoS One. 2014 Sep 15;9(9):e106542.
doi: 10.1371/journal.pone.0106542. eCollection 2014.

Random forest-based protein model quality assessment (RFMQA) using structural features and potential energy terms

Balachandran Manavalan et al. PLoS One.

Abstract

Recently, predicting a protein's three-dimensional (3D) structure from its sequence has made significant progress owing to advances in computational techniques and the growth in the number of experimental structures. However, selecting good models from a pool of structural models remains an important and challenging task in protein structure prediction. In this study, we present the first application of random forest-based model quality assessment (RFMQA) to rank protein models using their structural features and knowledge-based potential energy terms. The method predicts a relative score for a model from its secondary structure, solvent accessibility and knowledge-based potential energy terms. We trained and tested RFMQA on CASP8 and CASP9 targets using 5-fold cross-validation. The correlation coefficient between the TM-score of the model selected by RFMQA (TMRF) and that of the best server model (TMbest) is 0.945. We benchmarked our method on recent CASP10 targets, using CASP8 and CASP9 server models as the training set. The correlation coefficient and average difference between TMRF and TMbest over 95 CASP10 targets are 0.984 and 0.0385, respectively. The test results show that our method selects top models better than other top-performing methods. RFMQA is available for download from http://lee.kias.re.kr/RFMQA/RFMQA_eval.tar.gz.
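The evaluation described in the abstract (pick the top-ranked model per target, then compare its TM-score with that of the best model in the pool) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the data layout and the toy numbers are all assumptions made for illustration, and the predicted scores stand in for whatever a trained quality-assessment method would output.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def evaluate_selection(targets):
    """Return (correlation, average TM loss) over all targets.

    `targets` maps a target id to {model name: (predicted score, true TM-score)}.
    This layout is hypothetical, chosen only to keep the example small.
    """
    tm_best, tm_selected = [], []
    for models in targets.values():
        # The selected model is the one with the highest predicted score.
        chosen = max(models, key=lambda name: models[name][0])
        tm_selected.append(models[chosen][1])
        # The best model is the one with the highest true TM-score.
        tm_best.append(max(tm for _, tm in models.values()))
    losses = [best - sel for best, sel in zip(tm_best, tm_selected)]
    return pearson(tm_best, tm_selected), sum(losses) / len(losses)

# Toy values, invented for illustration only (not data from the paper).
toy_targets = {
    "target_A": {"m1": (0.9, 0.80), "m2": (0.5, 0.60)},
    "target_B": {"m1": (0.4, 0.70), "m2": (0.6, 0.50)},
    "target_C": {"m1": (0.7, 0.30), "m2": (0.2, 0.20)},
}
correlation, average_loss = evaluate_selection(toy_targets)
```

In this sketch the loss for a target is zero when the selected model is also the best one, so a lower average loss together with a higher correlation indicates better model selection.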


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Five-fold cross-validation on CASP8 and CASP9 targets.
TM-score of the best server model (TMbest) versus TM-score of the model selected by RFMQA (TMRF) for five-fold validation is shown. Pearson's correlation coefficient and the average TMloss between TMbest and TMRF are 0.945 and 0.055, respectively.
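The record does not say how the five folds were formed. One plausible arrangement, sketched here purely as an assumption, is to split by target so that all models of a given target fall in the same fold and never appear in both the training and the test set. The function name and the placeholder target ids are hypothetical.

```python
import random

def five_fold_by_target(target_ids, n_folds=5, seed=0):
    """Yield (train_targets, test_targets) pairs, one per fold.

    Splitting on target ids (rather than on individual models) keeps all
    models of one target together. This grouping is an assumption.
    """
    ids = sorted(target_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        yield train, test

# Placeholder ids, not real CASP target names.
example_ids = [f"target_{i:02d}" for i in range(12)]
splits = list(five_fold_by_target(example_ids))
```

Each target appears in exactly one test fold, so every target receives a prediction from a model that was not trained on it.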
Figure 2. Pairwise comparisons.
TMRF is plotted against the TM-score of the model selected by each individual statistical potential (TMQA): (A) dDFIRE versus RFMQA, (B) RWplus versus RFMQA, (C) OPUS versus RFMQA, (D) GOAP versus RFMQA, and (E) DFIRE versus RFMQA.
Figure 3. Evaluation of RFMQA on CASP10 targets and its pairwise comparison with other potential energies.
(A) TMRF versus TMbest. Pearson's correlation coefficient and the average TMloss between TMRF and TMbest are 0.984 and 0.039, respectively. (B) dDFIRE versus RFMQA, (C) RWplus versus RFMQA, (D) OPUS versus RFMQA, (E) GOAP versus RFMQA, and (F) DFIRE versus RFMQA.
Figure 4. Comparison of RFMQA with top QA methods on CASP10 models.
(A) GOAP versus RFMQA, (B) ProQ2 versus RFMQA, (C) MULTICOM-CONSTRUCT versus RFMQA, (D) ModFOLDclust2 versus RFMQA, (E) PMS versus RFMQA, and (F) Pcons versus RFMQA.
Figure 5. Examples of good predictions by RFMQA are shown for (A) T0698 and (B) T0715.
Models selected by RFMQA (magenta) and ModFOLDclust2 (green) are shown superposed on the TMbest model (cyan).
Figure 6. Examples of bad predictions by RFMQA are shown for (A) T0700 and (B) T0742.
Models selected by RFMQA (magenta) are shown superposed on the TMbest model (cyan).
Figure 7. Distribution of Z-score for the model selection on CASP10 targets.
Z<0 is colored in red; 0≤Z<1 is colored in green; 1≤Z<2 is colored in blue; 2≤Z<3 is colored in magenta and Z≥3 is colored in cyan.
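The color bins in the Figure 7 legend can be written as a small lookup. The bin boundaries below follow the legend; the Z-score definition is an assumption (the selected model's TM-score standardized against the TM-scores of all models in the target's pool), since this record does not define it. Both function names are hypothetical.

```python
from statistics import mean, pstdev

def z_score(tm_selected, pool_tm_scores):
    """Standardize the selected model's TM-score against its pool.

    Assumed definition: (TM_selected - pool mean) / pool standard deviation.
    Returns 0.0 when every model in the pool has the same TM-score.
    """
    sigma = pstdev(pool_tm_scores)
    if sigma == 0:
        return 0.0
    return (tm_selected - mean(pool_tm_scores)) / sigma

def z_color(z):
    """Map a Z-score to the color used in the Figure 7 legend."""
    if z < 0:
        return "red"
    if z < 1:
        return "green"
    if z < 2:
        return "blue"
    if z < 3:
        return "magenta"
    return "cyan"
```

Under this reading, a target colored red is one where the selected model scored below the pool average, and cyan marks a selection at least three standard deviations above it.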
