Sci Rep. 2016 Aug 17:6:31571.
doi: 10.1038/srep31571.

Sorting protein decoys by machine-learning-to-rank

Xiaoyang Jing et al. Sci Rep.

Abstract

Much progress has been made in protein structure prediction during the last few decades. Because predicted models span a broad range of accuracy, the accuracy of quality estimation has become one of the key elements of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue; they can be roughly divided into three categories: single-model methods, clustering-based methods and quasi single-model methods. In this study, we first develop a single-model method, MQAPRank, based on the learning-to-rank algorithm, and then implement a quasi single-model method, Quasi-MQAPRank. The proposed methods are benchmarked on the 3DRobot and CASP11 datasets. Five-fold cross-validation on the 3DRobot dataset shows that the proposed single-model method outperforms the other methods whose outputs are taken as its features, and that the quasi single-model method further improves performance. On the CASP11 dataset, the proposed methods also perform well compared with other leading methods in the corresponding categories. In particular, the Quasi-MQAPRank method performs notably well on the CASP11 Best150 dataset.
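
To make the learning-to-rank idea concrete, the sketch below trains a linear pairwise ranker on one target's decoys. It is a minimal illustration only, not the MQAPRank implementation: the abstract does not specify the ranking algorithm's details, so the logistic pairwise loss, the function names `train_pairwise_ranker` and `rank_decoys`, the two feature columns (standing in for the outputs of other quality-assessment methods) and all numeric values are assumptions made for this example.

```python
import math
import random


def train_pairwise_ranker(features, quality, epochs=200, lr=0.1, seed=0):
    """Fit linear weights w so that w.x_i > w.x_j whenever quality[i] > quality[j].

    Hypothetical helper: a plain pairwise logistic ranker, not the paper's method.
    """
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    # Every ordered pair (better decoy, worse decoy) within one target.
    pairs = [(i, j)
             for i in range(len(features))
             for j in range(len(features))
             if quality[i] > quality[j]]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, j in pairs:
            diff = [a - b for a, b in zip(features[i], features[j])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            margin = max(-30.0, min(30.0, margin))  # keep exp() well-behaved
            # Gradient factor of the pairwise loss log(1 + exp(-margin)).
            g = -1.0 / (1.0 + math.exp(margin))
            w = [wk - lr * g * dk for wk, dk in zip(w, diff)]
    return w


def rank_decoys(w, features):
    """Return decoy indices sorted from best to worst predicted quality."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


# Made-up data: each row holds two assumed per-decoy feature scores;
# gdt_ts holds the (assumed) true quality of each decoy on a 0-1 scale.
features = [[0.90, 0.80], [0.50, 0.60], [0.20, 0.30], [0.70, 0.65]]
gdt_ts = [0.85, 0.55, 0.20, 0.70]

weights = train_pairwise_ranker(features, gdt_ts)
ranking = rank_decoys(weights, features)
print(ranking)  # [0, 3, 1, 2]
```

In a cross-validation setting such as the five-fold protocol mentioned above, pairs would be formed only among decoys of the same target, and whole targets would be held out together in each fold.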

Figures

Figure 1. The overall flowchart of the proposed methods.
Figure 2. The ROC curves of compared methods on the 3DRobot dataset based on GDT_TS score. ModFOLDclust2 is a clustering method; the other compared methods are listed in the “feature extraction” section.
Figure 3. The ROC curves of compared methods on the CASP11 dataset based on GDT_TS score. (a) The ROC curves for the Best150 dataset and (b) the corresponding AUCs for the Select20 dataset.
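
As a rough illustration of how an ROC-based comparison on GDT_TS can be computed, the sketch below labels each decoy as "good" when its GDT_TS reaches a cutoff and then computes the area under the ROC curve (AUC) of a method's predicted scores. The cutoff of 0.5 on a 0-1 scale, the function name `roc_auc` and the data are assumptions for this example; the captions above do not state the threshold actually used.

```python
def roc_auc(predicted, gdt_ts, threshold=0.5):
    """AUC of predicted scores against labels derived from GDT_TS.

    A decoy counts as positive when gdt_ts >= threshold (assumed cutoff).
    The AUC equals the fraction of (positive, negative) pairs in which the
    positive decoy receives the higher predicted score; ties count as half.
    """
    pos = [p for p, g in zip(predicted, gdt_ts) if g >= threshold]
    neg = [p for p, g in zip(predicted, gdt_ts) if g < threshold]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative decoy")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up predicted quality scores and true GDT_TS values for five decoys.
predicted = [0.9, 0.4, 0.6, 0.3, 0.7]
gdt_ts = [0.80, 0.60, 0.30, 0.20, 0.55]

auc = roc_auc(predicted, gdt_ts)
print(round(auc, 4))  # 0.8333
```

An AUC of 1.0 means every good decoy is scored above every poor one, while 0.5 corresponds to random ordering.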
