Res Comput Mol Biol. 2009:5541:31-45.
doi: 10.1007/978-3-642-02008-7_3.

Boosting Protein Threading Accuracy


Jian Peng et al. Res Comput Mol Biol. 2009.

Abstract

Protein threading is one of the most successful protein structure prediction methods. Most protein threading methods use a scoring function that linearly combines sequence and structure features to measure the quality of a sequence-template alignment, so that a dynamic programming algorithm can be used to optimize the scoring function. However, a linear scoring function cannot fully exploit interdependency among features and thus limits alignment accuracy. This paper presents a nonlinear scoring function for protein threading, which not only can model interactions among different protein features but also can be efficiently optimized using a dynamic programming algorithm. We achieve this by modeling the threading problem using a probabilistic graphical model, Conditional Random Fields (CRF), and training the model using the gradient tree boosting algorithm. The resultant model is a nonlinear scoring function consisting of a collection of regression trees. Each regression tree models a type of nonlinear relationship among sequence and structure features. Experimental results indicate that this new threading model can effectively leverage weak biological signals and greatly improve both alignment accuracy and fold recognition rate.
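The central idea, a score built from regression trees that can still be optimized by dynamic programming, can be illustrated with a minimal Python sketch. It is not the authors' CRF model: the two features (profile similarity and secondary-structure agreement), the hand-written tree, the toy data, and the constant gap penalty are all hypothetical, chosen only to show that the dynamic program needs a per-cell score and does not require that score to be linear in the features.

```python
from typing import Callable, List, Sequence

# A "tree" is any function mapping a feature vector to a real value.
Tree = Callable[[Sequence[float]], float]

def interaction_tree(x: Sequence[float]) -> float:
    """Hypothetical depth-2 regression tree.

    x[0] = profile similarity, x[1] = secondary-structure agreement.
    The reward for high similarity depends on x[1], an interaction
    that a purely linear combination of x[0] and x[1] cannot express.
    """
    if x[0] > 0.5:
        return 2.0 if x[1] > 0.5 else 0.5
    return -1.0

def ensemble_score(trees: List[Tree], x: Sequence[float]) -> float:
    """Nonlinear score: the sum of the outputs of all trees."""
    return sum(tree(x) for tree in trees)

def align(n: int, m: int,
          match_score: Callable[[int, int], float],
          gap: float = -1.0) -> float:
    """Best global alignment score by dynamic programming.

    match_score(i, j) may be any function of the two positions; the
    recurrence only needs its value for each cell.
    """
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # align residue i-1 with template position j-1
                best[i][j] = max(best[i][j],
                                 best[i - 1][j - 1] + match_score(i - 1, j - 1))
            if i > 0:            # gap in the template
                best[i][j] = max(best[i][j], best[i - 1][j] + gap)
            if j > 0:            # gap in the sequence
                best[i][j] = max(best[i][j], best[i][j - 1] + gap)
    return best[n][m]

# Toy data (made up for illustration only).
seq_profile, seq_ss = [0.9, 0.1, 0.8], ["H", "E", "H"]
tmpl_profile, tmpl_ss = [0.9, 0.8], ["H", "H"]
trees: List[Tree] = [interaction_tree]

def match_score(i: int, j: int) -> float:
    """Score for aligning sequence position i to template position j."""
    features = (
        1.0 - abs(seq_profile[i] - tmpl_profile[j]),  # profile similarity
        1.0 if seq_ss[i] == tmpl_ss[j] else 0.0,      # SS agreement
    )
    return ensemble_score(trees, features)

print(align(len(seq_profile), len(tmpl_profile), match_score))
```

In this toy case the best alignment matches the first and third sequence positions to the two template positions and places the middle position in a gap. In the paper the trees are learned by gradient tree boosting rather than written by hand; the sketch only shows why a tree-based score remains compatible with dynamic programming.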


Figures

Algorithm 1. Gradient Tree Boosting Training Algorithm
Fig. 1. The alignment accuracy of the models on the validation data set during the whole training process

