Prediction of protein structural class using novel evolutionary collocation-based sequence representation
- PMID: 18293306
- DOI: 10.1002/jcc.20918
Abstract
Knowledge of structural classes is useful for understanding folding patterns in proteins. Although existing structural class prediction methods have applied virtually all state-of-the-art classifiers, many of them use a relatively simple protein sequence representation that often includes amino acid (AA) composition. To address this limitation, we propose a novel sequence representation that incorporates evolutionary information encoded using PSI-BLAST profile-based collocation of AA pairs. We used six benchmark datasets and five representative classifiers to quantify and compare the quality of structural class prediction with the proposed representation. The best classifier, a support vector machine, achieved 61-96% accuracy on the six datasets. These predictions were comprehensively compared with a wide range of recently proposed methods for prediction of structural classes. The comparison shows the superiority of the proposed representation, which results in error rate reductions that range between 14% and 26% when compared with predictions of the best-performing, previously published classifiers on the considered datasets. The study also shows that, for the three benchmark datasets that include sequences characterized by low identity (i.e., 25%, 30%, and 40%), the prediction accuracies are 20-35% lower than for the other three datasets, which include sequences with a higher degree of similarity. In conclusion, the proposed representation is shown to substantially improve the accuracy of structural class prediction. A web server that implements the presented prediction method is freely available at http://biomine.ece.ualberta.ca/Structural_Class/SCEC.html.
(c) 2008 Wiley Periodicals, Inc. J Comput Chem, 2008.
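The abstract reports improvements as error rate reductions rather than as raw accuracy gains. The sketch below illustrates that distinction, assuming the conventional definition of relative error rate reduction (the abstract does not state the formula used). The function name and the accuracy values are hypothetical and are not results from the paper.

```python
def error_rate_reduction(baseline_accuracy: float, new_accuracy: float) -> float:
    """Relative error rate reduction, as a fraction of the baseline error.

    Assumed definition (not taken from the paper):
        (baseline_error - new_error) / baseline_error
    where error = 1 - accuracy and accuracies are fractions in [0, 1].
    """
    if not 0.0 <= baseline_accuracy < 1.0:
        raise ValueError("baseline_accuracy must be in [0, 1)")
    if not 0.0 <= new_accuracy <= 1.0:
        raise ValueError("new_accuracy must be in [0, 1]")
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical accuracies chosen only to illustrate the arithmetic.
    reduction = error_rate_reduction(baseline_accuracy=0.90, new_accuracy=0.92)
    print(f"relative error rate reduction: {reduction:.1%}")
```

Under this definition, a two-point accuracy gain on an easy dataset (0.90 to 0.92) corresponds to a much larger relative reduction in error than the same gain on a hard dataset, which is why the two measures should not be compared directly.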