Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection
- PMID: 18378524
- DOI: 10.1093/bioinformatics/btn112
Abstract
Motivation: Protein fold recognition and remote homology detection have recently attracted a great deal of interest because they are challenging multi-feature, multi-class problems on which modern pattern recognition methods achieve only modest performance. As with many pattern recognition problems, multiple feature spaces or groups of attributes are available: global characteristics such as amino-acid composition (C), predicted secondary structure (S), hydrophobicity (H), van der Waals volume (V), polarity (P) and polarizability (Z), as well as attributes derived from local sequence alignment such as Smith-Waterman scores. This calls for a classification method that can assess the contribution of these potentially heterogeneous object descriptors while using that information to improve predictive performance. To that end, we offer a single multi-class kernel machine that informatively combines the available feature groups and, as demonstrated in this article, achieves state-of-the-art accuracy on the fold recognition problem. The proposed approach also provides some insight by assessing the significance of recently introduced protein features and string kernels. The method is well founded within a Bayesian hierarchical framework, and a variational Bayes approximation is derived that allows for efficient CPU processing times.
Results: The best performance we report on the SCOP PDB-40D benchmark dataset is 70% accuracy, obtained by combining all available feature groups: the global protein characteristics together with the sequence-alignment features. This is an 8% improvement on the best reported performance, which combines multi-class k-NN classifiers, while at the same time reducing computational costs and assessing the predictive power of the various available features. Furthermore, we examine the performance of our methodology on the SCOP 1.53 benchmark dataset, which simulates remote homology detection, and examine the combination of various recently proposed state-of-the-art string kernels.
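The idea of informatively combining feature groups in a single kernel machine can be illustrated with a minimal sketch. This is not the authors' variational Bayes procedure: it only shows a convex combination of per-group base kernels, with the weights fixed by hand rather than inferred. The feature matrices, the RBF base kernel, the `gamma` values and the function names are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix of an RBF kernel over the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative distances caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(d2, 0.0))

def combine_kernels(kernels, weights):
    """Convex combination sum_s beta_s * K_s of base Gram matrices."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("kernel weights must be non-negative")
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    return sum(w * K for w, K in zip(weights, kernels))

rng = np.random.default_rng(0)
n = 6  # hypothetical number of proteins

# Hypothetical feature groups (random stand-ins, not real protein data).
composition = rng.normal(size=(n, 20))
secondary_structure = rng.normal(size=(n, 21))

base_kernels = [
    rbf_kernel(composition, gamma=0.05),
    rbf_kernel(secondary_structure, gamma=0.05),
]

# Hand-picked weights; in the article the contribution of each
# feature group is assessed by the model rather than fixed.
K = combine_kernels(base_kernels, [0.7, 0.3])
```

Because each base Gram matrix is positive semi-definite and the weights are non-negative, the combined matrix `K` is again a valid kernel and could be passed to any kernel classifier that accepts a precomputed Gram matrix. A larger weight indicates that the corresponding feature group contributes more to the combined similarity.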