Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification
- PMID: 15376912
- DOI: 10.1109/tnb.2003.820284
Abstract
Protein structure classification plays an important role in bioinformatics, since the relationships and characteristics among known proteins can be exploited to predict the structure of new proteins. The success of a classification system depends heavily on two things: the tools being used and the features considered. In bioinformatics applications, the role of appropriate features has not received adequate attention. In this investigation we use three novel ideas for multiclass protein fold classification. First, we use a gating neural network, in which each input node is associated with a gate. This network selects important features online, as learning proceeds. At the beginning of training, all gates are almost closed, i.e., no feature is allowed to enter the network. During training, gates corresponding to good features are opened completely, gates corresponding to bad features are closed more tightly, and some gates may remain partially open. The second novel idea is a hierarchical learning architecture (HLA). The classifier in the first level of the HLA classifies the protein features into four major classes: all alpha, all beta, alpha + beta, and alpha/beta. The next level has another set of classifiers, which further classify the protein features into 27 folds. The third novel idea is to derive indirect coding features from the amino-acid composition sequence of proteins based on the N-gram concept. This provides more representative and discriminative local features of protein sequences for multiclass protein fold classification. The proposed HLA with the new indirect coding features increases protein fold classification accuracy by about 12%. Moreover, the gating neural network is found to reduce the number of features drastically: using only half of the original features, as selected by the gating neural network, achieves test accuracy comparable to that obtained with all the original features. The gating mechanism also gives better insight into the folding process of proteins. For example, by tracking the evolution of the different gates we can find which characteristics (features) of the data are more important for the folding process. It also reduces computation time.
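The gate behavior described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a sigmoid gate on each input, a single logistic output unit in place of a full network, plain batch gradient descent, and a four-sample toy dataset in which feature 0 determines the label and feature 1 is noise. The names `train_gated_unit`, `gate_init`, and the initial weight of 0.5 are hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_gated_unit(samples, labels, epochs=300, lr=0.5, gate_init=-3.0):
    """Train one logistic unit whose inputs each pass through a learnable gate.

    Assumption: gate_i = sigmoid(g_i). With g_i = -3 every gate starts
    almost closed (about 0.047), as the abstract describes.
    """
    n = len(samples[0])
    gate_params = [gate_init] * n
    weights = [0.5] * n  # hypothetical nonzero start so gates receive gradient
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_g = [0.0] * n
        gates = [sigmoid(g) for g in gate_params]
        for x, y in zip(samples, labels):
            # Each feature is attenuated by its gate before entering the unit.
            z = sum(w * s * xi for w, s, xi in zip(weights, gates, x))
            err = sigmoid(z) - y  # derivative of cross-entropy loss w.r.t. z
            for i in range(n):
                grad_w[i] += err * gates[i] * x[i]
                grad_g[i] += err * weights[i] * x[i] * gates[i] * (1.0 - gates[i])
        for i in range(n):
            weights[i] -= lr * grad_w[i]
            gate_params[i] -= lr * grad_g[i]
    return [sigmoid(g) for g in gate_params]


# Toy data: the label depends only on the sign of feature 0.
samples = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
labels = [1.0, 1.0, 0.0, 0.0]

gates = train_gated_unit(samples, labels)
print("gate for informative feature:", gates[0])
print("gate for noise feature:", gates[1])
```

In this toy setting the gate on the informative feature opens during training while the gate on the noise feature closes slightly further. Ranking features by their final gate values is one way the selection of a feature subset could then be carried out.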