The combination approach of SVM and ECOC for powerful identification and classification of transcription factor
- PMID: 18554421
- PMCID: PMC2440765
- DOI: 10.1186/1471-2105-9-282
Abstract
Background: Transcription factors (TFs) are core functional proteins that play important roles in controlling gene expression, and they are key elements in constructing gene regulatory networks. Traditionally, they were identified and classified through experimental approaches. To save time and reduce costs, many computational methods have been developed to identify TFs among new proteins and to classify the resulting TFs. Although these methods have facilitated TF screening to some extent, low accuracy remains a common problem. With the rapidly growing number of new proteins, more precise algorithms for identifying TFs among new proteins and classifying them are in high demand.
Results: The support vector machine (SVM) algorithm was used to construct an automatic detector for TF identification, with protein domains and functional sites employed as feature vectors. The error-correcting output coding (ECOC) algorithm, which originated in information and communication engineering, was combined with the SVM methodology for TF classification. The overall success rates of identification and classification reached 88.22% and 97.83%, respectively. Finally, a web site was constructed to give users access to our tools (see the Availability and requirements section for the URL).
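As a rough illustration of how protein domains and functional sites might be turned into feature vectors, the sketch below encodes a protein's annotations as a binary presence/absence vector over a fixed vocabulary. The abstract does not state the exact encoding, so the binary scheme, the function name `encode_features`, and the vocabulary entries are all assumptions made for illustration.

```python
from typing import Iterable, List


def encode_features(annotations: Iterable[str], vocabulary: List[str]) -> List[int]:
    """Return a binary presence/absence vector over a fixed vocabulary.

    Assumption: one dimension per domain or functional-site identifier;
    annotations outside the vocabulary are ignored.
    """
    present = set(annotations)
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical identifiers, not taken from the paper.
VOCAB = ["domain_A", "domain_B", "site_C", "site_D"]

# A protein annotated with two known terms and one unknown term.
vec = encode_features(["domain_B", "site_D", "unknown_X"], VOCAB)
print(vec)
```

A vector of this kind could then be passed to any binary classifier, such as an SVM, to separate TFs from non-TFs.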
Conclusion: The SVM method is a valid and stable means of TF identification with protein domains and functional sites as feature vectors. The ECOC algorithm is a powerful method for multi-class classification problems. When combined with the SVM method, it can remarkably increase the accuracy of TF classification using protein domains and functional sites as feature vectors. In addition, our work suggests that the ECOC algorithm may succeed in a broad range of applications in biological data mining.
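The general idea behind ECOC can be sketched as follows: each class is assigned a binary codeword, one binary classifier (for example an SVM) is trained per codeword bit, and a new sample is assigned to the class whose codeword is closest in Hamming distance to the predicted bits. The code matrix below is a generic 7-bit exhaustive code for four classes, chosen only for illustration; the class names, the matrix, and the `decode` helper are assumptions and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical code matrix: one 7-bit codeword per class. Each column
# would correspond to one binary classifier. The minimum Hamming distance
# between any two codewords here is 4, so a single wrong bit is corrected.
CODE_MATRIX: Dict[str, List[int]] = {
    "class_1": [1, 1, 1, 1, 1, 1, 1],
    "class_2": [0, 0, 0, 0, 1, 1, 1],
    "class_3": [0, 0, 1, 1, 0, 0, 1],
    "class_4": [0, 1, 0, 1, 0, 1, 0],
}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits: Sequence[int]) -> str:
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(CODE_MATRIX, key=lambda label: hamming(CODE_MATRIX[label], bits))


# Simulate the binary classifiers' outputs for a class_3 sample,
# with the first classifier making a mistake (0 flipped to 1).
predicted_bits = [1, 0, 1, 1, 0, 0, 1]
print(decode(predicted_bits))
```

Because the codewords are well separated, an error in one binary classifier does not change the decoded class, which is the error-correcting property the method relies on.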
Similar articles
- Combining SVM and ECOC for Identification of Protein Complexes from Protein Protein Interaction Networks by Integrating Amino Acids' Physical Properties and Complex Topology. Interdiscip Sci. 2020 Sep;12(3):264-275. doi: 10.1007/s12539-020-00369-5. Epub 2020 May 21. PMID: 32441001
- Sequence features of DNA binding sites reveal structural class of associated transcription factor. Bioinformatics. 2006 Jan 15;22(2):157-63. doi: 10.1093/bioinformatics/bti731. Epub 2005 Nov 2. PMID: 16267080
- SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition. BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2. PMID: 17570145 Free PMC article.
- Identification of coding and non-coding sequences using local Holder exponent formalism. Bioinformatics. 2005 Oct 15;21(20):3818-23. doi: 10.1093/bioinformatics/bti639. Epub 2005 Aug 23. PMID: 16118261
- Binary tree of SVM: a new fast multiclass training and classification algorithm. IEEE Trans Neural Netw. 2006 May;17(3):696-704. doi: 10.1109/TNN.2006.872343. PMID: 16722173
Cited by
- Optical Encoding Model Based on Orbital Angular Momentum Powered by Machine Learning. Sensors (Basel). 2023 Mar 2;23(5):2755. doi: 10.3390/s23052755. PMID: 36904967 Free PMC article.
- Transcription factor prediction using protein 3D secondary structures. Bioinformatics. 2024 Dec 26;41(1):btae762. doi: 10.1093/bioinformatics/btae762. PMID: 39786868 Free PMC article.
- TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors. PLoS One. 2013 Dec 12;8(12):e82238. doi: 10.1371/journal.pone.0082238. eCollection 2013. PMID: 24349230 Free PMC article.
- Concurrent quantification of proteome and phosphoproteome to reveal system-wide association of protein phosphorylation and gene expression. Mol Cell Proteomics. 2009 Dec;8(12):2809-26. doi: 10.1074/mcp.M900293-MCP200. Epub 2009 Aug 12. PMID: 19674963 Free PMC article.
- Incorporating evolutionary information and functional domains for identifying RNA splicing factors in humans. PLoS One. 2011;6(11):e27567. doi: 10.1371/journal.pone.0027567. Epub 2011 Nov 16. PMID: 22110674 Free PMC article.