isGPT: An optimized model to identify sub-Golgi protein types using SVM and Random Forest based feature selection
- PMID: 29183738
- DOI: 10.1016/j.artmed.2017.11.003
Abstract
The Golgi Apparatus (GA) is a key organelle for protein synthesis within the eukaryotic cell. Its main task is to modify and sort proteins for transport throughout the cell. Proteins enter the GA on the side facing the Endoplasmic Reticulum (ER), the cis side, and depart on the opposite side, the trans side. This gives two types of GA proteins: cis-Golgi proteins and trans-Golgi proteins. Dysfunction of GA proteins can result in congenital glycosylation disorders and other conditions that may lead to neurodegenerative and inherited diseases such as diabetes, cancer and cystic fibrosis. Accurate classification of GA proteins may therefore contribute to drug development. In this paper, we build a new computational model that introduces simple ways to extract features from protein sequences and also optimizes the classification of trans-Golgi and cis-Golgi proteins. After feature extraction, we use a Random Forest (RF) model to rank the features by their importance scores. We then select the top-ranked features and apply a Support Vector Machine (SVM) to classify the sub-Golgi proteins. We trained both a regression model and a classification model and found the former to be superior. The model shows improved performance over all previous methods. Because the benchmark dataset is significantly imbalanced, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to balance it and conducted experiments on both the original and the balanced versions. Our method, identification of sub-Golgi Protein Types (isGPT), achieves accuracies of 95.4%, 95.9% and 95.3% on the 10-fold cross-validation test, the jackknife test and the independent test, respectively. According to different performance metrics, isGPT performs better than state-of-the-art techniques.
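The balancing step mentioned in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration, not the authors' implementation; the function name `smote`, its parameters and the toy 2-D vectors are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Simplified variant: the base sample is drawn at random on each
    iteration rather than cycling through every minority sample.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (base excluded).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy, made-up 2-D feature vectors (not data from the paper).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
majority_size = 10  # hypothetical size of the majority class

# Add just enough synthetic samples to match the majority class size.
new_points = smote(minority, n_new=majority_size - len(minority), k=3)
balanced_minority = minority + new_points
```

Because every synthetic point is a convex combination of two real minority samples, it always stays inside the region spanned by the minority class.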
The source code of isGPT, along with relevant dataset and detailed experimental results, can be found at https://github.com/srautonu/isGPT.
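Two of the evaluation protocols named in the abstract, 10-fold cross-validation and the jackknife test, differ only in how samples are held out: the jackknife is the leave-one-out case, with one fold per sample. The sketch below shows this relationship under stated assumptions. The helper names, the toy data and the 1-nearest-neighbour stand-in classifier are all hypothetical (the paper uses an SVM), and 5 folds are used only because the toy set has 10 samples.

```python
import math


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i holds out every k-th sample from i."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test


def jackknife_indices(n):
    """Jackknife (leave-one-out): k-fold with k equal to the sample count."""
    return k_fold_indices(n, n)


def nearest_neighbour_predict(train_X, train_y, x):
    """Stand-in classifier (1-NN), used here only to keep the sketch short."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]


def cv_accuracy(X, y, splits):
    """Fraction of held-out samples predicted correctly across all splits."""
    correct = 0
    total = 0
    for train, test in splits:
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            correct += nearest_neighbour_predict(train_X, train_y, X[i]) == y[i]
            total += 1
    return correct / total


# Toy, made-up data: two well-separated 1-D clusters (not data from the paper).
X = [[0.0], [0.5], [1.0], [1.5], [2.0], [10.0], [10.5], [11.0], [11.5], [12.0]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

acc_5fold = cv_accuracy(X, y, k_fold_indices(len(X), 5))
acc_jackknife = cv_accuracy(X, y, jackknife_indices(len(X)))
```

In both protocols every sample is tested exactly once, so the two accuracies are computed over the same number of predictions and differ only in how much training data each prediction sees.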
Keywords: Classification; Random Forest; Regression; Sub-Golgi Apparatus; Support vector machine.
Copyright © 2017 Elsevier B.V. All rights reserved.