pLoc_bal-mPlant: Predict Subcellular Localization of Plant Proteins by General PseAAC and Balancing Training Dataset
- PMID: 30451108
- DOI: 10.2174/1381612824666181119145030
Abstract
Knowledge of protein subcellular localization is vitally important for both basic research and drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desirable to develop computational tools that identify subcellular localization in a timely and effective manner from sequence information alone. Recently, a predictor called "pLoc-mPlant" was developed for identifying the subcellular localization of plant proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, called "multiplex proteins", may simultaneously occur in two or more subcellular locations. Although it is a very powerful predictor, further effort is needed to improve it, because pLoc-mPlant was trained on an extremely skewed dataset in which some subsets (i.e., the protein numbers for some subcellular locations) were more than 10 times larger than others. Accordingly, it cannot avoid the bias caused by such an uneven training dataset. To overcome this bias, we have developed a new and bias-free predictor called pLoc_bal-mPlant by balancing the training dataset. Cross-validation tests on exactly the same experiment-confirmed dataset indicate that the proposed predictor is remarkably superior to pLoc-mPlant, the existing state-of-the-art predictor for identifying the subcellular localization of plant proteins. To maximize convenience for the majority of experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mPlant/, by which users can easily get their desired results without needing to go through the detailed mathematics.
Keywords: 5-step rules; Chou's intuitive metrics; ML-GKR; Multi-label system; PseAAC; balance treatment; plant proteins.
Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.net.
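The skew described in the abstract (some location subsets more than 10 times larger than others) and the general idea of balancing a multi-label training set can be illustrated with a minimal sketch. This is not the authors' balancing procedure, which the abstract does not specify; it uses plain random oversampling as a stand-in. The protein IDs, the toy dataset, and the helper names `location_counts`, `imbalance_ratio`, and `oversample` are all hypothetical.

```python
import random
from collections import Counter

def location_counts(samples):
    """Count proteins per location; a multiplex protein counts once per label."""
    counts = Counter()
    for _, labels in samples:
        counts.update(labels)
    return counts

def imbalance_ratio(counts):
    """Size of the largest location subset divided by the smallest."""
    return max(counts.values()) / min(counts.values())

def oversample(samples, seed=0):
    """Naive random oversampling (assumption: NOT the paper's method).

    Duplicates proteins from under-represented locations until every
    location has at least as many entries as the largest one had
    originally. Duplicating a multiplex protein also raises the counts
    of its other locations, so the result is only approximately balanced.
    """
    rng = random.Random(seed)
    balanced = list(samples)
    counts = location_counts(balanced)
    target = max(counts.values())
    for loc in sorted(counts):
        pool = [s for s in samples if loc in s[1]]
        while counts[loc] < target:
            pick = rng.choice(pool)
            balanced.append(pick)
            counts.update(pick[1])
    return balanced

# Toy multi-label data: (hypothetical protein ID, set of locations).
toy = [(f"P{i}", {"Chloroplast"}) for i in range(12)]
toy += [("Q1", {"Nucleus"}), ("Q2", {"Nucleus", "Cytoplasm"})]

before = location_counts(toy)
after = location_counts(oversample(toy))
print("before:", dict(before), "ratio:", imbalance_ratio(before))
print("after: ", dict(after), "ratio:", round(imbalance_ratio(after), 2))
```

In the toy data the only Cytoplasm protein is also a Nucleus protein, so duplicating it pushes Nucleus slightly past the target. This shows why balancing a multi-label dataset is harder than balancing a single-label one.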