Computational analysis and prediction of lysine malonylation sites by exploiting informative features in an integrative machine-learning framework
- PMID: 30351377
- PMCID: PMC6954445
- DOI: 10.1093/bib/bby079
Abstract
As a newly discovered post-translational modification (PTM), lysine malonylation (Kmal) regulates a myriad of cellular processes from prokaryotes to eukaryotes and has important implications in human diseases. Despite its functional significance, computational methods to accurately identify malonylation sites are still lacking and urgently needed. In particular, there is currently no comprehensive analysis and assessment of the different features and machine learning (ML) methods required for constructing the necessary prediction models. Here, we review, analyze and compare 11 different feature encoding methods, with the goal of extracting key patterns and characteristics from residue sequences of Kmal sites. We identify optimized feature sets, with which four commonly used ML methods (random forest, support vector machines, K-nearest neighbor and logistic regression) and one recently proposed method, Light Gradient Boosting Machine (LightGBM), are trained on data from three species, namely Escherichia coli, Mus musculus and Homo sapiens, and compared using randomized 10-fold cross-validation tests. We show that integration of the single method-based models through ensemble learning further improves the prediction performance and model robustness on the independent test. When compared to the existing state-of-the-art predictor, MaloPred, the optimal ensemble models were more accurate for all three species (AUC: 0.930, 0.923 and 0.944 for E. coli, M. musculus and H. sapiens, respectively). Using the ensemble models, we developed an accessible online predictor, kmal-sp, available at http://kmalsp.erc.monash.edu/. We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for PTM site prediction, expedite the discovery of new malonylation and other PTM types and facilitate hypothesis-driven experimental validation of novel malonylated substrates and malonylation sites.
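The ensemble idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the single-method models are combined, so the simple score averaging (soft voting) below is an assumption, not the authors' procedure. The function names `average_ensemble` and `roc_auc` and the toy scores are hypothetical and used only for illustration; AUC is computed from its pairwise definition (the fraction of positive/negative site pairs ranked correctly, with ties counted as one half).

```python
from typing import Dict, List, Sequence


def average_ensemble(scores_by_model: Dict[str, Sequence[float]]) -> List[float]:
    """Average per-site scores across models (soft voting; an assumed scheme)."""
    columns = list(scores_by_model.values())
    if not columns:
        raise ValueError("need at least one model")
    n_sites = len(columns[0])
    if any(len(col) != n_sites for col in columns):
        raise ValueError("all models must score the same sites")
    return [sum(col[i] for col in columns) / len(columns) for i in range(n_sites)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative site")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive site ranked above negative site
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = malonylated lysine, 0 = non-malonylated lysine.
    labels = [1, 1, 0, 0, 1, 0]
    scores = {
        "model_a": [0.9, 0.4, 0.5, 0.2, 0.7, 0.1],
        "model_b": [0.8, 0.6, 0.3, 0.4, 0.5, 0.3],
    }
    for name, model_scores in scores.items():
        print(name, roc_auc(labels, model_scores))
    print("ensemble", roc_auc(labels, average_ensemble(scores)))
```

In practice the per-model scores would come from classifiers trained on encoded sequence features and evaluated with cross-validation; the toy lists stand in for those outputs.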
Keywords: Light Gradient Boosting Machine; computational prediction; ensemble learning; feature encoding methods; lysine malonylation; machine learning.
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.