Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites
- PMID: 33102477
- PMCID: PMC7554246
- DOI: 10.3389/fcell.2020.572195
Abstract
Protein ubiquitylation is an important posttranslational modification (PTM) that is involved in diverse biological processes and plays an essential role in the regulation of physiological mechanisms and diseases. The Protein Lysine Modifications Database (PLMD) has accumulated abundant ubiquitylated proteins, together with their substrate sites, for more than 20 species. Numerous studies have consequently developed a variety of ubiquitylation site prediction tools across all species, relying mainly on predefined sequence features and machine learning algorithms. However, the differences in ubiquitylation patterns between these species remain unclear. In this work, sequence-based characterization of ubiquitylated substrate sites revealed remarkable differences among plants, animals, and fungi. An improved word-embedding scheme based on a transfer learning strategy was then combined with a multilayer convolutional neural network (CNN) to identify protein ubiquitylation sites. For the prediction of plant ubiquitylation sites, the proposed deep learning scheme outperformed the machine learning-based methods, with an accuracy of 75.6%, precision of 73.3%, recall of 76.7%, F-score of 0.7493, and AUC of 0.82 on the independent testing set. Although the ubiquitylation specificity of substrate sites is complicated, this work demonstrates that word embedding can extract informative features and help identify ubiquitylated sites. To accelerate the investigation of protein ubiquitylation, the data sets and source code used in this study are freely available at https://github.com/wang-hong-fei/DL-plant-ubsites-prediction.
Keywords: convolutional neural network; deep learning; plant; transfer learning; ubiquitylation; word embedding.
Copyright © 2020 Wang, Wang, Li and Lee.
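The evaluation metrics quoted in the abstract (accuracy, precision, recall, F-score) follow from the standard confusion-matrix definitions. The sketch below is illustrative only and is not taken from the authors' repository; the function names `f_score` and `binary_metrics` and the example counts are hypothetical. Assuming the F-score is the usual F1 (harmonic mean of precision and recall), the rounded precision and recall from the abstract give a value close to the reported 0.7493, with the small gap presumably due to rounding of the inputs.

```python
def f_score(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives (hypothetical inputs).
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_score": f_score(precision, recall),
    }


if __name__ == "__main__":
    # Rounded precision and recall as quoted in the abstract.
    print(round(f_score(0.733, 0.767), 4))  # 0.7496, close to the reported 0.7493
    # Made-up counts, only to show the call shape.
    print(binary_metrics(tp=3, fp=1, tn=4, fn=2))
```

AUC is not covered here because it requires the ranked prediction scores rather than a single confusion matrix.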