Front Cell Dev Biol. 2020 Sep 30;8:572195.
doi: 10.3389/fcell.2020.572195. eCollection 2020.

Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites

Hongfei Wang et al. Front Cell Dev Biol.

Abstract

Protein ubiquitylation is an important posttranslational modification (PTM) that is involved in diverse biological processes and plays an essential role in the regulation of physiological mechanisms and diseases. The Protein Lysine Modifications Database (PLMD) has accumulated abundant ubiquitylated proteins, with their substrate sites, for more than 20 species. Numerous studies have consequently developed a variety of ubiquitylation site prediction tools across species, relying mainly on predefined sequence features and machine learning algorithms. However, the differences in ubiquitylation patterns between these species remain unclear. In this work, sequence-based characterization of ubiquitylated substrate sites revealed remarkable differences among plants, animals, and fungi. An improved word-embedding scheme based on a transfer learning strategy was then combined with a multilayer convolutional neural network (CNN) to identify protein ubiquitylation sites. For the prediction of plant ubiquitylation sites, the proposed deep learning scheme outperformed the machine learning-based methods, with an accuracy of 75.6%, a precision of 73.3%, a recall of 76.7%, an F-score of 0.7493, and an AUC of 0.82 on the independent testing set. Although the specificity of ubiquitylated substrate sites is complicated, this work demonstrates that the word-embedding method can extract informative features and help identify ubiquitylated sites. To accelerate the investigation of protein ubiquitylation, the data sets and source code used in this study are freely available at https://github.com/wang-hong-fei/DL-plant-ubsites-prediction.

Keywords: convolutional neural network; deep learning; plant; transfer learning; ubiquitylation; word embedding.
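
The abstract describes treating protein sequence fragments as text for a word-embedding model, and Figure 2 refers to a bigram pattern. As a minimal sketch of what that preprocessing might look like, the Python below cuts a window around a candidate lysine and splits it into overlapping two-residue "words". The function names, the flank size, the `X` padding character, and the toy sequence are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
def window_around(sequence, site, flank=15, pad="X"):
    """Return the residues within `flank` positions of the 0-based `site`.

    Positions that fall outside the protein are filled with `pad`
    (an assumed placeholder character, not a real amino acid).
    """
    padded = pad * flank + sequence + pad * flank
    center = site + flank  # index of the site inside the padded string
    return padded[center - flank:center + flank + 1]


def to_bigrams(peptide):
    """Split a peptide into overlapping two-residue words."""
    return [peptide[i:i + 2] for i in range(len(peptide) - 1)]


# Toy example: a hypothetical sequence with a lysine (K) at index 1.
toy_sequence = "MKTAYIAKQR"
toy_window = window_around(toy_sequence, site=1, flank=3)
toy_words = to_bigrams(toy_window)
print(toy_window, toy_words)
```

Each list of bigrams could then serve as one "sentence" when training an embedding model such as word2vec, with the resulting vectors passed to a downstream classifier.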

Figures

FIGURE 1
Schematic diagram of the workflow for this study.
FIGURE 2
Word2vec training process of the bigram pattern.
FIGURE 3
Proposed deep structure for ubiquitination site prediction model.
FIGURE 4
Comparison of the amino acid composition (AAC) features between three species. (A) Comparison of AAC features between positive and negative samples of plants. (B) Comparison of AAC features between positive and negative samples of animals. (C) Comparison of AAC features between positive and negative samples of fungi.
FIGURE 5
Heatmaps for the amino acid pairwise composition (AAPC) features of three species. (A) Heatmap for the AAPC features of plants. (B) Heatmap for the AAPC features of animals. (C) Heatmap for the AAPC features of fungi.
FIGURE 6
Heatmaps for the positional weighted matrix (PWM) features of three species. (A) Heatmaps for the PWM features of plants. (B) Heatmaps for the PWM features of animals. (C) Heatmaps for the PWM features of fungi.
FIGURE 7
Comparison of the Two Sample Logo of three species. (A) Two Sample Logo of positive samples between plants and animals. (B) Two Sample Logo of positive samples between plants and fungi. (C) Two Sample Logo of positive samples between animals and fungi.
FIGURE 8
ROC curve of the different methods on the testing set.
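
The Figure 4 caption compares amino acid composition (AAC) between positive and negative samples. As a rough sketch of how such a comparison could be computed, the Python below counts the fraction of each of the 20 standard amino acids in a set of peptides and takes the difference between two sets. The function names and the toy peptides are hypothetical, and ignoring non-standard characters (such as padding) is an assumption, not the paper's stated procedure.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(peptides):
    """Return the fraction of each standard amino acid across all peptides."""
    counts = Counter()
    for peptide in peptides:
        # Characters outside the 20 standard residues are skipped (assumption).
        counts.update(ch for ch in peptide if ch in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def aac_difference(positive, negative):
    """Per-residue AAC of the positive set minus that of the negative set."""
    pos, neg = aac(positive), aac(negative)
    return {aa: pos[aa] - neg[aa] for aa in AMINO_ACIDS}


# Toy peptides, invented purely to show the call pattern.
toy_composition = aac(["AAK", "KX"])
toy_difference = aac_difference(["KK"], ["AK"])
print(toy_composition["A"], toy_difference["K"])
```

A positive value in the difference means the residue is more frequent around the positive sites in the input, which is the kind of contrast a per-species AAC plot would display.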
