BMC Bioinformatics. 2019 Feb 18;20(1):86.
doi: 10.1186/s12859-019-2677-9.

DeepUbi: a deep learning framework for prediction of ubiquitination sites in proteins

Hongli Fu et al. BMC Bioinformatics. 2019.

Abstract

Background: Protein ubiquitination occurs when the ubiquitin protein binds to a lysine (K) residue of a target protein. In eukaryotes, it is an important regulator of many cellular functions, such as signal transduction, cell division, and immune reactions. Experimental and clinical studies have shown that ubiquitination plays a key role in several human diseases, and recent advances in proteomic technology have spurred interest in identifying ubiquitination sites. However, most current computational tools for predicting target sites are based on small-scale data and shallow machine learning algorithms.

Results: As more experimentally validated ubiquitination sites emerge, we need to design a predictor that can identify lysine ubiquitination sites in large-scale proteome data. In this work, we propose a deep learning predictor, DeepUbi, based on convolutional neural networks. Four different features are derived from the sequences and their physicochemical properties. In a 10-fold cross-validation, DeepUbi obtains an AUC (area under the Receiver Operating Characteristic curve) of 0.9, and the accuracy, sensitivity and specificity all exceed 85%. The more comprehensive indicator, MCC (Matthews correlation coefficient), reaches 0.78. We also develop a software package that can be freely downloaded from https://github.com/Sunmile/DeepUbi.
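The metrics named above can all be computed from confusion-matrix counts and ranked prediction scores. The following is a minimal sketch under stated assumptions, not code from the DeepUbi package: the function names `binary_metrics` and `auc_from_scores`, and all counts and scores, are hypothetical and chosen only for illustration.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion counts."""
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is quadratic in
    the number of samples, which is fine for a small illustration.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return _ratio(correct, len(positives) * len(negatives))


# Hypothetical counts and scores, NOT results from the paper.
m = binary_metrics(tp=90, fp=10, tn=90, fn=10)
auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(m)    # accuracy, sensitivity and specificity are 0.9; MCC is 0.8
print(auc)  # 3 of 4 positive/negative pairs are ranked correctly: 0.75
```

MCC uses all four cells of the confusion matrix, which is why it is often treated as a more comprehensive summary than accuracy alone.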

Conclusion: Our results show that DeepUbi has excellent performance in predicting ubiquitination sites from large-scale data.

Keywords: Convolutional neural networks; Deep learning; Ubiquitination.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
ROC curves for different cross-validations. ROC curves and AUC values for 4-, 6-, 8-, and 10-fold cross-validation with the one-hot encoding scheme.
Fig. 2
ROC curves for different feature constructions. ROC curves and AUC values for the four features in the 10-fold cross-validation. The curves lie very close to one another, which illustrates the robustness of the model.
Fig. 3
Sequence analysis charts for ubiquitination and non-ubiquitination peptides. (a) Bar chart comparing the numbers of flanking amino acids surrounding the ubiquitination and non-ubiquitination peptides. (b) Circular chart comparing the percentages of flanking amino acids surrounding the ubiquitination and non-ubiquitination peptides. (c) Differences between ubiquitination and non-ubiquitination peptides, calculated and visualized with the Two Sample Logos web server.
Fig. 4
Flow chart of data collection and processing. First, the raw proteins are collected and redundant protein sequences are removed with CD-HIT; second, the protein sequences are cut with a sliding window of length 31 to obtain the positive and negative fragments; finally, a 30% identity threshold is applied to the negative samples to obtain balanced training data.
Fig. 5
(a) Flow chart of the CNN deep learning model: a fragment is input and encoded, an embedding layer is constructed, multiple convolution-pooling layers are built, fully connected layers are constructed, and the output is obtained. (b) An example of the convolution-pooling structure: filters of different sizes produce a series of feature maps, which are max-pooled and concatenated to form a feature vector; finally, the softmax function is used to obtain the classification.
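The window extraction in Fig. 4 and the convolution-pooling structure in Fig. 5 can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the DeepUbi implementation: the padding symbol `X`, the 21-symbol alphabet, the filter widths (2, 3, 4), the random untrained weights, the example sequence and all function names are assumptions, and the embedding layer and training step are omitted.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"  # 20 amino acids + assumed padding symbol
PAD = "X"
WINDOW = 31  # fragment length given in the Fig. 4 caption


def lysine_fragments(sequence, window=WINDOW):
    """Yield (position, fragment) with each lysine at the fragment centre."""
    half = window // 2
    padded = PAD * half + sequence + PAD * half
    for i, residue in enumerate(sequence):
        if residue == "K":
            yield i, padded[i:i + window]


def one_hot(fragment):
    """Encode each residue as a 0/1 row; unknown residues map to PAD."""
    index = {aa: k for k, aa in enumerate(ALPHABET)}
    rows = []
    for residue in fragment:
        row = [0.0] * len(ALPHABET)
        row[index.get(residue, index[PAD])] = 1.0
        rows.append(row)
    return rows


def conv_max_pool(encoded, filt):
    """Slide one filter along the fragment, apply ReLU, keep the maximum."""
    width = len(filt)
    best = 0.0  # the ReLU floor: activations below zero never win
    for start in range(len(encoded) - width + 1):
        total = sum(
            encoded[start + j][k] * filt[j][k]
            for j in range(width)
            for k in range(len(ALPHABET))
        )
        best = max(best, total)
    return best


def softmax(logits):
    """Turn logits into probabilities that sum to 1."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)  # untrained random weights, for illustration only


def random_filter(width):
    return [[rng.uniform(-1, 1) for _ in ALPHABET] for _ in range(width)]


# Two filters for each assumed width; one pooled value per filter.
filters = [random_filter(w) for w in (2, 3, 4) for _ in range(2)]
dense = [[rng.uniform(-1, 1) for _ in filters] for _ in range(2)]

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # made-up example sequence
fragments = list(lysine_fragments(sequence))
position, fragment = fragments[0]

encoded = one_hot(fragment)
features = [conv_max_pool(encoded, f) for f in filters]  # concatenated vector
logits = [sum(w * x for w, x in zip(row, features)) for row in dense]
probs = softmax(logits)  # two class probabilities

print(position, fragment)
print(probs)
```

Because the weights are random, the printed probabilities carry no biological meaning; the sketch only shows how fragments of length 31 flow through filters of several widths, max-pooling, concatenation and a softmax output.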
