BMC Syst Biol. 2016 Jan 11;10 Suppl 1(Suppl 1):6.
doi: 10.1186/s12918-015-0246-z.

UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines

Chien-Hsun Huang et al. BMC Syst Biol. 2016.

Abstract

Background: The conjugation of ubiquitin to a substrate protein (protein ubiquitylation) proceeds through a sequential process of E1 activation, E2 conjugation and E3 ligation, and is crucial to the regulation of protein function and activity in eukaryotes. This process typically attaches the last amino acid of ubiquitin (glycine 76) to a lysine residue of a target protein. High-throughput mass spectrometry-based proteomics has enabled large-scale identification of ubiquitin-conjugated peptides. Hence, a new web resource, UbiSite, was developed to identify ubiquitin-conjugation sites on lysines based on a large-scale proteome dataset.

Results: Given a total of 37,647 ubiquitin-conjugated proteins, including 128,026 ubiquitylated peptides, obtained from various resources, this study carries out a large-scale investigation of ubiquitin-conjugation sites based on sequence and structural characteristics. A TwoSampleLogo reveals a significant depletion of histidine (H), arginine (R) and cysteine (C) residues around ubiquitylation sites, which may impact the conjugation of ubiquitins in closed three-dimensional environments. Based on the large-scale ubiquitylation dataset, a motif discovery tool, MDDLogo, was adopted to characterize the potential substrate motifs for ubiquitin conjugation. The study considers not only single features such as amino acid composition (AAC), positional weighted matrix (PWM), position-specific scoring matrix (PSSM) and solvent-accessible surface area (SASA), but also the effectiveness of incorporating MDDLogo-identified substrate motifs into a two-layered prediction model. Five-fold cross-validation showed that PSSM is the best single feature for discriminating between ubiquitylation and non-ubiquitylation sites using a support vector machine (SVM). Additionally, the two-layered SVM model integrating MDDLogo-identified substrate motifs achieved an accuracy of 81.06% and a Matthews correlation coefficient (MCC) of 0.586. Furthermore, independent testing showed that the two-layered SVM model could outperform other prediction tools, reaching 85.10% sensitivity, 69.69% specificity, 73.69% accuracy and an MCC of 0.483.
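
For readers unfamiliar with the four measures reported above, the following is a minimal sketch of how sensitivity, specificity, accuracy and MCC are conventionally computed from the counts of a binary confusion matrix. The function name `binary_metrics` and the toy counts are assumptions made for illustration; they are not taken from the paper or from the UbiSite implementation.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification measures from confusion counts.

    tp/fn: ubiquitylation sites predicted correctly / missed
    tn/fp: non-ubiquitylation sites predicted correctly / wrongly flagged
    Assumes at least one positive and one negative example.
    """
    sensitivity = tp / (tp + fn)          # fraction of true sites recovered
    specificity = tn / (tn + fp)          # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only; NOT results from the paper.
toy = binary_metrics(tp=85, fn=15, tn=70, fp=30)
for name, value in toy.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when non-ubiquitylated lysines greatly outnumber ubiquitylated ones.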

Conclusion: The independent testing results indicate the effectiveness of incorporating MDDLogo-identified motifs into the prediction of ubiquitylation sites. To provide meaningful assistance to researchers interested in large-scale ubiquitinome data, the two-layered SVM model has been implemented as a web-based system (UbiSite), which is freely available at http://csb.cse.yzu.edu.tw/UbiSite/. Two case studies provided in UbiSite demonstrate effective identification of ubiquitylation sites with reference to substrate motifs.

Figures

Fig. 1
Flowchart of constructing the two-layered prediction model based on MDDLogo-identified substrate motifs
Fig. 2
Sequence and structural characteristics of ubiquitin-conjugation sites. (a) Comparison of position-specific amino acid composition between ubiquitylation and non-ubiquitylation sites. (b) Comparison of solvent-accessible surface area between ubiquitylation and non-ubiquitylation sites. (c) Distribution of secondary structure around ubiquitylation sites
Fig. 3
Tree view of MDDLogo-clustered subgroups with statistically significant motifs for 5438 ubiquitylation sites
Fig. 4
Comparison of independent testing performance between the single SVM model and the two-layered SVM model
Fig. 5
Case study of identifying ubiquitylation sites on E3 ubiquitin-protein ligase DMA2 of Saccharomyces cerevisiae
