Database (Oxford). 2021 Jul 19:2021:baab039.
doi: 10.1093/database/baab039.

Automatization and self-maintenance of the O-GlcNAcome catalog: a smart scientific database

Florian Malard et al.

Abstract

Post-translational modifications (PTMs) are ubiquitous and essential for protein function and signaling, which motivates the need for sustainable, open models of web databases. The highly conserved O-GlcNAcylation is one of the most recently discovered PTMs and is investigated by a growing community. Historically, details about O-GlcNAcylated proteins and sites were dispersed across the literature and in web databases that were not O-GlcNAc-focused, rapidly outdated or now defunct. In a first effort to fill this gap, we recently published a human O-GlcNAcome catalog with a basic web interface. Based on the enthusiasm generated by this first resource, we extended our O-GlcNAcome catalog to include data from 42 distinct organisms and released the O-GlcNAc Database v1.2. In this version, more than 14 500 O-GlcNAcylated proteins and 11 000 O-GlcNAcylation sites are referenced from the curation of 2200 publications. In this article, we also present the extensive features of the O-GlcNAc Database, including the user-friendly interface, the back-end and client-server interactions. We particularly emphasize our workflow, which involves a mostly automatized and self-maintained database and includes machine learning approaches for text mining. We hope that this software model will be useful beyond the O-GlcNAc community for setting up new smart scientific online databases in a short period of time. Indeed, this database system can be administered with little to no programming skills and is meant to be an example of a useful, sustainable and cost-efficient resource that relies exclusively on free open-source software elements (www.oglcnac.mcw.edu).


Figures

Figure 1.
O-GlcNAcylation of proteins. A single β-N-acetyl-glucosamine residue is added by the O-GlcNAc transferase (OGT) and removed by the O-GlcNAcase (OGA). The hexosamine biosynthesis pathway drives the production of the O-GlcNAc nucleotide donor (i.e., UDP-GlcNAc) from glucose (Glc). Serine or threonine (S/T) residues on intracellular proteins are targeted for modification.
Figure 2.
Number of O-GlcNAcylated proteins and sites. O-GlcNAcylated proteins (blue) and O-GlcNAc sites (orange) are shown for each organism cataloged in the O-GlcNAc Database. The Others category summarizes proteins and sites from 26 organisms with <10 protein entries.
Figure 3.
Example of search results for the protein Histone H3.1 in the O-GlcNAc Database. Protein entries are shown as collapsible elements (1), and child elements can be accessed on click (dashed frame). Nested collapsible elements provide digest tools (2) in full (3) and partial (4) modes, as well as download (5) and comment (6) options.
Figure 4.
Training of the neural network. Top panel: scheme for learning rate cycling during training of independent models. Bottom panel: for each independent model, accuracy was monitored along training epochs for training subsets and testing set.
Figure 5.
Example of a literature report produced by the logistic binary classifier and automatization routines. The private interface contains the literature item metadata (Authors, Title, Year, Journal, Volume, Issue, Abstract, PMID and PubMed Link) (1) as well as the prediction score from the neural network (2). Neural network decisions are presented for each model ((2) right brackets). Decisions are then averaged and a binomial confidence interval is calculated ((2) left). In (3), extracted proteins, sites, species and methods are shown next to the number of iterations for each item. This information is complemented by sentences associated with combinations of tags relevant to O-GlcNAcylated proteins (4). An update window is available for rapid update of the master update file upon inspection of each publication (5).
Figure 6.
Unified Modeling Language (UML) (11) activity diagram (act) of the O-GlcNAc automatization and self-maintenance library. Initial state (black circle), actions (rounded rectangle), list objects (rectangle), fork and join (bold bars), decision and merge (diamond), break (crossed circle) as well as final state (black circle) are highlighted per UML conventions. Normal (green) and error (red) completion actions are also highlighted, together with actions for which specific activity diagrams for O-GlcNAc sites quality control (right panel) and for collection of information related to protein and literature are given (Figure S6).
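
The Figure 5 caption mentions that the decisions of the independent models are averaged and that a binomial confidence interval is calculated. A minimal sketch of that step is given below. It is an illustration only: the abstract does not state which binomial interval is used, so the Wilson score interval is assumed here, and the function names and the five-model example are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    Assumption: the source does not name the interval it uses;
    Wilson is chosen here because it behaves well for small n.
    z=1.96 corresponds to an approximate 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


def summarize_decisions(decisions):
    """Average binary model decisions and attach a confidence interval.

    `decisions` is a sequence of 0/1 votes, one per independent model
    (hypothetical input format).
    """
    n = len(decisions)
    positives = sum(1 for d in decisions if d)
    low, high = wilson_interval(positives, n)
    return positives / n, low, high


if __name__ == "__main__":
    # Five hypothetical models, four of which flag the publication.
    mean, low, high = summarize_decisions([1, 1, 1, 0, 1])
    print(f"mean={mean:.2f}, interval=({low:.2f}, {high:.2f})")
```

With only a handful of models the interval is wide, which is one reason to display it next to the averaged score rather than the score alone.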

References

    1. Pagel O., Loroch S., Sickmann A. et al. (2015) Current strategies and findings in clinically relevant post-translational modification-specific proteomics. Expert Rev. Proteomics, 12, 235–253.
    2. Walsh G. and Jefferis R. (2006) Post-translational modifications in the context of therapeutic proteins. Nat. Biotechnol., 24, 1241–1252.
    3. Bond M.R. and Hanover J.A. (2013) O-GlcNAc cycling: a link between metabolism and chronic disease. Annu. Rev. Nutr., 33, 205–229. doi: 10.1146/annurev-nutr-071812-161240.
    4. Hart G.W. (2014) Three decades of research on O-GlcNAcylation - a major nutrient sensor that regulates signaling, transcription and cellular metabolism. Front Endocrinol. (Lausanne), 5, 183. doi: 10.3389/fendo.2014.00183.
    5. Akan I., Olivier-Van Stichelen S., Bond M.R. et al. (2018) Nutrient-driven O-GlcNAc in proteostasis and neurodegeneration. J. Neurochem., 144, 7–34. doi: 10.1111/jnc.14242.
