Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2020 Jun 1;36(12):3941-3943.
doi: 10.1093/bioinformatics/btaa238.

GlyGen data model and processing workflow

Affiliations

GlyGen data model and processing workflow

Robel Kahsay et al. Bioinformatics. .

Abstract

Summary: Glycoinformatics plays a major role in glycobiology research, and the development of a comprehensive glycoinformatics knowledgebase is critical. This application note describes the GlyGen data model, processing workflow and the data access interfaces featuring programmatic use case example queries based on specific biological questions. The GlyGen project is a data integration, harmonization and dissemination project for carbohydrate and glycoconjugate-related data retrieved from multiple international data sources including UniProtKB, GlyTouCan, UniCarbKB and other key resources.

Availability and implementation: GlyGen web portal is freely available to access at https://glygen.org. The data portal, web services, SPARQL endpoint and GitHub repository are also freely available at https://data.glygen.org, https://api.glygen.org, https://sparql.glygen.org and https://github.com/glygener, respectively. All code is released under license GNU General Public License version 3 (GNU GPLv3) and is available on GitHub https://github.com/glygener. The datasets are made available under Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Supplementary information: Supplementary data are available at Bioinformatics online.

PubMed Disclaimer

Figures

Fig. 1.
Fig. 1.
GlyGen data processing workflow showing various steps. Data are retrieved from various resources including UniProtKB, GlyTouCan, UniCarbKB, RefSeq and other key resources, followed by extraction and filtering based on relevance to glycobiology. Extracted data are integrated after harmonization that is based on various standard ontologies. The resulting datasets are then ingested into a MongoDB docstore and Virtuoso triplestore using the GlyGen data model

References

    1. Altenhoff A.M. et al. (2018) The OMA orthology database in 2018: retrieving evolutionary relationships among all domains of life through richer web and programmatic interfaces. Nucleic Acids Res., 46, D477–D485. - PMC - PubMed
    1. Alterovitz G. et al. (2018) Enabling precision medicine via standard communication of HTS provenance, analysis, and results. PLoS Biol., 16, e3000099. - PMC - PubMed
    1. Berman H.M. et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242. - PMC - PubMed
    1. Bult C.J. et al.; The Mouse Genome Database Group. (2019) Mouse Genome Database (MGD) 2019. Nucleic Acids Res., 47, D801–D806. - PMC - PubMed
    1. Campbell M.P. et al. (2014) UniCarbKB: building a knowledge platform for glycoproteomics. Nucleic Acids Res., 42, D215–D221. - PMC - PubMed

Publication types