GlyGen data model and processing workflow

Robel Kahsay¹, Jeet Vora¹, Rahi Navelkar¹, Reza Mousavi¹, Brian C Fochtman¹, Xavier Holmes¹, Nagarajan Pattabiraman¹, Rene Ranzinger², Rupali Mahadik², Tatiana Williamson², Sujeet Kulkarni², Gaurav Agarwal², Maria Martin³, Preethi Vasudev³, Leyla Garcia⁴, Nathan Edwards⁵, Wenjin Zhang⁵, Darren A Natale⁵, Karen Ross⁵, Kiyoko F Aoki-Kinoshita⁶, Matthew P Campbell⁷, William S York², Raja Mazumder¹

Affiliations

¹ Department of Biochemistry & Molecular Medicine, The George Washington School of Medicine and Health Sciences, Washington, DC 20052, USA.
² Complex Carbohydrate Research Center, The University of Georgia, Athens, GA 30602, USA.
³ European Bioinformatics Institute, Hinxton CB10 1SD, UK.
⁴ ZB MED Information Centre for Life Sciences, Cologne 50931, Germany.
⁵ Department of Biochemistry and Molecular & Cellular Biology, Georgetown University, Washington, DC 20007, USA.
⁶ Faculty of Science and Engineering, Soka University, Tokyo 192-8577, Japan.
⁷ Institute for Glycomics, Griffith University, Southport QLD 4222, Australia.

PMID: 32324859
PMCID: PMC7320628
DOI: 10.1093/bioinformatics/btaa238

Robel Kahsay et al. Bioinformatics. 2020 Jun 1;36(12):3941-3943. doi: 10.1093/bioinformatics/btaa238.

Abstract

Summary: Glycoinformatics plays a major role in glycobiology research, and the development of a comprehensive glycoinformatics knowledgebase is critical. This application note describes the GlyGen data model, processing workflow and the data access interfaces featuring programmatic use case example queries based on specific biological questions. The GlyGen project is a data integration, harmonization and dissemination project for carbohydrate and glycoconjugate-related data retrieved from multiple international data sources including UniProtKB, GlyTouCan, UniCarbKB and other key resources.

Availability and implementation: The GlyGen web portal is freely available at https://glygen.org. The data portal, web services, SPARQL endpoint and GitHub repository are also freely available at https://data.glygen.org, https://api.glygen.org, https://sparql.glygen.org and https://github.com/glygener, respectively. All code is released under the GNU General Public License version 3 (GNU GPLv3). The datasets are made available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
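As a rough illustration of programmatic access through the SPARQL endpoint, the sketch below builds a standard SPARQL 1.1 Protocol GET request without sending it. The host is the one given above; the `/sparql` path, the helper name `build_sparql_request` and the generic triple-pattern query are assumptions for illustration, not taken from the GlyGen documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Host is taken from the abstract; the "/sparql" path is an ASSUMPTION
# and should be checked against the GlyGen documentation before use.
SPARQL_ENDPOINT = "https://sparql.glygen.org/sparql"


def build_sparql_request(query: str, endpoint: str = SPARQL_ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request.

    Hypothetical helper: the query travels in the standard `query`
    URL parameter, and JSON results are requested via the Accept header.
    """
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# A deliberately generic query that assumes no GlyGen-specific vocabulary.
GENERIC_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

request = build_sparql_request(GENERIC_QUERY)
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would issue the query. A real use case would replace the generic pattern with predicates from the GlyGen data model.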

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

**Fig. 1.**
GlyGen data processing workflow. Data are retrieved from resources including UniProtKB, GlyTouCan, UniCarbKB and RefSeq, then extracted and filtered for relevance to glycobiology. The extracted data are harmonized against standard ontologies and integrated. The resulting datasets are then ingested into a MongoDB docstore and a Virtuoso triplestore using the GlyGen data model.
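The stages in this caption (retrieve, filter, harmonize, ingest) can be pictured with a toy, in-memory sketch. Everything in it is hypothetical: the record fields, the placeholder accessions, the keyword filter and the `ex:fromSource` predicate are invented for illustration. A plain dict and a list of tuples stand in for the MongoDB docstore and the Virtuoso triplestore, and harmonization is reduced to case normalization rather than ontology mapping.

```python
# Toy records standing in for entries retrieved from upstream resources.
# All field names and accessions are HYPOTHETICAL placeholders.
RAW_RECORDS = [
    {"source": "UniProtKB", "acc": "PROT_A", "keywords": ["Glycoprotein"]},
    {"source": "UniProtKB", "acc": "PROT_B", "keywords": ["Kinase"]},
    {"source": "GlyTouCan", "acc": "GLYCAN_A", "keywords": ["Glycan"]},
]

# Assumed relevance filter: keep records tagged with a glycobiology term.
GLYCO_TERMS = {"glycoprotein", "glycan"}


def is_relevant(record):
    """Return True if any keyword matches a glycobiology term."""
    return any(k.lower() in GLYCO_TERMS for k in record["keywords"])


def harmonize(record):
    """Map a source record onto one shared, simplified shape."""
    return {
        "id": f"{record['source']}:{record['acc']}",
        "source": record["source"],
        "keywords": sorted(k.lower() for k in record["keywords"]),
    }


def run_pipeline(raw_records):
    """Filter, harmonize and 'ingest' records into two in-memory stores."""
    docstore = {}   # stand-in for a document store
    triples = []    # stand-in for a triplestore
    for record in raw_records:
        if not is_relevant(record):
            continue
        doc = harmonize(record)
        docstore[doc["id"]] = doc
        triples.append((doc["id"], "ex:fromSource", doc["source"]))
    return docstore, triples


docstore, triples = run_pipeline(RAW_RECORDS)
print(sorted(docstore))
print(triples)
```

Writing the same harmonized record to both stores mirrors the dual docstore and triplestore targets named in the caption; the actual GlyGen pipeline is considerably more involved.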



Grants and funding

U01 GM125267/GM/NIGMS NIH HHS/United States
