Glycobiology. 2023 Jun 21;33(6):454-463.
doi: 10.1093/glycob/cwad028.

Bridging glycoinformatics and cheminformatics: integration efforts between GlyCosmos and PubChem



Tiejun Cheng et al. Glycobiology.

Abstract

The GlyCosmos Glycoscience Portal (https://glycosmos.org) and PubChem (https://pubchem.ncbi.nlm.nih.gov/) are major portals for glycoscience and chemistry, respectively. GlyCosmos is a portal for glycan-related repositories, including GlyTouCan, GlycoPOST, and UniCarb-DR, as well as for glycan-related data resources that have been integrated from a variety of 'omics databases. Glycogenes, glycoproteins, lectins, pathways, and disease information related to glycans are accessible from GlyCosmos. PubChem, on the other hand, is a chemistry-based portal at the National Center for Biotechnology Information. PubChem provides information not only on chemicals but also on genes, proteins, and pathways, as well as on patents, bioassays, and more, drawn from hundreds of data resources around the world. In this work, substantial efforts were made to integrate the complementary data of these two portals so that users can cross between the two domains. In addition to glycan structures, key information such as glycan-related genes, relevant diseases, glycoproteins, and pathways was integrated and cross-linked. The interfaces were designed to enable users to easily find, access, download, and reuse data of interest across these resources. Use cases are described that illustrate the types of content that can be investigated. Together, these integrations give life science researchers improved awareness of, and enhanced access to, glycan-related information.

Keywords: databases; glycoinformatics; integration; web portal.
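The abstract emphasizes that data can be found, downloaded, and reused. As one illustration of programmatic access, the following minimal sketch retrieves the synonyms of a compound through PubChem's public PUG-REST service. PUG-REST is a general PubChem interface, not something introduced in this work, and the function names here are hypothetical. The URL pattern and the `InformationList` / `Information` / `Synonym` nesting follow PubChem's JSON output as we understand it; check the current PUG-REST documentation before relying on them.

```python
import json
import urllib.request

# Base URL of PubChem's PUG-REST service (a general PubChem API, not part of this work).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def synonyms_url(cid: int) -> str:
    """Build the PUG-REST URL that returns all synonyms of one compound as JSON."""
    return f"{PUG_REST}/compound/cid/{cid}/synonyms/JSON"


def parse_synonyms(payload: dict) -> list:
    """Extract the synonym list from a PUG-REST synonyms response.

    Assumed shape: {"InformationList": {"Information": [{"CID": ..., "Synonym": [...]}]}}.
    Returns an empty list if the expected keys are missing.
    """
    info = payload.get("InformationList", {}).get("Information", [])
    return list(info[0].get("Synonym", [])) if info else []


def fetch_synonyms(cid: int, timeout: float = 30.0) -> list:
    """Download and parse the synonyms for one CID (requires network access)."""
    with urllib.request.urlopen(synonyms_url(cid), timeout=timeout) as resp:
        return parse_synonyms(json.load(resp))
```

Calling `fetch_synonyms(5288898)` (network access required) should return the names PubChem records for the glycan shown in Figs. 1 and 2. Parsing is kept in a separate function so that it can be tested offline against a saved response.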


Figures

Fig. 1
Screenshot of the Biologic Description section in PubChem for a glycan from GlyCosmos with PubChem CID 5288898 (https://pubchem.ncbi.nlm.nih.gov/compound/5288898#section=Biologic-Description). Clicking on the green triangle displays the link to GlyCosmos. All of the names by which this glycan is known are listed under “Synonyms”, and a text description of the structure appears at the bottom.
Fig. 2
Screenshot of the cross-links from PubChem Compound (CID 5288898) to GlycoNAVI, GlyCosmos, and GlyTouCan (https://pubchem.ncbi.nlm.nih.gov/compound/5288898#section=GlyTouCan-Accession). As in Fig. 1, clicking on the green triangle displays the links to the corresponding resources.
Fig. 3
Screenshot of the glycans involved in a Reactome pathway (ID: R-HSA-2022857; https://pubchem.ncbi.nlm.nih.gov/pathway/Reactome:R-HSA-2022857#section=Glycans). This example shows the glycans involved in keratan sulfate degradation: the GlyTouCan IDs, names, and images of the glycans involved (keratan sulfate structures) are listed. Glycans associated with a taxon (e.g. https://pubchem.ncbi.nlm.nih.gov/taxonomy/9769#section=Glycans) are displayed in the same manner.
Fig. 4
Screenshot of the keratan sulfate degradation pathway in GlyCosmos. Clicking on a green circle (protein) displays the glycoproteins, with links to their entry pages giving details about the glycosylated proteins and the glycans attached to them. Yellow squares, representing glycans, link to the corresponding glycan entry pages, and blue arrows that correspond to glycogenes link to the corresponding enzyme pages. In summary, this page lets users study the glycans attached to a glycoprotein that is itself an enzyme acting on a glycan substrate, with every entity linked to its detailed entry page in GlyCosmos (https://glycosmos.org/pathways/show/R-HSA-2022857).
Fig. 5
Screenshot of the GGDB chemical-gene interactions in PubChem, listing the glycan acceptors and products for the selected enzyme (glycogene), their corresponding CIDs, and whether the given enzyme recognizes each substrate (https://pubchem.ncbi.nlm.nih.gov/gene/11320#section=Chemical-Gene-Interactions).
Fig. 6
Screenshot of diseases known to be associated with a given glycogene, as annotated in GDGDB, along with the publications provided as evidence (https://pubchem.ncbi.nlm.nih.gov/gene/1836#section=GDGDB-Gene-Disease-Associations).
Fig. 7
Screenshot of the automatically generated description of disialyllactose displayed at the bottom of the Summary section in PubChem, along with the link to the glycan entry page in GlyCosmos (https://pubchem.ncbi.nlm.nih.gov/compound/45266862). This page also lists the various synonyms (often used by glycobiologists) by which this structure can be identified.
Fig. 8
The PubChem taxonomy page for Rattus (https://pubchem.ncbi.nlm.nih.gov/taxonomy/10114#section=Glycans), which includes a section on the glycans reported in this taxon. The information is derived from GlyCosmos, as indicated by the source citation at the bottom; expanding it with the green triangle reveals a link to the corresponding GlyCosmos organism page.
Fig. 9
Example workflow showing how glycans in GlyCosmos are linked to their glycosyltransferases, which in turn are linked to PubChem gene pages containing further details about BioAssays, patents, and more. Users can thus follow a single path from glycans to their related genes and beyond.
Fig. 10
Overview of the data integrations of glycans, glycogenes, and diseases in GlyCosmos and PubChem. a) Glycan entry page, which lists the glycoenzymes known to be involved in the biosynthesis of the given glycan. The red circle around one of these enzymes, B4GALNT1, links to b) the Glycogene entry page, which lists known diseases for the given glycogene. The red circle around the PubChem link leads to c) the Gene-Disease Co-Occurrences section, which provides more details on the listed genes and their publications.
Fig. 11
Overview of the data sets that are shared between GlyCosmos and PubChem (overlapping), and those that are unique to each resource. GlyCosmos contains glycan-centric information such as glycome- and glycan-related data, whereas PubChem provides further information on chemical and physical properties.
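The cross-linking described in Fig. 10 (glycan to glycogene to disease, then on to the PubChem gene page) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GlyCosmos or PubChem implementation: the classes `Glycan` and `Glycogene` and the function `disease_links` are hypothetical in-memory stand-ins. The rule that spaces in a section title become hyphens in the `#section=` anchor is inferred from the caption URLs, and the `Gene-Disease-Co-Occurrences` anchor is assumed by applying that same rule.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PUBCHEM = "https://pubchem.ncbi.nlm.nih.gov"
GLYCOSMOS = "https://glycosmos.org"


def pubchem_section_url(record_type: str, record_id: str | int, section: str) -> str:
    """Link to a named section of a PubChem record page.

    Mirrors the caption URLs, where spaces in the section title become hyphens,
    e.g. gene/1836#section=GDGDB-Gene-Disease-Associations (inferred rule).
    """
    return f"{PUBCHEM}/{record_type}/{record_id}#section={section.replace(' ', '-')}"


def glycosmos_pathway_url(reactome_id: str) -> str:
    """GlyCosmos pathway page for a Reactome stable ID, as in Fig. 4."""
    return f"{GLYCOSMOS}/pathways/show/{reactome_id}"


@dataclass
class Glycogene:
    """Hypothetical stand-in for a GlyCosmos glycogene entry."""
    symbol: str
    ncbi_gene_id: int  # PubChem gene pages are keyed by NCBI Gene ID
    diseases: list[str] = field(default_factory=list)


@dataclass
class Glycan:
    """Hypothetical stand-in for a GlyCosmos glycan entry."""
    glytoucan_id: str
    enzymes: list[Glycogene] = field(default_factory=list)


def disease_links(glycan: Glycan) -> dict[str, tuple[list[str], str]]:
    """Follow glycan -> biosynthetic glycogene -> diseases, as in Fig. 10 a) to c).

    For each enzyme, return its annotated diseases and the PubChem gene-page
    section (anchor name assumed) holding gene-disease co-occurrence evidence.
    """
    return {
        gene.symbol: (
            gene.diseases,
            pubchem_section_url("gene", gene.ncbi_gene_id, "Gene Disease Co-Occurrences"),
        )
        for gene in glycan.enzymes
    }
```

Because the section title is passed as ordinary text, the same helper reproduces the other caption URLs, for example `pubchem_section_url("pathway", "Reactome:R-HSA-2022857", "Glycans")` for Fig. 3.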
