Nat Methods. 2024 Nov;21(11):2117-2127.
doi: 10.1038/s41592-024-02464-7. Epub 2024 Oct 14.

Restoring protein glycosylation with GlycoShape

Callum M Ives et al. Nat Methods. 2024 Nov.

Abstract

Despite ground-breaking innovations in experimental structural biology and protein structure prediction techniques, capturing the structure of the glycans that functionalize proteins remains a challenge. Here we introduce GlycoShape ( https://glycoshape.org ), an open-access glycan structure database and toolbox designed to restore glycoproteins to their native and functional form in seconds. The GlycoShape database counts over 500 unique glycans so far, covering the human glycome and augmented by elements from a wide range of organisms, obtained from 1 ms of cumulative sampling from molecular dynamics simulations. These structures can be linked to proteins with a robust algorithm named Re-Glyco, which is directly compatible with structural data from open-access repositories, such as the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) and the AlphaFold Protein Structure Database, or with the user's own structures. The quality, performance and broad applicability of GlycoShape are demonstrated by its ability to predict N-glycosylation occupancy, scoring a 93% agreement with experiment, based on screening all proteins in the PDB with a corresponding glycoproteomics profile, for a total of 4,259 N-glycosylation sequons.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Schematic representation of the GlycoShape workflow (https://glycoshape.org).
Top: the GlycoShape GDB is a repository of glycan 3D structures from 1 ms cumulative sampling through uncorrelated replicas of deterministic MD simulations. Structures can be searched by drawing a structure according to the Symbol Nomenclature for Glycans (SNFG) with the integrated SugarDrawer tool or by searching by text in the International Union of Pure and Applied Chemistry (IUPAC), Web3 Unique Representation of Carbohydrate Structures (WURCS), Simplified Molecular Input Line Entry System (SMILES), GLYCAM and GlyTouCan formats. The GlyTouCan ID of the tetra-antennary complex N-glycan is shown in the example in blue. A successful search returns general information on the glycan in the main tab and, in the ‘Structure’ tab, its 3D structure resulting from clustering analysis of the MD data, with the populations (weights) corresponding to each cluster. This information can be downloaded in PDB, CHARMM and GLYCAM formats. Bottom right: Re-Glyco allows users to rebuild the 3D structures of glycoproteins to the desired glycoforms by sourcing 3D glycan structures from the GDB and to predict N-glycosylation site occupancy through the bespoke GlcNAc Scanning tool. In the example shown here and discussed in Results, the CD16b from PDB 6EAQ (in blue) is processed through the GlcNAc Scanning tool, which correctly predicts occupancy of the sequons at N45 and N162 (ref. 51). These sites can be N-glycosylated with a ‘one-shot’ approach, where the same N-glycan structure is chosen to occupy all sites, as shown in the example, or manually with a site-by-site approach, where the user can select a different N-glycan structure at each site.
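The caption refers to N-glycosylation sequons without defining them. As a minimal sketch, the canonical sequon is Asn-X-Ser/Thr with X being any residue except Pro, and candidate sites can be located in a one-letter sequence as shown here. This only lists candidate positions; it is not the GlcNAc Scanning tool, which evaluates occupancy on the 3D structure. The function name `find_sequons` and the toy sequence are hypothetical.

```python
import re

# Canonical N-glycosylation sequon: N-X-S/T, where X is any residue except Pro.
# A lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_sequons(sequence):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]

# Hypothetical toy sequence, not taken from any protein in the paper.
toy = "MANGSLNPTQNNTSA"
print(find_sequons(toy))  # [3, 11, 12]; N7 is skipped because X is Pro
```

Whether a listed sequon is actually occupied depends on the structural context, which is what the occupancy prediction described above addresses.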
Fig. 2. Schematic overview of GAP used to build the GlycoShape GDB.
a, Multiple uncorrelated replica MD simulations are performed for each glycan in the GDB, to comprehensively sample its structural dynamics. b,c, The resulting MD frames are then transformed into a graph matrix representation (b), simplified by flattening the lower half as a mono-dimensional array (c). d, This step enables a dimensionality reduction via PCA. e, These data are clustered by GMM, the results of which are displayed in terms of cluster distributions. f, Representative 3D structures for each cluster are selected on the basis of KDE maxima, along with comprehensive torsion angle profiles for the highest-populated clusters, showing the wide breadth of the conformational space covered by GAP. g, Structures derived from GAP are clearly presented on the GlycoShape GDB web platform, in addition to biological and chemical information.
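Steps b and c can be illustrated with a short sketch. The caption does not say what the graph matrix contains, so the example below assumes a symmetric pairwise-distance matrix between atoms; that choice, the function names and the three-atom frame are all hypothetical. The point is only that a symmetric matrix is fully described by its lower half, which can be flattened into a one-dimensional feature vector per MD frame before PCA.

```python
import math

def distance_matrix(coords):
    """Symmetric matrix of pairwise distances (assumed stand-in for the graph matrix)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def flatten_lower(matrix):
    """Flatten the strict lower triangle row by row into a 1D list."""
    return [matrix[i][j] for i in range(1, len(matrix)) for j in range(i)]

# Hypothetical three-atom frame (coordinates in arbitrary units).
frame = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
features = flatten_lower(distance_matrix(frame))
print(features)  # [3.0, 5.0, 4.0]
```

For n atoms the vector has n(n-1)/2 entries, and stacking one vector per frame gives the table on which a dimensionality reduction such as PCA would operate.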
Fig. 3. Schematic overview of the Re-Glyco algorithm used to select and link glycan 3D structures from the GDB to a protein.
a, The definition of the φ and ψ torsion angles, with corresponding atoms labeled ‘a’ to ‘e’, determining the conformation of the linkage between the protein (P shown in gray) sidechain and the reducing end of the glycan (G shown in yellow). b, Heat maps showing the preferential conformation of the φ, ψ torsions between Asn-β-GlcNAc and Thr-α-GalNAc, with energy minima highlighted within red rectangles. c, Two-dimensional SNFG structure and (below) 3D structures of the tetra-antennary fully α3-sialylated N-glycan from the clustering analysis shown in Fig. 1. d, A schematic representation of the Re-Glyco workflow applied to the reconstruction of human interleukin-5 (IL5; Uniprot P05113). In agreement with the annotation, GlcNAc Scanning identifies only the N47 sequon as potentially occupied. Accordingly, N47 can be functionalized with more elaborate structures through a ‘one-shot’ glycosylation, in which T22 can also be functionalized with a sialylated core1 O-glycan. Highly complex glycosylation at N47 and alternative O-glycosylation structures can be selected by sourcing directly from the GDB through the Advanced (Site-by-Site) Glycosylation tool, as shown for the alternative IL5 glycoform on the right-hand side. Molecular rendering with Mol* Viewer; statistical analysis and heat maps created with matplotlib (https://matplotlib.org/).
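A torsion angle such as φ or ψ is the dihedral defined by four consecutive atoms. The sketch below computes it from Cartesian coordinates with the standard atan2 formulation; it is a generic illustration, not the Re-Glyco implementation. Which of the atoms ‘a’ to ‘e’ enter each torsion follows the figure and is not assumed here. The helper names and test coordinates are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], for atoms p0-p1-p2-p3."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: the first and last atoms eclipse each other (cis).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

Evaluating such an angle for both φ and ψ over many structures and binning the pairs is one way to obtain a two-dimensional preference map of the kind shown in panel b.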
Fig. 4. Reglycosylation of CD16b.
a, The structure of the CD16b (PDB 1E4J, resolution 2.50 Å) with N-glycosylation sequons labeled. The bold labels indicate occupied sequons in neutrophil-bound CD16b. The green check marks indicate the sequons predicted to be occupied by the Re-Glyco GlcNAc Scanning tool, while the red cross marks indicate sequons that Re-Glyco deems unoccupied. b, The structural alignment of the CD16b (PDB 1E4J) in cyan with the CD16b (PDB 6EAQ, resolution 2.22 Å) in magenta. Both sequons occupied in PDB 6EAQ, namely, N45 and N162, are correctly predicted as occupied by Re-Glyco. c, The structure of the CD16b (PDB 1E4J), modified by swapping OD1 and ND2 coordinates and by an alternative rotameric orientation of the N74 sidechain, N-glycosylated by Re-Glyco with a different selection of N-glycan structures from the GlycoShape GDB at each site. d, A close-up view of the OD1 and ND2 orientation of N45 in the CD16b from PDB 1E4J (cyan) and from PDB 6EAQ (magenta). e, A close-up of the orientation of the N74 sidechain in the CD16b from PDB 1E4J. The distance between the CG of N45 and the CA of F33 is 4.7 Å. Rendering of the 3D structures in a, b, d and e and rotamer search performed with PyMOL (https://pymol.org/2/). Rendering of the 3D structure in c with VMD (https://www.ks.uiuc.edu/Research/vmd/) and of the N-glycan 2D structures with DrawGlycan-SNFG (http://www.virtualglycome.org/DrawGlycan/).
Fig. 5. Performance of Re-Glyco on AlphaFold structures.
a, A histogram analysis of the distribution of pLDDT scores of residues clashing during the GlcNAc Scanning of 3,415 proteins from the AlphaFold Protein Structure Database, with a total of 12,789 glycosylation sites annotated in Uniprot. The distribution of pLDDT values for all residues in the proteins tested is shown with a dashed line. The distribution of the pLDDT values for the residues clashing with the GlcNAc during GlcNAc Scanning, where clashes were resolved by Re-Glyco, is shown with powder blue histograms and a blue line. The distribution of the pLDDT values for the residues where clashing was not resolved by Re-Glyco and for the residues in the immediate vicinity (±2) is shown with rose histograms and a red line. b, The 3D structure of the EXTL3 monomer from PDB 8OG1 (green cartoons) represented within the homodimer from PDB 7AU2 (white surface). The resolution for each structure is shown in the labels. Asn residues within sequons are shown with van der Waals (vdW) spheres, where N atoms are in blue and O atoms are in red. The N-glycosylation sequons known to be occupied are shown in bold. A green check mark indicates that the site is predicted to be occupied by Re-Glyco, while a red cross mark indicates that the site is predicted to be unoccupied by Re-Glyco owing to major steric clashing. c, Bottom: the 3D structure of the EXTL3 monomer from AlphaFold (AF-O43909-F1) shown in cyan. The lowest-confidence loops and termini are removed from the image for clarity. The all-atom RMSD versus PDB 8OG1 is shown in the label. Top: a close-up view of the sidechain orientation of Asn 592, where the clash with a spatially neighboring loop prevents functionalization. d, Bottom: the 3D structure of the EXTL3 monomer from ColabFold (Rank 001) shown in orange. Top: a close-up view of the sidechain orientation of Asn 592, showing the alternative orientation that allows for functionalization. Molecular representation with PyMOL (https://pymol.org/2), and statistical analysis and rendering created with matplotlib (https://matplotlib.org/).
