Acta Crystallogr D Struct Biol. 2017 Feb 1;73(Pt 2):171-186.
doi: 10.1107/S2059798316016910. Epub 2017 Feb 1.

Strategies for carbohydrate model building, refinement and validation


Jon Agirre. Acta Crystallogr D Struct Biol. 2017.

Abstract

Sugars are the most stereochemically intricate family of biomolecules and present substantial challenges to anyone trying to understand their nomenclature, reactions or branched structures. Current crystallographic programs provide an abstraction layer allowing inexpert structural biologists to build complete protein or nucleic acid model components automatically either from scratch or with little manual intervention. This is, however, still not generally true for sugars. The need for carbohydrate-specific building and validation tools has been highlighted a number of times in the past, concomitantly with the introduction of a new generation of experimental methods that have been ramping up the production of protein-sugar complexes and glycoproteins for the past decade. While some incipient advances have been made to address these demands, correctly modelling and refining carbohydrates remains a challenge. This article will address many of the typical difficulties that a structural biologist may face when dealing with carbohydrates, with an emphasis on problem solving in the resolution range where X-ray crystallography and cryo-electron microscopy are expected to overlap in the next decade.

Keywords: carbohydrates; conformation; glycosylation; restraints; validation.

Figures

Figure 1
Deposition rate of glycoproteins and protein–sugar complexes. This graph was produced using the publicly available search functions provided by the RCSB PDB (Bernstein et al., 1977), restricting each query to crystallographic structures. Structures containing saccharides were selected by the different ‘saccharide’ chem_comp codes (represented by a red line on the graph), and structures containing N-glycosylation were selected by the ASNND2-NAGC1 LINK record (represented by a blue line); the latter figures do not reflect the total number, as at least 16 structures were found to have incorrect ASNOD1-NAGC1 LINK records. The total numbers of PDB structures per year (grey bars) have been plotted on a 1/10 scale (right axis) to make the 10% proportion stand out.
Figure 2
Interconversions between open-chain and cyclic forms of D-fructose. A furanose ring (on the left) is formed after the 5-hydroxyl (O atom in orange) performs a nucleophilic attack on the ketone (carbonyl containing the O atom in blue). This results in two anomeric configurations (α or β, resulting from the blue O atom lying on the lower or upper side of the ring, respectively), as the ketone C atom is sp²-hybridized and thus planar, and the attack can be performed from either side of the plane. The same holds true for pyranose-ring formation, except that now it is the 6-hydroxyl (O atom in green) which attacks the ketone. A similar mechanism occurs in aldoses (e.g. D-glucose or D-galactose), where the 4- and 5-hydroxyls attack the aldehyde group in position 1 to form furanose and pyranose rings, respectively. Numbers in gold denote all of the potential positions that substituents can adopt in a pyranose ring (1, up and axial; 2, up and equatorial; 3, down and axial; 4, down and equatorial).
Figure 3
Conformational interconversions. According to IUPAC carbohydrate nomenclature (McNaught, 1997), the different conformations are identified by an italic capital letter, chair (C), envelope (E), boat (B), skew-boat (S), half-chair (H) and twist (T), with the atoms on the upper or lower side of the main ring plane in superscript and subscript lettering, respectively. Wavy lines identify those atoms that are roughly coplanar (i.e. forming the main plane) in that particular conformation. Here, the different conformations are drawn as a function of the Cremer–Pople puckering parameters (Cremer & Pople, 1975). (a, b) Pseudo-rotational itinerary for furanoses and possible conformations. Furanoses are able to adopt twist and envelope conformations, with a very small energy barrier separating them. O atoms, which are assumed to be located at the top vertex in the pentagons, have been omitted from this diagram for reasons of clarity. In addition, the diagram does not show the total puckering amplitude (Q). (c, d) Cremer–Pople sphere describing the conformational itineraries for pyranoses and possible conformations. In order to convert the chair conformation of a pyranose ring to a boat conformation, both of which typically sit at energy minima, with the chair being the more favourable, the ring must pass through envelope or half-chair conformations which, having eclipsed substituents and considerable angle strain, require a considerable energetic investment. In context, these energy barriers are usually proportional to the cost of breaking three or four hydrogen bonds in peptides (Sheu et al., 2003; Davies et al., 2012).
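The Cremer–Pople puckering parameters mentioned in this caption can be computed directly from ring-atom coordinates. The following is a minimal sketch for six-membered rings, assuming the atoms are supplied in ring order; the function names `cremer_pople_six` and `ideal_ring` and the idealized test rings are illustrative assumptions, not part of the article or of any crystallographic package.

```python
import math


def cremer_pople_six(coords):
    """Return (Q, theta, phi) for six ring atoms listed in ring order.

    Q has the units of the input coordinates; theta and phi are in degrees.
    """
    n = len(coords)
    if n != 6:
        raise ValueError("expected exactly six ring atoms")

    # Move the origin to the geometric centre of the ring.
    centre = [sum(atom[k] for atom in coords) / n for k in range(3)]
    ring = [[atom[k] - centre[k] for k in range(3)] for atom in coords]

    # Mean-plane normal from the two Cremer-Pople reference vectors.
    r1 = [sum(ring[j][k] * math.sin(2 * math.pi * j / n) for j in range(n))
          for k in range(3)]
    r2 = [sum(ring[j][k] * math.cos(2 * math.pi * j / n) for j in range(n))
          for k in range(3)]
    normal = [
        r1[1] * r2[2] - r1[2] * r2[1],
        r1[2] * r2[0] - r1[0] * r2[2],
        r1[0] * r2[1] - r1[1] * r2[0],
    ]
    length = math.sqrt(sum(c * c for c in normal))
    normal = [c / length for c in normal]

    # Signed out-of-plane displacement of every ring atom.
    z = [sum(ring[j][k] * normal[k] for k in range(3)) for j in range(n)]

    # Puckering coordinates for N = 6: one (q2, phi) pair plus q3.
    q2_cos = math.sqrt(2 / n) * sum(
        z[j] * math.cos(4 * math.pi * j / n) for j in range(n))
    q2_sin = -math.sqrt(2 / n) * sum(
        z[j] * math.sin(4 * math.pi * j / n) for j in range(n))
    q3 = math.sqrt(1 / n) * sum((-1) ** j * z[j] for j in range(n))

    q2 = math.hypot(q2_cos, q2_sin)
    total_q = math.hypot(q2, q3)                      # total amplitude Q
    theta = math.degrees(math.atan2(q2, q3))          # 0..180 degrees
    phi = math.degrees(math.atan2(q2_sin, q2_cos)) % 360.0
    return total_q, theta, phi


def ideal_ring(heights, radius=1.4):
    """Hypothetical helper: six atoms on a circle with given heights."""
    return [
        (radius * math.cos(math.pi * j / 3),
         radius * math.sin(math.pi * j / 3),
         h)
        for j, h in enumerate(heights)
    ]


# Idealized shapes, not real sugar coordinates.
chair = ideal_ring([0.25, -0.25, 0.25, -0.25, 0.25, -0.25])
boat = ideal_ring([0.5, -0.25, -0.25, 0.5, -0.25, -0.25])
chair_q, chair_theta, _ = cremer_pople_six(chair)
boat_q, boat_theta, boat_phi = cremer_pople_six(boat)
```

On these idealized rings the chair lands on a pole of the sphere (θ of 0° or 180°, depending on which way the ring is numbered) and the boat on the equator (θ = 90°). For a real pyranose, the atom order passed in determines which chair a given pole corresponds to, so the ordering convention should be checked against the validation software in use.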
Figure 4
Making linkages. (a) Leaving groups. Leaving groups which abandon the reducing sugar during the linkage reaction are depicted in grey. H atoms have been omitted for reasons of clarity. (b) Linkage nomenclature. A schematic representation of a glycosidic linkage [the simplified monosaccharides are unrelated to those in (a)] is shown. Atoms are referred to by their PDBCCD nomenclature, and those groups responsible for linkage nomenclature have been colour-coded: blue, the configuration of the newly linked O4 (which substitutes O1 from the leaving group) with respect to the absolute stereochemistry as determined by C6 marks the linkage stereochemistry (β); red, the order of the bond (1–4) indicates that the linkage is a glycosidic bond between C1 from the sugar on the left and O4 from the sugar on the right. If the sugar on the left were a ketose, for example D-fructose, the linkage would be signified as β2–4, as the anomeric C atom would be C2 (see Fig. 2).
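The naming rule in (b) can be written down as a small function. This is only a sketch of the rule as stated in the caption, assuming the anomeric carbon is C1 for aldoses and C2 for ketoses such as fructose; `linkage_label` and `ANOMERIC_CARBON` are hypothetical names introduced for illustration.

```python
# Anomeric carbon position by sugar class (assumption: 2-ketoses only).
ANOMERIC_CARBON = {"aldose": 1, "ketose": 2}


def linkage_label(anomer, donor_kind, acceptor_position):
    """Build a label such as 'β1–4' from its three components."""
    if anomer not in ("α", "β"):
        raise ValueError("anomer must be 'α' or 'β'")
    if donor_kind not in ANOMERIC_CARBON:
        raise ValueError("donor_kind must be 'aldose' or 'ketose'")
    # Anomeric configuration, donor carbon, en dash, acceptor oxygen position.
    return f"{anomer}{ANOMERIC_CARBON[donor_kind]}–{acceptor_position}"


aldose_label = linkage_label("β", "aldose", 4)   # linkage drawn in the figure
ketose_label = linkage_label("β", "ketose", 4)   # the ketose case in the caption
```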
Figure 5
Understanding link torsions. In analogy to how the peptide-bond conformation is evaluated in proteins, glycosidic bonds can also be described in terms of torsions. These have been denoted in lowercase Greek letters in order to avoid confusion with the Cremer–Pople parameters (Cremer & Pople, 1975), and match the nomenclature as reviewed by Lütteke (2009) and used by the CARP server (Lütteke et al., 2005, 2006) as well as Privateer (Agirre, Iglesias-Fernández et al., 2015). Some of these torsion angles are expected to have predictable values as they involve an sp²-hybridized C atom, e.g. ψN. This figure was generated with CCP4mg (McNicholas et al., 2011).
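Each link torsion is a signed dihedral angle over four consecutive atoms. The sketch below shows a generic dihedral calculation under the usual sign convention (positive when the front bond must be rotated clockwise to eclipse the rear bond); the function name `torsion` is an assumption, and which four atoms define each named torsion follows the figure and is not encoded here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def torsion(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)   # central bond
    b3 = _sub(p3, p2)
    n2 = _cross(b2, b3)
    # atan2 form: stable near 0 and 180 degrees, unlike a bare arccos.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))


# Synthetic points, not real atom coordinates.
gauche_plus = torsion((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
gauche_minus = torsion((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, -1, 1))
trans = torsion((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

Pairs of such angles are what a Ramachandran-like plot for glycosidic linkages displays on its two axes.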
Figure 6
Examples of N-linked glycosylation. Top, plant N-glycans typically show α1–3 core-linked fucose and β1–2 xylose linked to the first mannose sugar; the diagram shows one of the glycans found in a haem peroxidase from sorghum (PDB entry 5aog; Nnamchi et al., 2016). Middle, a complete, unprocessed high-mannose N-glycan linked to a glycosyl hydrolase enzyme from the fungus Aspergillus fumigatus (PDB entry 5fji; Agirre et al., 2016). Bottom, a sialylated N-glycan linked to an Fc fragment from a human antibody (PDB entry 4byh; Crispin et al., 2013). Human glycans, and mammalian glycans in general, may display an α1–6 core-linked fucose. All diagrams and legends were generated with Privateer (Agirre, Iglesias-Fernández et al., 2015). For more examples of glycans, refer to the complete overview of N-glycan structures published by Stanley & Cummings (2009).
Figure 7
Idealized and example coordinates for the PDBCCD entry IDS (2-O-sulfo α-L-iduronic acid) and their comparison with a minimal energy conformer calculated by torsional exploration and minimization with RDKit. The blue area denotes those atoms which lie roughly in a plane, making it easier to identify the ring conformation. Top, the biologically relevant ¹C₄ conformer, as stored in the PDBCCD idealized coordinates. Despite showing repulsion between axial substituents, this chair conformation is the only feasible conformation, as converting it into the slightly more favourable ⁴C₁ chair would require a considerable energetic investment. Middle, example coordinates as determined by NMR (Mulloy et al., 1993). This conformer is in a high-energy conformation and does not match any of the available high-resolution crystallographic structures. Bottom, a ⁴C₁ chair conformer obtained by torsional exploration with RDKit (Landrum, 2016). The aforementioned energy barrier is artificially circumvented by exploring different combinations of torsions. This is the absolute minimal energy conformation, but one that is not attainable without external intervention. This figure was generated with CCP4mg (McNicholas et al., 2011).
Figure 8
Generating a dictionary for α-D-glucopyranose from a SMILES string. Bond and angle geometries have been colour-coded according to the top-right inset panel. Horizontal lines represent deviations from Engh & Huber (1991). The three methods showing the closest agreement are shown in bold: ACEDRG, grade using Mogul, and eLBOW using Mogul. Red asterisks: PRODRG and eLBOW using the AM1 method did not obtain the lowest-energy conformation (⁴C₁ for D-glucose) as starting coordinates, and PRODRG produced the incorrect absolute configuration, turning D-glucose into its C5-epimer L-idose. Mogul (Bruno et al., 2004) is the current geometric target that the PDB uses to validate hetero compounds.
Figure 9
Glycosidic bonds, distortion in the −1 subsite and mutarotation at the reducing end. The figure shows the active site of an α-mannanase enzyme reported by Thompson et al. (2015, 2016), which was crystallized in complex with α1–6-mannopentaose. Sugars have been numbered according to standard practice, from 500A (and its alternate configuration, 500B) at the reducing end to 504 (not shown) at the nonreducing end. LINK records can be defined as shown in the inset (only the part relevant to residue identification is shown; see the PDB format specification for the full syntax) and have to be replicated to link both configurations of residue 500, which in turn have their respective occupancies reduced to 0.5. The sugar in the −1 subsite (nomenclature defined in Davies et al., 1997) is distorted by the catalytic residues (not shown) to a B₂,₅ conformation, which is well supported by clear electron density and described by QM/MM metadynamics simulations as part of the catalytic itinerary (Thompson et al., 2015, 2016). This figure was generated with CCP4mg (McNicholas et al., 2011).
Figure 10
Conformational validation. (a) Chemical errors in the key TM9 sugar, deposited as an N-acetyl α-L-mannosamine derivative (left, PDB entry 4k3t, now superseded by PDB entry 5awv), and their impact on the published structure (right). (b) Correct stereochemistry (left) and re-refined structure after correcting the errors (right). Re-refining the structure with the correct stereochemistry (N-acetyl β-D-glucosamine derivative) causes the sugars to end up in the minimal energy chair conformation. For the stereochemically correct ligand, OMIT density maps (mFo − DFc coefficients, contoured at 2σ) show plausible density for the putative diol intermediate at least in chains M and N. While the maps selected by the original authors may not be too different from those obtained through refinement of the correct chemical species at the C6 diol, publishing a distorted sugar with the wrong stereochemistry at almost every position casts legitimate doubt on their glyco-chemical conclusions. This figure was generated with CCP4mg (McNicholas et al., 2011).
Figure 11
Glycosidic bond torsions can be affected by stacking interactions. (a) The most frequent conformation of the GlcNAc–Asn bond as found by Imberty & Perez (1995) and Lütteke et al. (2005), plotted as blue stars in (c) for PDB entry 5fji. (b) This flipped conformation of GlcNAc lies in a secondary torsional energy minimum that was originally described by Imberty & Perez (1995), and is stabilized by a stacking interaction with a neighbouring tryptophan, the character of which is conserved across homologues in order to maintain the conformation of this bond (Agirre et al., 2016). Stacking interactions can be computed with Privateer (Agirre, Iglesias-Fernández et al., 2015), using the definition proposed by Hudson et al. (2015), which states that δ must be shorter than 4.0 Å and the Ω angle must be smaller than 30°. (c) Ramachandran-like plot calculated with Privateer using the convention from Lütteke (2009), also depicted here in Fig. 5. This figure was generated with CCP4mg (McNicholas et al., 2011).
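The stacking criterion quoted in this caption reduces to two threshold checks. A minimal sketch follows, assuming δ (in Å) and Ω (in degrees) have already been measured by other means; `is_stacking` and its parameter names are hypothetical and do not reflect Privateer's interface.

```python
def is_stacking(delta_angstrom, omega_degrees, max_delta=4.0, max_omega=30.0):
    """Apply the two cut-offs quoted in the caption (both strict)."""
    return delta_angstrom < max_delta and omega_degrees < max_omega


# Made-up measurements for illustration only.
close_and_aligned = is_stacking(3.6, 12.0)
too_far = is_stacking(4.0, 12.0)
too_tilted = is_stacking(3.6, 45.0)
```

Both comparisons are strict because the caption says "shorter than" and "smaller than", so a contact sitting exactly on a cut-off is rejected.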

References

    1. Adams, P. D. et al. (2010). Acta Cryst. D66, 213–221.
    2. Adams, P. D. et al. (2011). Methods, 55, 94–106.
    3. Adams, P. D. et al. (2016). Structure, 24, 502–508.
    4. Afonine, P. V., Grosse-Kunstleve, R. W., Echols, N., Headd, J. J., Moriarty, N. W., Mustyakimov, M., Terwilliger, T. C., Urzhumtsev, A., Zwart, P. H. & Adams, P. D. (2012). Acta Cryst. D68, 352–367.
    5. Agirre, J., Ariza, A., Offen, W. A., Turkenburg, J. P., Roberts, S. M., McNicholas, S., Harris, P. V., McBrayer, B., Dohnalek, J., Cowtan, K. D., Davies, G. J. & Wilson, K. S. (2016). Acta Cryst. D72, 254–265.
