Anal Chem. 2024 May 28;96(21):8332-8341.
doi: 10.1021/acs.analchem.3c04992. Epub 2024 May 8.

Boltzmann Model Predicts Glycan Structures from Lectin Binding

Aria Yom et al. Anal Chem.

Abstract

Glycans are complex oligosaccharides that are involved in many diseases and biological processes. Unfortunately, current methods for determining glycan composition and structure (glycan sequencing) are laborious and require a high level of expertise. Here, we assess the feasibility of sequencing glycans based on their lectin binding fingerprints. By training a Boltzmann model on lectin binding data, we predict the approximate structures of 88 ± 7% of N-glycans and 87 ± 13% of O-glycans in our test set. We show that our model generalizes well to the pharmaceutically relevant case of Chinese hamster ovary (CHO) cell glycans. We also analyze the motif specificity of a wide array of lectins and identify the most and least predictive lectins and glycan features. These results could help streamline glycoprotein research and be of use to anyone using lectins for glycobiology.
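To illustrate the general idea only, the sketch below ranks candidate glycans with a Boltzmann (softmax) distribution over energies, p ∝ exp(−E/T). The abstract does not specify the energy function, so everything concrete here is a hypothetical assumption rather than the authors' model: the linear lectin-to-feature energy, the weights, the toy lectin signals, and the names `energy`, `rank_glycans`, and `glycan_A` to `glycan_C`.

```python
import math

def energy(features, signals, weights):
    """Hypothetical first-order energy: lower when bound lectins match glycan features."""
    return -sum(
        weights.get((lectin, feature), 0.0) * signal
        for lectin, signal in signals.items()
        for feature in features
    )

def rank_glycans(candidates, signals, weights, temperature=1.0):
    """Return (name, probability) pairs sorted by Boltzmann probability, p ∝ exp(-E/T)."""
    energies = {name: energy(feats, signals, weights) for name, feats in candidates.items()}
    e_min = min(energies.values())  # shift energies for numerical stability
    unnorm = {n: math.exp(-(e - e_min) / temperature) for n, e in energies.items()}
    z = sum(unnorm.values())  # partition function over the candidate set
    return sorted(((n, u / z) for n, u in unnorm.items()), key=lambda t: t[1], reverse=True)

# Toy data (assumed, not from the paper): two lectins, three candidate glycans.
weights = {("SNA", "a2-6 sialic acid"): 2.0, ("WGA", "GlcNAc"): 1.0}
signals = {"SNA": 0.9, "WGA": 0.2}
candidates = {
    "glycan_A": {"a2-6 sialic acid", "GlcNAc"},
    "glycan_B": {"GlcNAc"},
    "glycan_C": set(),
}
ranking = rank_glycans(candidates, signals, weights)
for name, prob in ranking:
    print(f"{name}: {prob:.3f}")
```

Under these assumed weights, the candidate whose features best match the observed binding receives the lowest energy and therefore the highest probability. Returning the full sorted list, rather than a single best guess, makes it possible to check whether the correct structure appears among the top few candidates.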

Conflict of interest statement

N.E.L. and A.W.T.C. have submitted patents associated with the use of lectin binding patterns for determining glycan structures and disease diagnostics. N.E.L. is also co-founder and holds financial interest in NeuImmune and Augment Biologics, which focus on glycoprotein therapeutics.

Figures

Figure 1:
Boltzmann model
Figure 2:
Amount of information extracted about each motif by our first- and second-order models, as quantified by the difference in cross entropy ΔH. For each motif, the models are optimized over the two most predictive lectins. For the task of predicting any single motif, there is little to be gained from using the higher-order model. See Methods - Cross entropy for more details.
Figure 3:
Conceptual illustration of how prediction occurs on five slightly different glycans. The model associates each lectin with certain features. The model then finds glycans that possess these features. The model also learns to ignore certain features at times, like the WGA peak in the center glycan. For each lectin, several concentrations were used, producing different degrees of binding, though only the lectin code is labelled along the x-axis.
Figure 4:
Test accuracy on N-linked and O-linked CFG glycans. Results averaged over 64 training/test set partitions. Error bars represent the standard error. Note that when the problematic spacer Sp14 is removed, O-glycan accuracy is much higher. A random sample of correct and predicted glycan structures may be found in Figure 5.
Figure 5:
Random sample of N- and O-glycans, along with our model’s predictions based on their respective lectin profiles. The correct glycan is boxed. Note that even when the correct glycan is not the model’s first pick, it is often in the top three predictions. Also, the predicted glycans are typically very similar to the correct glycan, particularly in terms of their terminal motifs.
Figure 6:
Using the prediction results from 64 runs of our algorithm with random training/test partitions, we extract three groups of glycans based on their ranks: high accuracy (rank 1), medium accuracy (ranks 3 and 4), and low accuracy (ranks > 6). (a) Representative motifs enriched for each group, based on lowest binomial test p-value. No statistically significant motifs were found for the low-accuracy O-glycans due to small sample size. (b) Random sample glycans from each group. (c) Overlapping histograms of prediction ranks for tested N- and O-glycans (blue and yellow, respectively, with brown representing overlapping bars of the histograms). A random sample of correct and predicted glycan structures may be found in Figure 5.
Figure 7:
Training and test partitions of CFG and CHOGlycoNET glycans. n denotes the number of glycans in each set.
Figure 8:
Ranks returned for each correct CHO N-glycan alongside the accuracy for general CFG N- and O-glycans. The CFG data are taken from the experiments of the previous section, where an 80/20 train/test partition was used. Bernoulli σ is used for the CHO error bars, and standard deviation is used for the CFG error bars. Note that the CHO results closely match the CFG N-glycan results, implying that the model generalizes well.
Figure 9:
Mutual information between lectins and motifs. Only pairs with ≥ 0.3 bits of information are displayed. Each box on the right corresponds to a motif, but to save space only the highest-correlation motif is displayed for each lectin.
Figure 10:
Mutual information between lectins and their best-binding motifs. On average, 82% of a lectin’s captured information comes from its primary binding motif. This increases to 86% and 89% when restricting to N- and O-glycans, respectively, due to increased specificity.
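The mutual information reported in Figures 9 and 10 can be illustrated with a small plug-in estimate in bits. This is a minimal sketch under stated assumptions: it treats lectin binding and motif presence as binary labels per glycan, and the function name `mutual_information_bits` and the toy data are hypothetical. The estimator actually used by the authors is not described in this excerpt.

```python
import math
from collections import Counter

def mutual_information_bits(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Toy data (assumed): one entry per glycan, 1 = lectin binds / motif present.
lectin_binds  = [1, 1, 1, 0, 0, 0, 1, 0]
motif_present = [1, 1, 1, 0, 0, 0, 0, 1]
mi = mutual_information_bits(lectin_binds, motif_present)
print(f"I(lectin; motif) = {mi:.3f} bits")
```

With a display threshold like the 0.3 bits used in Figure 9, a lectin-motif pair such as this toy example (roughly 0.19 bits) would be omitted, while a perfectly predictive binary pair would reach the 1-bit maximum.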
