Review
Biotechnol Adv. 2022 Nov:60:108008.
doi: 10.1016/j.biotechadv.2022.108008. Epub 2022 Jun 20.

Artificial intelligence in the analysis of glycosylation data

Haining Li et al. Biotechnol Adv. 2022 Nov.

Abstract

Glycans are complex, yet ubiquitous across biological systems. They are involved in diverse essential organismal functions. Aberrant glycosylation may contribute to the development of diseases such as cancer, autoimmune diseases, and inflammatory diseases. Glycans, both normal and aberrant, are synthesized using extensive glycosylation machinery, and understanding this machinery can provide invaluable insights for diagnosis, prognosis, and treatment of various diseases. Increasing amounts of glycomics data are being generated thanks to advances in glycoanalytics technologies, but to maximize the value of such data, innovations are needed for analyzing and interpreting large-scale glycomics data. Artificial intelligence (AI) provides a powerful analysis toolbox in many scientific fields, and here we review state-of-the-art AI approaches to glycosylation analysis. We further discuss how models can be analyzed to gain mechanistic insights into glycosylation machinery and how the machinery shapes glycans under different scenarios. Finally, we propose how to leverage the gained knowledge for developing predictive AI-based models of glycosylation. Guiding future development of AI-based glycosylation models in this way should provide valuable insights into glycosylation and glycan machinery.

Keywords: Artificial intelligence; Glycosylation machinery; Interpretable models; Multi-omics integration.

Conflict of interest statement

Declaration of Competing Interest: The authors declare no conflict of interest.

Figures

Fig. 1. Involvement of AI models in glycobiology for basic science and medicine.
(Top) We highlight here five major applications of AI models in glycomics data analysis: 1) to understand glycosylation phenotypes (e.g., glycan structure, location site, and occupancy), 2) to understand glycan-linked disease pathogenesis, 3) to decipher complex glycosylation machinery, 4) to develop therapies to offset effects of aberrant glycosylation, and 5) to enhance current glycomics tools. (Bottom) Current challenges and potential solutions in analyzing glycomics data using AI-based methods. There are three major challenges for developing AI-based methods in analyzing glycomics data: 1) data complexity, 2) model opacity, and 3) data sparsity. However, for each challenge, there are methods that could be used for addressing this challenge (right).
Fig. 2. Explainable AI models: state-of-the-art methods for interpreting the ‘black-box’ model.
(A) Hybrid Mechanistic and AI Models. Mechanistic models are interpretable models that describe a system mathematically, but constructing them and parameterizing their equations requires extensive effort. A hybrid model keeps the mechanistic framework and couples it with AI methods that predict the parameters of the equations describing kinetic or biological mechanisms. (B) Direct Algorithmic Interpretation of the AI Models. There are two major types of methods for opening the black box of AI models: intrinsically interpretable methods and post-hoc explanation methods (Gunning et al., 2019; Pour, 2021). The figure illustrates the circumstances under which each type of method is appropriate for interpreting an AI model.
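
The hybrid idea in panel (A) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any paper reviewed here: the mechanistic part is assumed to be a single first-order reaction, a plain least-squares line stands in for the AI component, and the training numbers and the names `fit_linear`, `converted_fraction`, and `hybrid_predict` are all hypothetical.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b (stand-in for the AI component)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def converted_fraction(k, t):
    """Mechanistic part: assumed first-order kinetics, fraction = 1 - exp(-k*t)."""
    return 1.0 - math.exp(-k * t)


# Hypothetical training data: enzyme expression level -> rate constant k.
expression = [1.0, 2.0, 3.0, 4.0]
rate_constants = [0.1, 0.2, 0.3, 0.4]

a, b = fit_linear(expression, rate_constants)


def hybrid_predict(expression_level, t):
    """Predict the kinetic parameter from data, then apply the mechanistic equation."""
    k = a * expression_level + b
    return converted_fraction(k, t)


print(hybrid_predict(5.0, 2.0))
```

The learned component only supplies the parameter `k`; the equation that turns `k` into a prediction remains explicit, so the mechanistic half of the model can still be inspected directly.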
Fig. 3. Schematic of how unmeasured glycans can be included in AI models.
(A) Glycomic data collected under different conditions. Glycan-decorated proteins are collected under different conditions, and glycomics datasets are generated by profiling these samples. (B) Additional features for building AI models to predict unseen glycans. Additional information, such as glycan substructures or biosynthetic steps shared between different glycans, can be combined to train a comprehensive AI model that increases the number of glycans that can be predicted.
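
One simple way to picture panel (B) is to describe every glycan by the set of substructures it contains and to estimate an unmeasured glycan from the measured glycans that share substructures with it. The sketch below is an assumption-laden illustration, not the authors' model: it uses a Jaccard-similarity-weighted average in place of a trained AI model, and the glycan names, substructure labels, and abundances are invented for the example.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of substructure labels."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical measured glycans: name -> (substructure set, relative abundance).
measured = {
    "glycan_A": ({"core", "galactose"}, 0.6),
    "glycan_B": ({"core", "galactose", "sialic_acid"}, 0.3),
    "glycan_C": ({"core", "fucose"}, 0.1),
}


def predict_abundance(substructures, measured_glycans):
    """Similarity-weighted average abundance for a glycan that was not measured.

    Returns None when the query shares no substructure with any measured glycan.
    """
    pairs = [
        (jaccard(substructures, known), abundance)
        for known, abundance in measured_glycans.values()
    ]
    total_weight = sum(weight for weight, _ in pairs)
    if total_weight == 0:
        return None
    return sum(weight * abundance for weight, abundance in pairs) / total_weight


# An unmeasured glycan built only from substructures seen in the measured set.
unseen = {"core", "galactose", "fucose"}
print(predict_abundance(unseen, measured))
```

The point of the shared-feature representation is that a glycan absent from the training data still maps onto known features, so a prediction is possible; a query with no shared substructures yields no estimate at all.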

References

    1. Aizpurua-Olaizola O, Sastre Toraño J, Falcon-Perez JM, Williams C, Reichardt N, & Boons G-J, 2018. Mass spectrometry for glycan biomarker discovery. Trends Analyt. Chem. 100, 7–14. doi: 10.1016/j.trac.2017.12.015.
    2. Akmal MA, Rasool N, & Khan YD, 2017. Prediction of N-linked glycosylation sites using position relative features and statistical moments. PLoS One. 12(8), e0181966. doi: 10.1371/journal.pone.0181966.
    3. Antonakoudis A, Strain B, Barbosa R, Jimenez del Val I, & Kontoravdi C, 2021. Synergising stoichiometric modelling with artificial neural networks to predict antibody glycosylation patterns in Chinese hamster ovary cells. Comput. Chem. Eng. 154, 107471. doi: 10.1016/j.compchemeng.2021.107471.
    4. Bao B, Kellman BP, Chiang AWT, Zhang Y, Sorrentino JT, York AK, Mohammad MA, Haymond MW, Bode L, & Lewis NE, 2021. Correcting for sparsity and interdependence in glycomics by accounting for glycan biosynthesis. Nat. Commun. 12(1), 4988. doi: 10.1038/s41467-021-25183-5.
    5. Bavafaye Haghighi E, Knudsen M, Elmedal Laursen B, & Besenbacher S, 2019. Hierarchical Classification of Cancers of Unknown Primary Using Multi-Omics Data. Cancer Inform. 18, 1176935119872163. doi: 10.1177/1176935119872163.
