J Am Soc Mass Spectrom. 2023 Oct 4;34(10):2127-2135.
doi: 10.1021/jasms.3c00132. Epub 2023 Aug 24.

Toward Automatic Inference of Glycan Linkages Using MSn and Machine Learning─Proof of Concept Using Sialic Acid Linkages

Xinyi Ni et al. J Am Soc Mass Spectrom. 2023.

Abstract

Glycosidic linkages in oligosaccharides play essential roles in determining their chemical properties and biological activities. MSn has been widely used to infer glycosidic linkages, but it requires a substantial amount of starting material, which limits its application. In addition, little rigorous research has examined which MSn protocols are appropriate for characterizing glycosidic linkages. In this work, to deliver high-quality experimental data and analysis results, we propose a machine learning-based framework to establish appropriate MSn protocols and build effective data analysis methods. We demonstrate proof of principle by applying our approach to elucidate sialic acid linkages (α2′–3′ and α2′–6′) in a set of sialyllactose standards and NIST sialic acid-containing N-glycans, and by identifying several protocol configurations that produce high-quality experimental data. Our companion data analysis method achieves nearly 100% accuracy in classifying α2′–3′ vs α2′–6′ using MS5, MS4, MS3, or even MS2 spectra alone. The ability to determine glycosidic linkages using MS2 or MS3 is significant because it requires substantially less sample, which enables linkage analysis of quantity-limited natural glycans and synthesized materials, and because it shortens the overall experimental time. MS2 is also more amenable than MS3/4/5 to automation when coupled to direct infusion or LC-MS. Additionally, our method can predict the ratio of α2′–3′ to α2′–6′ in a mixture with 8.6% RMSE (root-mean-square error) across data sets using MS5 spectra. We anticipate that our framework will be generally applicable to the analysis of other glycosidic linkages.
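For reference, the mixture-ratio error quoted in the abstract can be written out explicitly. The sketch below assumes (our assumption, not stated in the abstract) that the "ratio" is expressed as the fraction of α2′–3′ among total sialic acid, with r_i the known fraction for test spectrum i, \hat{r}_i the predicted fraction, and N the number of test spectra; the symbols are ours, not the paper's.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{r}_i - r_i\right)^{2}},
\qquad
r_i = \frac{c_{\alpha2'\text{-}3',\,i}}{c_{\alpha2'\text{-}3',\,i} + c_{\alpha2'\text{-}6',\,i}}
```

Under that reading, an RMSE of 8.6% would mean the predicted α2′–3′ fractions deviate from the true fractions by about 0.086 on a root-mean-square basis.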

Keywords: machine learning; mass spectrometry; non-negative matrix factorization; support vector machine.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
A machine learning-based framework for automatic inference of linkages using MSn. An MSn spectrum data set is collected using glycans of known structure that contain various glycosidic linkages. (A) In the training phase, data preprocessing methods are developed to normalize raw MSn spectra, detect outliers, and perform related tasks. Machine learning models are then developed to learn representations of spectra, analyze linkages, and identify appropriate protocols for generating high-quality, informative spectra. (B) In the test phase, test spectra are first preprocessed by the data preprocessing component established in the training phase and then fed to the machine learning models trained in the training phase to produce linkage analysis results. Test spectra should preferably be generated using the MSn protocol(s) identified in the training phase.
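The preprocessing step in panel (A) is described only as normalizing raw spectra. A minimal sketch, assuming peak lists of (m/z, intensity) pairs, fixed-width m/z binning, and normalization to unit total ion intensity (all three are our assumptions; `bin_spectrum` is a hypothetical helper, not the authors' code), turns each spectrum into a fixed-length vector that downstream models can consume:

```python
from typing import List, Tuple

def bin_spectrum(peaks: List[Tuple[float, float]], mz_min: float, mz_max: float,
                 bin_width: float = 1.0) -> List[float]:
    """Hypothetical preprocessing: sum peak intensities into fixed-width m/z bins,
    then scale the vector so its entries sum to 1 (unit total ion intensity)."""
    n_bins = int(round((mz_max - mz_min) / bin_width))
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:  # peaks outside the window are dropped
            idx = int((mz - mz_min) // bin_width)
            vec[min(idx, n_bins - 1)] += intensity  # guard against float round-off at the edge
    total = sum(vec)
    if total > 0:
        vec = [v / total for v in vec]
    return vec

# Usage: two peaks inside a 50-150 m/z window, one outside it.
spectrum = bin_spectrum([(89.06, 50.0), (117.06, 150.0), (300.2, 10.0)], 50.0, 150.0)
```

Binning makes spectra from different scans directly comparable as vectors; the bin width trades m/z resolution against robustness to small mass shifts.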
Figure 2
Sialic acid linkage determination (α2′–3′ vs α2′–6′) using MS5. At each of the MS2/3/4 levels, a fragment ion (marked with an asterisk, *) is manually selected as the precursor for the next MS level. All spectra use lithium as the ion adduct. Diagnostic signatures that differentiate α2′–3′ from α2′–6′ are detected at the MS5 level. The fragments corresponding to the α2′–3′ and α2′–6′ MS5 signatures (F1, F2, F3, and F4) are shown in the box.
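For comparison with the learned models, the conventional MS5 diagnostic ions reported in the Figure 4 caption (m/z 103.08 and 131.07 for α2′–3′; m/z 89.06 and 117.06 for α2′–6′) suggest a naive rule-based baseline: sum the intensity near each marker set and call the larger one. This is a sketch only, not the paper's method; the ±0.02 m/z tolerance and the function names are our assumptions.

```python
from typing import Iterable, Sequence, Tuple

# Conventional MS5 diagnostic ions, as listed in the Figure 4 caption.
MARKERS_A23 = (103.08, 131.07)  # alpha2'-3'
MARKERS_A26 = (89.06, 117.06)   # alpha2'-6'

def marker_intensity(peaks: Iterable[Tuple[float, float]], markers: Sequence[float],
                     tol: float = 0.02) -> float:
    """Sum intensities of peaks lying within +/- tol of any marker m/z (assumed tolerance)."""
    return sum(i for mz, i in peaks if any(abs(mz - m) <= tol for m in markers))

def call_linkage(peaks: Sequence[Tuple[float, float]], tol: float = 0.02) -> str:
    """Hypothetical baseline: pick the linkage whose diagnostic ions carry more intensity."""
    a23 = marker_intensity(peaks, MARKERS_A23, tol)
    a26 = marker_intensity(peaks, MARKERS_A26, tol)
    if a23 == a26:
        return "undetermined"
    return "alpha2-3" if a23 > a26 else "alpha2-6"
```

Such a rule needs the full MS5 cascade to expose the markers, which is exactly the sample cost the learned MS2/MS3 classifiers aim to avoid.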
Figure 3
Implementing the framework. This diagram illustrates our implementation of the proposed framework. See the main text for detailed explanations. The training phase produces the basis spectra and linkage classifier/regressor that are applied to linkage analysis in the test phase.
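In the test phase, the learned basis spectra are applied to new spectra. One simple way to do this, sketched here as a stand-in and not as the authors' trained regressor, is to hold the basis fixed, fit non-negative coefficients to a test spectrum with Lee–Seung multiplicative updates, and read the relative abundance off the coefficients. The function names are hypothetical, and the sketch assumes the first basis spectrum corresponds to α2′–3′ and the second to α2′–6′.

```python
from typing import List

def project_onto_basis(x: List[float], basis: List[List[float]], n_iter: int = 500,
                       eps: float = 1e-12) -> List[float]:
    """Non-negative h minimizing ||x - sum_k h[k] * basis[k]||^2, basis held fixed.
    Uses the multiplicative update h <- h * (B x) / (B B^T h), B = basis as rows."""
    k = len(basis)
    h = [1.0] * k  # strictly positive start; multiplicative updates keep h >= 0
    bx = [sum(b_i * x_i for b_i, x_i in zip(b, x)) for b in basis]
    bbt = [[sum(p * q for p, q in zip(basis[a], basis[c])) for c in range(k)]
           for a in range(k)]
    for _ in range(n_iter):
        denom = [sum(bbt[a][c] * h[c] for c in range(k)) for a in range(k)]
        h = [h[a] * bx[a] / (denom[a] + eps) for a in range(k)]
    return h

def relative_abundance(h: List[float]) -> float:
    """Fraction attributed to the first basis spectrum (assumed alpha2'-3')."""
    total = h[0] + h[1]
    return h[0] / total if total > 0 else float("nan")

# Usage: a spectrum that is exactly 30% of basis 0 plus 70% of basis 1.
h = project_onto_basis([0.3, 0.7, 0.7, 0.3], [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]])
```

A trained regressor can correct for systematic differences in ionization or fragmentation efficiency between the two linkages, which a raw coefficient ratio like this ignores.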
Figure 4
MS5 basis spectra learned by NMF. The NIST data set is used. Two basis spectra are learned, and they align perfectly with the theoretical peaks of α2′–3′ and α2′–6′. The top basis spectrum contains the conventional α2′–3′ diagnostic signatures (m/z = 103.08 and 131.07), and the bottom basis spectrum contains the conventional α2′–6′ diagnostic signatures (m/z = 89.06 and 117.06).
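Non-negative matrix factorization (NMF) approximates a matrix X of binned spectra (rows) as H·W, where the rows of W are non-negative basis spectra and each row of H gives one spectrum's non-negative weights on those bases. A minimal pure-Python sketch using Lee–Seung multiplicative updates for the Frobenius norm follows; the paper's actual solver, initialization, and iteration count are not given here, so these choices are assumptions.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a: Matrix) -> Matrix:
    return [list(r) for r in zip(*a)]

def frob_error(x: Matrix, h: Matrix, w: Matrix) -> float:
    """Frobenius norm of the residual X - H W."""
    approx = matmul(h, w)
    return sum((xi - ai) ** 2 for xr, ar in zip(x, approx)
               for xi, ai in zip(xr, ar)) ** 0.5

def nmf(x: Matrix, k: int, n_iter: int = 300, seed: int = 0,
        eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Factor non-negative X (spectra x m/z bins) as X ~ H W via multiplicative updates.
    Rows of W are basis spectra; rows of H are per-spectrum coefficients."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    h = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    w = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        wt = transpose(w)
        num_h = matmul(x, wt)             # X W^T
        den_h = matmul(h, matmul(w, wt))  # H W W^T
        h = [[h[i][j] * num_h[i][j] / (den_h[i][j] + eps) for j in range(k)]
             for i in range(n)]
        ht = transpose(h)
        num_w = matmul(ht, x)             # H^T X
        den_w = matmul(matmul(ht, h), w)  # H^T H W
        w = [[w[i][j] * num_w[i][j] / (den_w[i][j] + eps) for j in range(m)]
             for i in range(k)]
    return h, w

# Usage: five toy "spectra" mixed from two disjoint basis spectra.
x = [[1.0, 0.5, 0.0, 0.0], [0.75, 0.375, 0.25, 0.125], [0.5, 0.25, 0.5, 0.25],
     [0.2, 0.1, 0.8, 0.4], [0.0, 0.0, 1.0, 0.5]]
h, w = nmf(x, k=2)
```

The non-negativity constraint is what makes the learned bases readable as spectra: each basis can only add intensity at m/z positions, so a basis can line up with one linkage's diagnostic peaks as in this figure.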
Figure 5
t-SNE visualization of the NMF-based representation learning results on the MS2 spectra in the NIST data set. The α2′–3′ linkages cluster together in the top-left corner, while the α2′–6′ linkages cluster together in the bottom-right region.
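The clean separation of the two linkage clusters is consistent with the support vector machine listed among the keywords. As an illustrative sketch only (the paper's SVM kernel and hyperparameters are not given here), a linear SVM trained by Pegasos-style stochastic subgradient descent on the hinge loss can classify NMF coefficient vectors; the toy features, labels (+1 for α2′–3′, −1 for α2′–6′), and function names are hypothetical.

```python
import random
from typing import List

def _augment(x: List[float]) -> List[float]:
    # Append a constant feature so the last weight acts as a (regularized) bias.
    return list(x) + [1.0]

def train_linear_svm(xs: List[List[float]], ys: List[int], lam: float = 0.01,
                     n_epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos-style SGD on the L2-regularized hinge loss; labels must be +1/-1."""
    rng = random.Random(seed)
    data = [_augment(x) for x in xs]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(n_epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = ys[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the regularizer
            if margin < 1.0:  # hinge loss active: step toward the correct side
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, data[i])]
    return w

def predict(w: List[float], x: List[float]) -> int:
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0.0 else -1

# Usage: toy two-component NMF coefficients for six spectra.
xs = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05], [0.1, 0.9], [0.2, 0.8], [0.05, 0.95]]
ys = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(xs, ys)
```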
Figure 6
Average spectrum quality in the NIST and UGA-R data sets. The quality of each configuration (higher is better) is the average quality of the MS2/3/4/5 spectra in the NIST and UGA-R data sets.
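The per-configuration averaging described in this caption can be sketched as follows. The per-spectrum quality score itself is the authors' and is taken as given. The record layout (configuration label, MS level, score), the pooling of all MS2–MS5 spectra into one mean, and the function name are our assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

def rank_configurations(records: Iterable[Tuple[str, int, float]]) -> List[Tuple[str, float]]:
    """Average quality scores per configuration, pooling MS2-MS5 spectra,
    and return (configuration, mean quality) pairs sorted best first."""
    by_config: Dict[str, List[float]] = defaultdict(list)
    for config, ms_level, quality in records:
        if 2 <= ms_level <= 5:  # only MS2/3/4/5 spectra contribute
            by_config[config].append(quality)
    means = {c: mean(qs) for c, qs in by_config.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)

# Usage with hypothetical configuration labels.
ranked = rank_configurations([("cfgA", 2, 0.8), ("cfgA", 5, 0.6),
                              ("cfgB", 2, 0.9), ("cfgB", 3, 0.9), ("cfgC", 1, 1.0)])
```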
Figure 7
Relative linkage abundance inference results (RMSE, stratified by configuration) on the NIST data set. For each configuration, a model was trained on the UGA-S spectra collected under that configuration and then tested on the NIST data set.
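Stratifying the RMSE by configuration amounts to grouping test predictions by the configuration of the model that produced them and computing the RMSE within each group. A minimal sketch follows; the row layout (configuration, true fraction, predicted fraction), the labels, and the function name are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def rmse_by_configuration(rows: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """RMSE between true and predicted relative abundance, computed per configuration."""
    sq_errors: Dict[str, List[float]] = defaultdict(list)
    for config, true_ratio, pred_ratio in rows:
        sq_errors[config].append((pred_ratio - true_ratio) ** 2)
    return {c: math.sqrt(sum(e) / len(e)) for c, e in sq_errors.items()}

# Usage with hypothetical predictions from two configuration-specific models.
scores = rmse_by_configuration([("cfgA", 0.5, 0.6), ("cfgA", 0.2, 0.1), ("cfgB", 1.0, 1.0)])
```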
Figure 8
Outlier detection. The quality score distribution of the MS5 spectra in the NIST, UGA-S, and UGA-R data sets.
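The caption shows quality-score distributions but not the cutoff used to flag outliers. One common robust rule, offered here as an assumption rather than the authors' criterion, flags spectra whose score falls more than k scaled median absolute deviations (MADs) below the median; `flag_low_quality` and k = 3 are hypothetical choices.

```python
from statistics import median
from typing import List

def flag_low_quality(scores: List[float], k: float = 3.0) -> List[bool]:
    """Flag scores more than k scaled MADs below the median (assumed rule).
    The factor 1.4826 makes the MAD estimate a standard deviation for normal data."""
    med = median(scores)
    mad = 1.4826 * median(abs(s - med) for s in scores)
    if mad == 0:
        return [s < med for s in scores]  # degenerate spread: flag anything below the median
    return [s < med - k * mad for s in scores]

# Usage: one clearly poor spectrum among otherwise similar scores.
flags = flag_low_quality([0.9, 0.92, 0.88, 0.91, 0.89, 0.2])
```

Median and MAD are used instead of mean and standard deviation so that the outliers being sought do not inflate the threshold themselves.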
