Bioinformatics. 2018 Oct 15;34(20):3511-3518.
doi: 10.1093/bioinformatics/bty397.

Application of network smoothing to glycan LC-MS profiling

Joshua Klein et al. Bioinformatics.

Abstract

Motivation: Glycosylation is one of the most heterogeneous and complex protein post-translational modifications. Liquid chromatography coupled to mass spectrometry (LC-MS) is a common high-throughput method for analyzing complex biological samples. Accurate study of glycans requires high-resolution mass spectrometry. Mass spectrometry data contain intricate sub-structures that encode mass and abundance and must undergo several transformations before they can be used to identify biological molecules, so automated tools are needed to analyze samples in a high-throughput setting. Existing tools for interpreting the resulting data do not take related glycans into account when evaluating individual observations, which limits their sensitivity.

Results: We developed an algorithm for assigning glycan compositions from LC-MS data by exploring biosynthetic network relationships among glycans. Our algorithm optimizes a set of likelihood scoring functions based on glycan chemical properties, but uses network Laplacian regularization and, optionally, prior information about expected glycan families to smooth the likelihood and thus achieve a consistent and more representative solution. Our method identified as many or more glycan compositions than previous approaches, and demonstrated greater sensitivity with regularization. Our network definition was tailored to N-glycans, but the method may be applied to glycomics data from other glycan families, such as O-glycans or heparan sulfate, where the relationships between compositions can be expressed as a graph.
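The idea of smoothing scores over a glycan network can be illustrated with a minimal sketch. This is not the GlycReSoft implementation: it assumes a plain quadratic objective, sum_i (s_i - y_i)^2 + lam * sum over edges (s_i - s_j)^2, whose minimizer solves (I + lam * L) s = y for the graph Laplacian L, and it omits the prior information about glycan families. The neighbor rule (compositions differing by exactly one monosaccharide), the function names, the scores, and the value of `lam` are all hypothetical choices made for illustration.

```python
from itertools import combinations


def build_edges(compositions):
    """Link compositions that differ by exactly one monosaccharide.

    This neighbor rule is an assumption for illustration, not the
    network definition used in the paper.
    """
    edges = []
    for a, b in combinations(compositions, 2):
        if sum(abs(x - y) for x, y in zip(a, b)) == 1:
            edges.append((a, b))
    return edges


def laplacian_smooth(observed, edges, lam=0.5, sweeps=200):
    """Solve (I + lam * L) s = y by Gauss-Seidel sweeps.

    Row i of the system reads
        (1 + lam * deg_i) * s_i - lam * sum_{j ~ i} s_j = y_i,
    so each sweep re-solves that row for s_i using current neighbor values.
    """
    neighbors = {node: [] for node in observed}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    smoothed = dict(observed)
    for _ in range(sweeps):
        for node, adjacent in neighbors.items():
            pull = sum(smoothed[n] for n in adjacent)
            smoothed[node] = (observed[node] + lam * pull) / (1.0 + lam * len(adjacent))
    return smoothed


# Hypothetical scores keyed by [HexNAc, Hex, Fuc, NeuAc] tuples.
observed = {
    (4, 5, 0, 1): 20.0,  # well supported
    (4, 5, 0, 2): 2.0,   # weak on its own, but adjacent to a strong node
    (4, 5, 1, 1): 18.0,  # well supported
    (2, 9, 0, 0): 5.0,   # no neighbors in this toy network
}
edges = build_edges(list(observed))
smoothed = laplacian_smooth(observed, edges, lam=0.5)
```

In this toy network the weakly supported composition is pulled up toward its well-supported neighbor, while the isolated composition keeps its original score. Larger values of `lam` trade fidelity to the individual evidence for agreement between neighbors.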

Availability and implementation: Built executable: http://www.bumc.bu.edu/msr/glycresoft/. Source code: https://github.com/BostonUniversityCBMS/glycresoft.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Chromatogram Assignments and Quantification for 20141103-02-Phil-BS Using the Combinatorial + Sulfate database. The Retention Time (Min) axis shows the experimental retention time in minutes, and the Relative Abundance axis shows the intensity of the signal from each aggregated ion species. The identified glycan compositions are labeled with a tuple describing the number of each component of the form [HexNAc, Hex, Fuc, NeuAc, SO3]. (Color version of this figure is available at Bioinformatics online.)
Fig. 2.
Performance Comparison with and without Network Smoothing for 20141103-02-Phil-BS. The Receiver Operating Characteristic (ROC) curve, which plots True Positive Rate (TPR) against False Positive Rate (FPR), shows how each database performed under different regularization conditions, summarized by the Area Under the Curve (AUC) in the legend. The Combinatorial + Sulfate database showed the best performance, and improved with regularization. (Color version of this figure is available at Bioinformatics online.)
Fig. 3.
Performance Comparison with and without Network Smoothing for Perm-BS-070111-04-Serum. The Receiver Operating Characteristic (ROC) curve, which plots True Positive Rate (TPR) against False Positive Rate (FPR), shows how each database performed under different regularization conditions, summarized by the Area Under the Curve (AUC) in the legend. (Color version of this figure is available at Bioinformatics online.)
Fig. 4.
Chromatogram Assignments for Perm-BS-070111-04-Serum. In all panels, the Retention Time (Min) axis shows the experimental retention time in minutes, and the Relative Abundance axis shows the intensity of the signal from each aggregated ion species. The identified glycan compositions are labeled with a tuple describing the number of each component of the form [HexNAc, Hex, Fuc, NeuAc]. (a) Features Assigned After Grid Regularization of Perm-BS-070111-04-Serum. (b) This sample contains heavy ammonium adduction, which introduces ambiguity in intact-mass-based assignments. (c) Low-scoring features that might be discarded on individual evidence alone may be more reasonable to accept given evidence from related compositions, as supplied by our network smoothing method. (Color version of this figure is available at Bioinformatics online.)
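For readers unfamiliar with the ROC and AUC summaries used in Figs. 2 and 3, the following minimal sketch shows how such a curve and its area can be computed from scored assignments. The scores and labels are invented for illustration, the function names are hypothetical, and nothing here reproduces the paper's evaluation procedure or results.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping a score threshold downward.

    `labels` holds 1 for a true assignment and 0 for a false one.
    Tied scores are processed together so the curve does not depend on input order.
    """
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        j = i
        # Consume every item sharing the current score before emitting a point.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                true_pos += 1
            else:
                false_pos += 1
            j += 1
        points.append((false_pos / negatives, true_pos / positives))
        i = j
    return points


def area_under_curve(points):
    """Integrate the ROC curve with the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Invented example: four scored assignments, two of them correct.
example_scores = [0.9, 0.8, 0.7, 0.6]
example_labels = [1, 0, 1, 0]
example_auc = area_under_curve(roc_points(example_scores, example_labels))
```

An AUC of 1.0 means every true assignment outscored every false one, and 0.5 corresponds to random ranking. In the invented example, three of the four true/false pairs are ranked correctly, giving an AUC of 0.75.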

