Mol Cell Proteomics. 2014 Sep;13(9):2490-502.

doi: 10.1074/mcp.M114.039560. Epub 2014 Jun 12.

A computational framework for heparan sulfate sequencing using high-resolution tandem mass spectra

Han Hu¹, Yu Huang², Yang Mao², Xiang Yu², Yongmei Xu³, Jian Liu³, Chengli Zong⁴, Geert-Jan Boons⁴, Cheng Lin², Yu Xia⁵, Joseph Zaia⁶

Affiliations

¹ From the ‡Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA; §Center for Biomedical Mass Spectrometry, Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, Massachusetts 02118, USA;
² §Center for Biomedical Mass Spectrometry, Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, Massachusetts 02118, USA;
³ ¶ Division of Chemical Biology and Medicinal Chemistry, Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, USA;
⁴ **Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia 30602.
⁵ ‖Department of Bioengineering, Faculty of Engineering, McGill University, Montreal, Quebec H3A 0C3, Canada; From the ‡Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA;
⁶ §Center for Biomedical Mass Spectrometry, Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, Massachusetts 02118, USA; jzaia@bu.edu.

PMID: 24925905
PMCID: PMC4159664
DOI: 10.1074/mcp.M114.039560

Han Hu et al. Mol Cell Proteomics. 2014 Sep.

Abstract

Heparan sulfate (HS) is a linear polysaccharide expressed on cell surfaces, in extracellular matrices and cellular granules in metazoan cells. Through non-covalent binding to growth factors, morphogens, chemokines, and other protein families, HS is involved in all multicellular physiological activities. Its biological activities depend on the fine structures of its protein-binding domains, the determination of which remains a daunting task. Methods have advanced to the point that mass spectra with information-rich product ions may be produced on purified HS saccharides. However, the interpretation of these complex product ion patterns has emerged as the bottleneck to the dissemination of these HS sequencing methods. To solve this problem, we designed HS-SEQ, the first comprehensive algorithm for HS de novo sequencing using high-resolution tandem mass spectra. We tested HS-SEQ using negative electron transfer dissociation (NETD) tandem mass spectra generated from a set of pure synthetic saccharide standards with diverse sulfation patterns. The results showed that HS-SEQ rapidly and accurately determined the correct HS structures from large candidate pools.

Figures

**Fig. 1.**
**Summary of basic concepts in HS-SEQ.** A hexasaccharide [1, 2, 3, 1, 6] ([ΔHexA, HexA, GlcN, Ac, SO₃]) is used to demonstrate the concepts used in HS-SEQ. A, The hexasaccharide sequence illustrated by cartoon symbols. Each monosaccharide has a unique ID. B, Visual representation of the hexasaccharide sequence in HS-SEQ. Two types of modification (substitution) occur on the sequence: acetylation (Ac) and sulfation (SO₃). Each modification type corresponds to a set of candidate modification sites, and a modification distribution, which appends a local likelihood value to each candidate site. C, Visual representation of assignment in HS-SEQ. Each assignment provides structural information in terms of subsets of candidate modification sites as well as the associated modification numbers. D, Assignment helps update the modification distribution. The original modification numbers are denoted by red text and the new numbers deduced from the assignment are denoted by blue text. The added structural information from the assignment updates the local regions of the modification distributions (the updated regions are marked by dashed squares).
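The composition vector and candidate-site idea in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the HS-SEQ implementation: the names `Composition`, `candidate_sites`, and `uniform_distribution` are hypothetical, the site tables assume the commonly described HS modification positions (2-O on uronic acids; N, 3-O, and 6-O on GlcN; N-acetylation on GlcN), and the uniform starting likelihood is an assumption made only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Composition:
    """Composition vector [dHexA, HexA, GlcN, Ac, SO3] as used in the caption."""
    d_hexa: int
    hexa: int
    glcn: int
    ac: int
    so3: int

    @property
    def residues(self) -> int:
        # Number of monosaccharide residues in the chain.
        return self.d_hexa + self.hexa + self.glcn


# Assumed candidate positions per residue type (illustrative, not from the paper).
SULFATE_SITES = {"dHexA": ["2O"], "HexA": ["2O"], "GlcN": ["N", "3O", "6O"]}
ACETYL_SITES = {"GlcN": ["N"]}


def candidate_sites(chain, site_table):
    """List (residue ID, position) pairs where a modification may occur."""
    return [
        (res_id, pos)
        for res_id, res in enumerate(chain, start=1)
        for pos in site_table.get(res, [])
    ]


def uniform_distribution(sites, count):
    """Spread `count` modifications evenly over the candidate sites (assumed prior)."""
    if not sites:
        return {}
    return {site: count / len(sites) for site in sites}


hex6 = Composition(d_hexa=1, hexa=2, glcn=3, ac=1, so3=6)
chain = ["dHexA", "GlcN", "HexA", "GlcN", "HexA", "GlcN"]

ac_sites = candidate_sites(chain, ACETYL_SITES)
so3_sites = candidate_sites(chain, SULFATE_SITES)
ac_dist = uniform_distribution(ac_sites, hex6.ac)
so3_dist = uniform_distribution(so3_sites, hex6.so3)
```

Under these assumptions, each assignment would later sharpen `ac_dist` and `so3_dist` locally instead of leaving the likelihood spread evenly across all sites.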

**Fig. 2.**
**Data ambiguity in HS sequencing.** A, Assignments of B₄ and Y₂ on a hexasaccharide with composition [1, 2, 3, 1, 6]. Some Y₂ assignments have incompatible modification numbers with the B₄ assignments. For example, B₄ (1Ac+4SO₃) and Y₂ (1Ac+3SO₃) cannot co-exist because the total number of Ac on the sequence is only 1. Alternative assignments are suggested after the arrows. For example, Y₂ (0Ac+2SO₃) can be considered as Y₂ (0Ac+3SO₃) with sulfate loss, or C₂ (0Ac+3SO₃) with sulfate loss. B, Classes of data ambiguity. Assignments with either the same mass values (isomeric or isobaric) or different mass values can cause ambiguity, and the ambiguity in essence is the ambiguity of the candidate modification sites and/or associated modification number. “S” denotes “same” and “D” means “different”.
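The incompatibility argument in panel A amounts to a counting check on fragments that do not overlap. The sketch below shows only that necessary condition; `Assignment` and `compatible` are hypothetical names, and the rule that combined counts may fall below the total (for example after sulfate loss) but never exceed it is an assumption drawn from the caption's example, not the published algorithm.

```python
from typing import NamedTuple


class Assignment(NamedTuple):
    """A fragment ion assignment with its modification numbers."""
    ion: str
    ac: int
    so3: int


def compatible(b: Assignment, y: Assignment, total_ac: int, total_so3: int) -> bool:
    """Necessary condition for two non-overlapping fragments to co-exist.

    Their combined Ac and SO3 counts must not exceed the totals given by
    the composition of the intact saccharide.
    """
    return b.ac + y.ac <= total_ac and b.so3 + y.so3 <= total_so3


# Composition [1, 2, 3, 1, 6]: one Ac and six SO3 in total.
b4 = Assignment("B4", ac=1, so3=4)
clash = compatible(b4, Assignment("Y2", ac=1, so3=3), total_ac=1, total_so3=6)
fits = compatible(b4, Assignment("Y2", ac=0, so3=2), total_ac=1, total_so3=6)
```

Here `clash` is `False` because the pair would require two acetyl groups, while `fits` is `True`.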

**Fig. 3.**
**Schema of HS sequencing in HS-SEQ.** A, Subtasks in HS sequencing. In HS-SEQ, HS sequencing consists of two basic steps: identification of Ac positions and identification of sulfate positions (sulfate numbers on each residue, and identification of specific sulfate positions on each residue). Data ambiguity is considered for each step, and sulfate loss is considered when identifying sulfate positions. B, Assignment graph connects assignments and generates the modification distribution. The relationship between assignments is visualized by the assignment graph (modification-specific), where each node represents an assignment and each edge represents the inclusion relationship of candidate modification sites between assignments. The edge is directed from the assignment with the maximum subset of modification sites (child node) to the assignment with the minimum superset of modification sites (parent node). Two nodes are always present in the graph: the null node and the full node. For each new node (assignment), there is always at least one parent node and one child node in the graph. The information of candidate modification sites from the new node helps locate its parent and child in the graph. The information of modification number is adjusted based on confidence estimation of the assignment (discussed in the method section). Each insertion of a node into the graph corresponds to an update of local regions on the modification distribution.
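Locating the parent and child of a new node by set inclusion, as described in panel B, might look like the following sketch. It assumes each node is represented by a `frozenset` of candidate-site IDs (here simply residue IDs 1 to 6); `minimal_supersets` and `maximal_subsets` are hypothetical helper names, and the confidence-based adjustment of modification numbers is not modeled.

```python
def minimal_supersets(new, nodes):
    """Parents: existing nodes that strictly contain `new`, with no smaller such node."""
    supers = [n for n in nodes if new < n]
    return [s for s in supers if not any(other < s for other in supers)]


def maximal_subsets(new, nodes):
    """Children: existing nodes strictly contained in `new`, with no larger such node."""
    subs = [n for n in nodes if n < new]
    return [s for s in subs if not any(s < other for other in subs)]


null_node = frozenset()                # always present: no sites
full_node = frozenset(range(1, 7))     # always present: all sites
graph = [null_node, full_node, frozenset({1, 2}), frozenset({1, 2, 3, 4})]

new_node = frozenset({1, 2, 3})
parents = minimal_supersets(new_node, graph)
children = maximal_subsets(new_node, graph)
```

Because the null and full nodes are always in the graph, every non-trivial new node is guaranteed at least one parent and one child, which matches the caption. In this example the new node lands between `{1, 2}` and `{1, 2, 3, 4}`.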

**Fig. 4.**
**Structures of nine synthetic pure standards for algorithm validation.** #1 Arixtra [0, 2, 3, 0, 8] (charge state 4-, 5-, and 6-) was purchased from Organon Sanofi-Synthelabo LLC (West Orange, NJ). #2 Hex6 [1, 2, 3, 1, 6] (charge state 3-, 4-, 5-, and 6-) and #3 Hex7 [1, 2, 3, 1, 7] (charge state 3-, 4-, 5-, and 6-) were purchased from New England BioLabs (Ipswich, MA). #4 dp15 [0, 7, 7, 2, 5] (charge state 5-, 6-, 7-, and 8-), #5 P71 [0, 4, 3, 0, 3] (charge state 3- and 4-) and #6 P82 [0, 4, 4, 0, 11] (charge state 6-) were bio-enzymatically synthesized and were generously provided by Prof. Jian Liu from University of North Carolina, Chapel Hill. Synthetic HS tetrasaccharides #7 Boons03 [0, 2, 2, 0, 4] (charge state 3- and 4-), #8 Boons23 [0, 2, 2, 0, 4] (charge state 2-, 3-, and 4-) and #9 Boons38 [0, 2, 2, 0, 5] (charge state 3- and 4-) were generously provided by Professor Geert-Jan Boons from the Complex Carbohydrate Research Center at the University of Georgia. Me: methyl, AnMan: 2,5-anhydro-D-mannose, PNP: 4-nitrophenol.

**Fig. 5.**
**Comparison of HS sequencing methods.** The performance of the coverage method (denoted in black), the GP method (denoted in blue), and HS-SEQ (Cost) (denoted in red) was compared using the 25 NETD spectra. A, Comparison of the average ranks. B, Comparison of the absolute values of Z-scores. C, Comparison of correlations between average rank and background size. D, Comparison of correlations between Z-score and background size. Note that in A and B, the sequences were sorted in ascending order by their background size.
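One plausible reading of the two metrics in panels A and B is sketched below. The caption does not define them, so both formulas are assumptions: the average rank treats tied scores by averaging their positions, and the Z-score compares the true sequence's score against the mean and population standard deviation of all candidate scores. The function names and the `lower_is_better` flag are hypothetical.

```python
from statistics import mean, pstdev


def average_rank(scores, true_index, lower_is_better=True):
    """Rank of the true sequence among candidates, averaging over ties."""
    target = scores[true_index]
    if lower_is_better:
        better = sum(1 for s in scores if s < target)
    else:
        better = sum(1 for s in scores if s > target)
    ties = sum(1 for s in scores if s == target)  # includes the true sequence itself
    return better + (ties + 1) / 2


def z_score(scores, true_index):
    """Distance of the true sequence's score from the candidate mean, in SD units."""
    sd = pstdev(scores)
    if sd == 0:
        return 0.0
    return (scores[true_index] - mean(scores)) / sd


# Toy candidate scores (costs); index 0 plays the role of the true sequence.
candidate_scores = [2.0, 5.0, 5.0, 8.0]
rank_true = average_rank(candidate_scores, true_index=0)
rank_tied = average_rank(candidate_scores, true_index=1)
z_true = z_score(candidate_scores, true_index=0)
```

With these toy scores the true sequence has rank 1.0, a tied candidate has rank 2.5, and a larger absolute Z-score would indicate better separation of the true sequence from the background of candidates.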

**Fig. 6.**
**Comparison of updated versions of HS sequencing methods.** The performance of the updated versions of the coverage method (M_Coverage, denoted in black), the GP method (M_GP, denoted in blue), and HS-SEQ (M_Cost, denoted in red) was compared using the 25 NETD spectra. A, Comparison of the average ranks. B, Comparison of the absolute values of Z-scores. C, Comparison of correlations between average rank and background size. D, Comparison of correlations between Z-score and background size. Note that in A and B, the sequences were sorted in ascending order by their background size.

**Fig. 7.**
**Example demonstrating the performance of HS-SEQ.** A, Comparison of histograms of candidate sequence scores using different methods. The calculation was based on tandem mass spectrum from sequence #2 (charge 5-). Red arrow flags the score of the true sequence structure. B, Integration of results from multiple charge states. The modification distributions (bottom left) were calculated using data from sequence #2 (charge 3- ∼ 6-). The modification number on each residue was then mapped to the original oligosaccharide sequence (bottom right). White bar denotes acetylation distribution, gray bar denotes sulfation distribution, and the error bar indicates standard error. Digits beside the vertical solid lines represent the estimated modification number on each residue. Red asterisk indicates the positions where modifications actually occur.

Cited by

Glycosaminoglycanomics: where we are.
Ricard-Blum S, Lisacek F. Glycoconj J. 2017 Jun;34(3):339-349. doi: 10.1007/s10719-016-9747-2. Epub 2016 Nov 30. PMID: 27900575 Review.
Targeting heparin and heparan sulfate protein interactions.
Weiss RJ, Esko JD, Tor Y. Org Biomol Chem. 2017 Jul 21;15(27):5656-5668. doi: 10.1039/c7ob01058c. Epub 2017 Jun 27. PMID: 28653068 Free PMC article. Review.
Ultra-high-performance liquid chromatography charge transfer dissociation mass spectrometry (UHPLC-CTD-MS) as a tool for analyzing the structural heterogeneity in carrageenan oligosaccharides.
Mendis PM, Sasiene ZJ, Ropartz D, Rogniaux H, Jackson GP. Anal Bioanal Chem. 2022 Jan;414(1):303-318. doi: 10.1007/s00216-021-03396-3. Epub 2021 May 29. PMID: 34050776
Preparation and characterization of heparin hexasaccharide library with N-unsubstituted glucosamine residues.
Liang QT, Du JY, Fu Q, Lin JH, Wei Z. Glycoconj J. 2015 Nov;32(8):643-53. doi: 10.1007/s10719-015-9612-8. Epub 2015 Aug 15. PMID: 26275985
Heparan sulfate glycomimetics via iterative assembly of "clickable" disaccharides.
Yang C, Deng Y, Wang Y, Xia C, Mehta AY, Baker KJ, Samal A, Booneimsri P, Lertmaneedang C, Hwang S, Flynn JP, Cao M, Liu C, Zhu AC, Cummings RD, Lin C, Mohanty U, Niu J. Chem Sci. 2023 Feb 28;14(13):3514-3522. doi: 10.1039/d3sc00260h. eCollection 2023 Mar 29. PMID: 37006675 Free PMC article.
