Mol Cell Proteomics. 2014 Sep;13(9):2490-502.
doi: 10.1074/mcp.M114.039560. Epub 2014 Jun 12.

A computational framework for heparan sulfate sequencing using high-resolution tandem mass spectra

Han Hu et al. Mol Cell Proteomics. 2014 Sep.

Abstract

Heparan sulfate (HS) is a linear polysaccharide expressed on cell surfaces, in extracellular matrices, and in cellular granules of metazoan cells. Through non-covalent binding to growth factors, morphogens, chemokines, and other protein families, HS is involved in all multicellular physiological activities. Its biological activities depend on the fine structures of its protein-binding domains, which remain daunting to determine. Methods have advanced to the point that mass spectra with information-rich product ions can be produced from purified HS saccharides. However, interpreting these complex product ion patterns has become the bottleneck to the dissemination of these HS sequencing methods. To solve this problem, we designed HS-SEQ, the first comprehensive algorithm for HS de novo sequencing using high-resolution tandem mass spectra. We tested HS-SEQ using negative electron transfer dissociation (NETD) tandem mass spectra generated from a set of pure synthetic saccharide standards with diverse sulfation patterns. The results showed that HS-SEQ rapidly and accurately determined the correct HS structures from large candidate pools.
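The candidate pools grow combinatorially with the number of modifications. As a rough, hypothetical illustration (not HS-SEQ's own enumeration), the sketch below bounds the pool size by choosing, for each modification type independently, which of its candidate sites carry the modification. The function name and the site counts in the example are assumptions chosen for illustration.

```python
from math import comb, prod


def candidate_pool_upper_bound(site_counts: dict[str, int], mod_counts: dict[str, int]) -> int:
    # Hypothetical helper. For each modification type, count the ways to pick
    # which candidate sites carry it, then multiply across types. Interactions
    # between types (e.g. a GlcN nitrogen bearing either Ac or SO3, not both)
    # are ignored, so the real pool is at most this large.
    return prod(comb(site_counts[m], k) for m, k in mod_counts.items())


# Illustrative inputs only: 3 candidate Ac sites and 12 candidate SO3 sites
# for a composition carrying 1 Ac and 6 SO3.
print(candidate_pool_upper_bound({"Ac": 3, "SO3": 12}, {"Ac": 1, "SO3": 6}))  # 2772
```

Even with these small, made-up site counts the bound reaches thousands of sequences, which is why ranking the true structure within a large background matters.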

Figures

Fig. 1.
Summary of basic concepts in HS-SEQ. A hexasaccharide with composition [1, 2, 3, 1, 6] (in the order [ΔHexA, HexA, GlcN, Ac, SO3]) illustrates the concepts used in HS-SEQ. A, The hexasaccharide sequence illustrated by cartoon symbols. Each monosaccharide has a unique ID. B, Visual representation of the hexasaccharide sequence in HS-SEQ. Two types of modification (substitution) occur on the sequence: acetylation (Ac) and sulfation (SO3). Each modification type has a set of candidate modification sites and a modification distribution, which assigns a local likelihood value to each candidate site. C, Visual representation of an assignment in HS-SEQ. Each assignment provides structural information in the form of a subset of candidate modification sites and the associated modification number. D, Assignments update the modification distribution. The original modification numbers are shown in red text and the new numbers deduced from the assignment in blue text. The structural information added by the assignment updates local regions of the modification distributions (the updated regions are marked by dashed squares).
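The caption describes an assignment as a subset of candidate sites plus a modification number, and says each assignment updates local regions of the modification distribution. A minimal sketch of that idea follows. The class and function names are hypothetical, and the update rule (spread the assigned number evenly over the covered sites and the remainder evenly over the other sites) is an assumption for illustration, not HS-SEQ's confidence-weighted update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """One product-ion assignment for a single modification type (hypothetical)."""
    modification: str        # "Ac" or "SO3"
    sites: frozenset[int]    # IDs of the candidate sites the ion covers
    number: int              # modifications the ion carries on those sites


def update_distribution(distribution: dict[int, float], total: int, a: Assignment) -> dict[int, float]:
    """Assumed local update: spread a.number evenly over the covered sites and
    (total - a.number) evenly over the remaining sites. A real implementation
    would combine this with earlier assignments instead of overwriting."""
    inside = [s for s in distribution if s in a.sites]
    outside = [s for s in distribution if s not in a.sites]
    updated = dict(distribution)
    for region, count in ((inside, a.number), (outside, total - a.number)):
        for s in region:  # empty regions are skipped, so no division by zero
            updated[s] = count / len(region)
    return updated


# Illustrative numbers: 6 SO3 over 12 candidate sites, uniform prior of 0.5.
prior = {site: 0.5 for site in range(12)}
ion = Assignment("SO3", frozenset({0, 1, 2, 3}), 3)
posterior = update_distribution(prior, total=6, a=ion)
print(posterior[0], posterior[11])  # 0.75 0.375
```

The expected counts still sum to the precursor's 6 sulfates; only the split between the covered and uncovered regions changes.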
Fig. 2.
Data ambiguity in HS sequencing. A, Assignments of B4 and Y2 on a hexasaccharide with composition [1, 2, 3, 1, 6]. Some Y2 assignments have modification numbers incompatible with the B4 assignments. For example, B4 (1Ac+4SO3) and Y2 (1Ac+3SO3) cannot co-exist because the sequence carries only one Ac in total. Alternative assignments are suggested after the arrows. For example, Y2 (0Ac+2SO3) can be interpreted as Y2 (0Ac+3SO3) with sulfate loss, or as C2 (0Ac+3SO3) with sulfate loss. B, Classes of data ambiguity. Assignments with either the same mass values (isomeric or isobaric) or different mass values can cause ambiguity; in essence, this is ambiguity in the candidate modification sites and/or the associated modification number. “S” denotes “same” and “D” denotes “different”.
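The B4/Y2 conflict in panel A comes down to a counting check: B4 and Y2 of a hexasaccharide are complementary fragments that share no residues, so together they cannot carry more Ac or SO3 than the precursor. The sketch below encodes that check and the sulfate-loss reinterpretation of an observed ion. The class and function names are hypothetical, and the check uses "at most" rather than "equal" to leave room for losses; it is not HS-SEQ's actual conflict-resolution logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IonAssignment:
    ion: str   # fragment label, e.g. "B4" or "Y2"
    ac: int    # acetyl groups assigned to the fragment
    so3: int   # sulfate groups assigned to the fragment


def compatible(a: IonAssignment, b: IonAssignment, total_ac: int, total_so3: int) -> bool:
    """For two fragments sharing no residues, the combined counts cannot
    exceed the precursor composition."""
    return a.ac + b.ac <= total_ac and a.so3 + b.so3 <= total_so3


def sulfate_loss_alternatives(x: IonAssignment, max_lost: int) -> list[IonAssignment]:
    """Readings of an observed ion in which 1..max_lost SO3 groups were lost
    during dissociation, i.e. the intact fragment carried more sulfates."""
    return [IonAssignment(x.ion, x.ac, x.so3 + k) for k in range(1, max_lost + 1)]


# Composition [1, 2, 3, 1, 6]: 1 Ac and 6 SO3 in total.
b4 = IonAssignment("B4", ac=1, so3=4)
print(compatible(b4, IonAssignment("Y2", ac=1, so3=3), total_ac=1, total_so3=6))  # False
print(sulfate_loss_alternatives(IonAssignment("Y2", ac=0, so3=2), max_lost=1))
# [IonAssignment(ion='Y2', ac=0, so3=3)]
```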
Fig. 3.
Schema of HS sequencing in HS-SEQ. A, Subtasks in HS sequencing. In HS-SEQ, HS sequencing consists of two basic steps: identification of Ac positions and identification of sulfate positions (the number of sulfates on each residue, and the specific sulfate positions within each residue). Data ambiguity is considered in each step, and sulfate loss is considered when identifying sulfate positions. B, The assignment graph connects assignments and generates the modification distribution. Relationships between assignments are captured by a modification-specific assignment graph, in which each node represents an assignment and each edge represents an inclusion relationship between the candidate modification sites of two assignments. Each edge points from a child node to a parent node, where the parent's candidate sites form a minimal superset of the child's (equivalently, the child's sites form a maximal subset of the parent's). Two nodes are always present in the graph: the null node and the full node. Each new node (assignment) therefore always has at least one parent node and one child node in the graph. The candidate modification sites of a new node determine its parents and children in the graph. Its modification number is adjusted based on the estimated confidence of the assignment (discussed in the Methods section). Each insertion of a node into the graph corresponds to an update of local regions of the modification distribution.
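Panel B describes the assignment graph as a Hasse diagram over site subsets: a new node's parents are the minimal strict supersets among existing nodes, its children the maximal strict subsets, and the null and full nodes guarantee that both exist. A minimal sketch under that reading follows. The class name is hypothetical, and confidence-based adjustment of modification numbers is not modeled.

```python
class AssignmentGraph:
    """Hypothetical sketch of one modification-specific assignment graph.

    Nodes are frozensets of candidate site IDs; an edge runs from a child to a
    parent when the parent is a minimal strict superset of the child.
    """

    def __init__(self, all_sites):
        self.null = frozenset()               # node covering no sites
        self.full = frozenset(all_sites)      # node covering every candidate site
        self.nodes = {self.null, self.full}

    def parents(self, node: frozenset) -> set:
        supersets = [n for n in self.nodes if node < n]
        # keep supersets that contain no other superset of `node`
        return {p for p in supersets if not any(node < m < p for m in supersets)}

    def children(self, node: frozenset) -> set:
        subsets = [n for n in self.nodes if n < node]
        # keep subsets not contained in any other subset of `node`
        return {c for c in subsets if not any(c < m < node for m in subsets)}

    def insert(self, sites) -> tuple:
        """Add an assignment's site subset and return (parents, children)."""
        node = frozenset(sites)
        if not node <= self.full:
            raise ValueError("assignment refers to unknown sites")
        self.nodes.add(node)
        return self.parents(node), self.children(node)


g = AssignmentGraph(range(1, 7))   # six candidate sites for one modification type
print(g.insert({1, 2, 3, 4}))      # parent: full node; child: null node
print(g.insert({1, 2}))            # parent: {1, 2, 3, 4}; child: null node
```

Computing parents and children on demand keeps the edges consistent after every insertion, at the cost of a linear scan over nodes.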
Fig. 4.
Structures of the nine pure synthetic standards used for algorithm validation. #1 Arixtra [0, 2, 3, 0, 8] (charge states 4-, 5-, and 6-) was purchased from Organon Sanofi-Synthelabo LLC (West Orange, NJ). #2 Hex6 [1, 2, 3, 1, 6] (charge states 3-, 4-, 5-, and 6-) and #3 Hex7 [1, 2, 3, 1, 7] (charge states 3-, 4-, 5-, and 6-) were purchased from New England BioLabs (Ipswich, MA). #4 dp15 [0, 7, 7, 2, 5] (charge states 5-, 6-, 7-, and 8-), #5 P71 [0, 4, 3, 0, 3] (charge states 3- and 4-) and #6 P82 [0, 4, 4, 0, 11] (charge state 6-) were bio-enzymatically synthesized and generously provided by Prof. Jian Liu of the University of North Carolina, Chapel Hill. Synthetic HS tetrasaccharides #7 Boons03 [0, 2, 2, 0, 4] (charge states 3- and 4-), #8 Boons23 [0, 2, 2, 0, 4] (charge states 2-, 3-, and 4-) and #9 Boons38 [0, 2, 2, 0, 5] (charge states 3- and 4-) were generously provided by Prof. Geert-Jan Boons of the Complex Carbohydrate Research Center at the University of Georgia. Me: methyl, AnMan: 2,5-anhydro-D-mannose, PNP: 4-nitrophenol.
Fig. 5.
Comparison of HS sequencing methods. The performance of the coverage method (black), the GP method (blue), and HS-SEQ (Cost) (red) was compared using the 25 NETD spectra. A, Comparison of the average ranks. B, Comparison of the absolute values of Z-scores. C, Comparison of correlations between average rank and background size. D, Comparison of correlations between Z-score and background size. In A and B, the sequences are sorted in ascending order of background size.
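The two per-spectrum metrics in panels A and B can be sketched as follows: the rank of the true sequence's score among all candidate scores, and its Z-score against the background scores. This is a hypothetical sketch, not the paper's code. It assumes the candidate list includes the true sequence, that ties share the average of their positions, and that the Z-score uses the population standard deviation; whether a lower or higher score is better depends on the method, so it is passed explicitly.

```python
from statistics import mean, pstdev


def average_rank(true_score: float, scores: list[float], *, lower_is_better: bool) -> float:
    """Rank of the true sequence among all candidate scores (1 = best).
    Assumes `scores` includes the true score; ties share their average position."""
    if lower_is_better:
        better = sum(s < true_score for s in scores)
    else:
        better = sum(s > true_score for s in scores)
    tied = sum(s == true_score for s in scores)   # includes the true sequence itself
    return better + (tied + 1) / 2


def z_score(true_score: float, background: list[float]) -> float:
    """Standardized distance of the true score from the background scores."""
    sd = pstdev(background)
    if sd == 0:
        raise ValueError("background scores have zero spread")
    return (true_score - mean(background)) / sd


scores = [1.0, 2.0, 3.0, 3.0, 5.0]
print(average_rank(3.0, scores, lower_is_better=True))   # 3.5
print(abs(z_score(5.0, [1.0, 2.0, 3.0, 4.0, 5.0])))
```

Panels A and B plot these values per spectrum; panels C and D then correlate them with background size.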
Fig. 6.
Comparison of updated versions of HS sequencing methods. The performance of the updated coverage method (M_Coverage, black), GP method (M_GP, blue), and HS-SEQ (M_Cost, red) was compared using the 25 NETD spectra. A, Comparison of the average ranks. B, Comparison of the absolute values of Z-scores. C, Comparison of correlations between average rank and background size. D, Comparison of correlations between Z-score and background size. In A and B, the sequences are sorted in ascending order of background size.
Fig. 7.
Example demonstrating the performance of HS-SEQ. A, Comparison of histograms of candidate sequence scores under different methods. The calculation was based on the tandem mass spectrum of sequence #2 (charge state 5-). The red arrow marks the score of the true sequence. B, Integration of results from multiple charge states. The modification distributions (bottom left) were calculated using data from sequence #2 (charge states 3- to 6-). The modification number on each residue was then mapped to the original oligosaccharide sequence (bottom right). White bars denote the acetylation distribution, gray bars denote the sulfation distribution, and error bars indicate standard error. Digits beside the vertical solid lines represent the estimated modification number on each residue. Red asterisks indicate the positions where modifications actually occur.
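Panel B combines per-residue modification estimates from several charge states into a mean with a standard error, plus an estimated count per residue. A minimal sketch of that aggregation follows. The function name is hypothetical, and it assumes the standard error is the sample standard deviation divided by the square root of the number of charge states, and that the estimated count is the rounded mean; the paper's exact estimator may differ.

```python
from math import sqrt
from statistics import mean, stdev


def integrate_charge_states(per_charge: list[list[float]]) -> list[tuple[float, float, int]]:
    """Combine per-residue modification estimates from several charge states.

    per_charge[i][r] is the estimated modification number on residue r from
    charge state i. Returns (mean, standard error, rounded count) per residue.
    """
    n = len(per_charge)
    if n < 2:
        raise ValueError("need at least two charge states for a standard error")
    summary = []
    for values in zip(*per_charge):       # one residue across all charge states
        m = mean(values)
        se = stdev(values) / sqrt(n)      # sample SD divided by sqrt(n) (assumed)
        summary.append((m, se, round(m)))
    return summary


# Made-up estimates for two residues from four charge states.
data = [[1.0, 0.0], [1.2, 0.0], [0.8, 0.0], [1.0, 0.0]]
for residue, (m, se, count) in enumerate(integrate_charge_states(data)):
    print(residue, round(m, 3), round(se, 3), count)
```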
