OMICS. 2012 Oct;16(10):497-512.

doi: 10.1089/omi.2012.0013. Epub 2012 Aug 9.

Automated motif discovery from glycan array data

Sharath R Cholleti¹, Sanjay Agravat, Tim Morris, Joel H Saltz, Xuezheng Song, Richard D Cummings, David F Smith

PMID: 22877213
PMCID: PMC3459425
DOI: 10.1089/omi.2012.0013

Sharath R Cholleti et al. OMICS. 2012 Oct.

Affiliation

¹ Center for Comprehensive Informatics, Emory University, Atlanta, Georgia, USA.

Abstract

Assessing interactions of a glycan-binding protein (GBP) or lectin with glycans on a microarray generates large datasets, making it difficult to identify a glycan structural motif or determinant associated with the highest apparent binding strength of the GBP. We have developed a computational method, termed GlycanMotifMiner (GLYMMR), that uses the relative binding of a GBP with glycans within a glycan microarray to automatically reveal the glycan structural motifs recognized by a GBP. We implemented the software with a web-based graphical interface for users to explore and visualize the discovered motifs. The utility of GlycanMotifMiner was determined using five plant lectins, SNA, HPA, PNA, Con A, and UEA-I. Data from the analyses of the lectins at different protein concentrations were processed to rank the glycans based on their relative binding strengths. The motifs, defined as glycan substructures that exist in a large number of the bound glycans and few non-bound glycans, were then discovered by our algorithm and displayed in a web-based graphical user interface (http://glycanmotifminer.emory.edu). The information is used in defining the glycan-binding specificity of GBPs. The results were compared to the known glycan specificities of these lectins generated by manual methods. A more complex analysis was also carried out using glycan microarray data obtained for a recombinant form of human galectin-8. Results for all of these lectins show that GlycanMotifMiner identified the major motifs known in the literature along with some unexpected novel binding motifs.
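
The ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the published implementation: the function names `average_ranks` and `bound_glycans`, the toy signal values, and in particular the rule that a glycan counts as bound when its signal reaches 10% of the maximum at every concentration are all assumptions made for illustration.

```python
from statistics import mean


def average_ranks(rfu_by_conc):
    """Return {glycan_id: mean rank}; rank 1 is the strongest signal."""
    ranks = {}
    for rfu in rfu_by_conc.values():
        # Order glycans from strongest to weakest signal at this concentration.
        ordered = sorted(rfu, key=rfu.get, reverse=True)
        for position, glycan_id in enumerate(ordered, start=1):
            ranks.setdefault(glycan_id, []).append(position)
    return {glycan_id: mean(r) for glycan_id, r in ranks.items()}


def bound_glycans(rfu_by_conc, fraction=0.1):
    """Glycans reaching `fraction` of the maximum signal at every concentration.

    The threshold rule is an assumption for illustration only.
    """
    bound = None
    for rfu in rfu_by_conc.values():
        cutoff = fraction * max(rfu.values())
        hits = {g for g, value in rfu.items() if value >= cutoff}
        bound = hits if bound is None else bound & hits
    return bound or set()


# Hypothetical relative fluorescence units at two lectin concentrations.
toy = {
    "1 ug/ml": {"A": 1000, "B": 50, "C": 400},
    "10 ug/ml": {"A": 5000, "B": 100, "C": 6000},
}
ranks = average_ranks(toy)
bound = bound_glycans(toy)
```

Averaging ranks rather than raw signals keeps one very bright concentration from dominating the ordering, which is one plausible reason to rank before comparing across concentrations.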

Figures

**FIG. 1.**
GLYMMR uses frequent subtree mining to discover the glycan-binding motifs of glycan-binding proteins (GBPs). Nodes are monosaccharides represented by the symbols defined at the bottom of the figure, and their edges are represented with the α or β linkage to the linkage position on the neighboring monosaccharide. A subtree is a node or a set of nodes (highlighted in light blue). Subtrees are expanded (steps 1–3) until they become infrequent, thus generating a set of possible motifs of different sizes (Fuc, fucose; Gal, galactose; GlcNAc, N-acetylglucosamine; Man, mannose; Glc, glucose).
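
The idea in this caption can be sketched in Python under simplifying assumptions. Each glycan is a nested tuple `(sugar, linkage_to_parent, children)` rooted at the reducing end; this encoding, the helper names, and the three toy glycans are hypothetical. The sketch also swaps in a simpler technique: it enumerates every connected subtree exhaustively and then filters by support, rather than expanding subtrees step by step until they become infrequent as the caption describes. That is only practical for small trees.

```python
from collections import Counter
from itertools import product


def rooted_subtrees(node):
    """All connected subtrees that include `node` as their top node."""
    sugar, link, children = node
    # For each child: leave it out (None) or attach one of its rooted subtrees.
    options = [[None] + rooted_subtrees(child) for child in children]
    result = []
    for choice in product(*options):
        kept = tuple(sorted(c for c in choice if c is not None))  # canonical order
        result.append((sugar, link, kept))
    return result


def all_subtrees(glycan):
    """Set of connected subtrees of one glycan; the top node's own linkage is dropped."""
    found, stack = set(), [glycan]
    while stack:
        node = stack.pop()
        for sugar, _link, kids in rooted_subtrees(node):
            found.add((sugar, "", kids))
        stack.extend(node[2])
    return found


def frequent_subtrees(glycans, min_support):
    """Subtrees occurring in at least `min_support` glycans, with their support."""
    support = Counter()
    for glycan in glycans:
        support.update(all_subtrees(glycan))  # each glycan counts once per subtree
    return {motif: n for motif, n in support.items() if n >= min_support}


# Hypothetical toy glycans (not taken from the CFG array).
g1 = ("GlcNAc", "", (("Gal", "b1-4", (("Neu5Ac", "a2-6", ()),)),))
g2 = ("Glc", "", (("Gal", "b1-4", (("Neu5Ac", "a2-6", ()),)),))
g3 = ("GlcNAc", "", (("Gal", "b1-4", (("Neu5Ac", "a2-3", ()),)),))

frequent = frequent_subtrees([g1, g2, g3], min_support=2)
neu5ac_a26_gal = ("Gal", "", (("Neu5Ac", "a2-6", ()),))
```

Sorting the children gives each subtree a single canonical form, so the same substructure found in different glycans is counted under one key.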

**FIG. 2.**
Summary of GLYMMR algorithm. GLYMMR uses a repetitive interrogation of increasingly larger subtrees to discover motifs.

**FIG. 3.**
Display of motifs for SNA. The structures of motifs (a–f) discovered for SNA over three concentrations are indicated using symbols defined in Figure 1 with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated. The glycan ID number indicating the position on v4.0 of the CFG glycan microarray (Supplementary Table S1; see online supplementary material at http://www.liebertpub.com) is shown for motifs found as glycans on the array with its corresponding average ranking calculated by the algorithm in parentheses. Motifs discovered by the algorithm that are not found as glycans on the array have no glycan ID or ranking and are designated NA (not applicable). The number of bound glycans containing motif is determined by the algorithm and indicates the number of bound glycans found on the glycan array that contain the corresponding motif, while the number of non-bound glycans containing motif indicates the number of glycans found on the glycan array that contain the motif, but are considered non-binding glycans by the algorithm. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S1 (see online supplementary material at http://www.liebertpub.com).

**FIG. 4.**
Motif f of SNA discovered in Figure 3 occurs in bound and non-binding glycans on v4.0 of the microarray. Motif f (Neu5Acα2-6Gal) is found in 22 bound glycans and in only 4 non-binding glycans. The 4 non-binding glycans that contain motif f are indicated using symbols defined in Figure 1, with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated, and with the glycan ID number indicating the positions on v4.0 of the CFG glycan microarray (Supplementary Table S1; see online supplementary material at http://www.liebertpub.com), and their corresponding average rankings. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S2 (see online supplementary material at http://www.liebertpub.com).

**FIG. 5.**
Discriminatory capability of GLYMMR. The subsets of all glycans are displayed based on their structural attributes relative to their recognition by SNA. Twenty-two glycans of the total 442 on v4.0 of the CFG microarray are recognized as binding glycans by SNA, and all of those glycans contain the sequence Neu5Acα2-6Galβ1-4GlcNAc.
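
The bound versus non-bound counts reported in these captions can be illustrated with a short Python sketch. It assumes the nested-tuple encoding `(sugar, linkage_to_parent, children)` rooted at the reducing end, which is a hypothetical representation, and it uses toy glycans rather than array data. The helper names `contains` and `motif_counts` are likewise illustrative and do not come from the GLYMMR software.

```python
def _match(motif, node, is_top):
    """True if `motif` can be laid over the glycan starting at `node`."""
    m_sugar, m_link, m_kids = motif
    sugar, link, kids = node
    # The top node of a motif carries no linkage of its own to compare.
    if m_sugar != sugar or (not is_top and m_link != link):
        return False
    return _assign(m_kids, kids)


def _assign(m_kids, kids):
    """Backtracking: give every motif branch its own distinct glycan branch."""
    if not m_kids:
        return True
    first, rest = m_kids[0], m_kids[1:]
    for i, kid in enumerate(kids):
        if _match(first, kid, False) and _assign(rest, kids[:i] + kids[i + 1:]):
            return True
    return False


def contains(glycan, motif):
    """True if the motif occurs anywhere in the glycan."""
    if _match(motif, glycan, True):
        return True
    return any(contains(kid, motif) for kid in glycan[2])


def motif_counts(motif, bound, non_bound):
    """(number of bound glycans with the motif, number of non-bound glycans with it)."""
    return (sum(contains(g, motif) for g in bound),
            sum(contains(g, motif) for g in non_bound))


# Hypothetical toy data: Neu5Ac a2-6 Gal as the motif of interest.
motif = ("Gal", "", (("Neu5Ac", "a2-6", ()),))
bound = [
    ("GlcNAc", "", (("Gal", "b1-4", (("Neu5Ac", "a2-6", ()),)),)),
    ("Glc", "", (("Gal", "b1-4", (("Neu5Ac", "a2-6", ()),)),)),
]
non_bound = [
    ("GlcNAc", "", (("Gal", "b1-4", (("Neu5Ac", "a2-3", ()),)),)),
    ("Gal", "", (("Neu5Ac", "a2-6", ()),)),
]
counts = motif_counts(motif, bound, non_bound)
```

A motif is discriminatory in the sense of these figures when the first count is high and the second is low; how the two counts are weighed against each other is left open here.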

**FIG. 6.**
Display of motifs for HPA. The structures of motifs (a–c) discovered for HPA over three concentrations are indicated using symbols defined in Figure 1, with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated. The glycan ID number indicating the position on v4.0 of the CFG glycan microarray (Supplementary Table S2; see online supplementary material at http://www.liebertpub.com) is shown for the motif found as a glycan on the array with its corresponding average rank in parentheses. Motifs discovered by the algorithm that are not found as glycans on the array have no ID number or ranking and are designated NA (not applicable). The number of bound glycans containing motif is determined by the algorithm and indicates the number of bound glycans found on the glycan array that contain the corresponding motif, while the number of non-bound glycans containing motif indicates the number of glycans found on the glycan array that contain the motif but are considered non-binding glycans by the algorithm. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S3 (see online supplementary material at http://www.liebertpub.com).

**FIG. 7.**
Display of motifs for PNA. The structures of motifs (a–c) discovered for PNA over three concentrations are indicated using symbols defined in Figure 1, with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated. The glycan ID number indicating the position on v4.0 of the CFG glycan microarray (Supplementary Table S3; see online supplementary material at http://www.liebertpub.com) is shown for the motif found as a glycan on the array with its corresponding average rank in parentheses. Motifs discovered by the algorithm that are not found as glycans on the array have no ID number or ranking and are designated NA (not applicable). The number of bound glycans containing motif is determined by the algorithm and indicates the number of bound glycans found on the glycan array that contain the corresponding motif, while the number of non-bound glycans containing motif indicates the number of glycans found on the glycan array that contain the motif, but are considered non-binding glycans by the algorithm. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S4 (see online supplementary material at http://www.liebertpub.com).

**FIG. 8.**
Display of motifs for concanavalin A (Con A) based on three concentrations of the lectin. The structures of motifs discovered for Con A (a–d) are indicated using symbols defined in Figure 1, with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated. The glycan ID number indicating the position on v4.0 of the CFG glycan microarray (Supplementary Table S4b; see online supplementary material at http://www.liebertpub.com) is shown for the motif found as a glycan on the array with its corresponding average rank in parentheses. Motifs discovered by the algorithm that are not found as glycans on the array have no ID number or ranking and are designated NA (not applicable). The number of bound glycans containing motif is determined by the algorithm and indicates the number of bound glycans found on the glycan array that contain the corresponding motif, while the number of non-bound glycans containing motif indicates the number of glycans found on the glycan array that contain the motif, but are considered non-binding glycans by the algorithm. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S5 (see online supplementary material at http://www.liebertpub.com).

**FIG. 9.**
Display of motifs for Con A based on two concentrations of the lectin. The structures of motifs discovered for Con A (a–d) are indicated using symbols defined in Figure 1, with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated. The glycan ID number indicating the position on v4.0 of the CFG glycan microarray (Supplementary Table S4a; see online supplementary material at http://www.liebertpub.com) is shown for the motif found as a glycan on the array with its corresponding average rank in parentheses. Motifs discovered by the algorithm that are not found as glycans on the array have no ID number or ranking and are designated NA (not applicable). The number of bound glycans containing motif is determined by the algorithm and indicates the number of bound glycans found on the glycan array that contain the corresponding motif, while the number of non-bound glycans containing motif indicates the number of glycans found on the glycan array that contain the motif, but are considered non-binding glycans by the algorithm. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S6 (see online supplementary material at http://www.liebertpub.com).

**FIG. 10.**
Display of motifs for UEA-I. The structures of motifs (a–c) discovered for UEA-I over three concentrations are indicated using symbols defined in Figure 1, with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated. The glycan ID number indicating the position on v4.0 of the CFG glycan microarray (Supplementary Table S5; see online supplementary material at http://www.liebertpub.com) is shown for the motif found as a glycan on the array with its corresponding average rank in parentheses. Motifs discovered by the algorithm that are not found as glycans on the array have no ID number or ranking and are designated NA (not applicable). The number of bound glycans containing motif is determined by the algorithm and indicates the number of bound glycans found on the glycan array that contain the corresponding motif, while the number of non-bound glycans containing motif indicates the number of glycans found on the glycan array that contain the motif but are considered non-binding glycans by the algorithm. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S7 (see online supplementary material at http://www.liebertpub.com).

**FIG. 11.**
Display of motifs for human recombinant Gal-8. The structures of motifs (a–g) discovered for Gal-8 over two concentrations are indicated using symbols defined in Figure 1, with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated. The glycan ID numbers indicating the position on v4.2 of the CFG glycan microarray (Supplementary Table S6; see online supplementary material at http://www.liebertpub.com) are shown for motifs found as glycans on the array with their corresponding average ranks in parentheses. Motifs discovered by the algorithm that are not found as glycans on the array are designated NA (not applicable). The number of bound glycans containing motif is determined by the algorithm and indicates the number of bound glycans found on the glycan array that contain the corresponding motif, while the number of non-bound glycans containing motif indicates the number of glycans found on the glycan array that contain the motif but are considered non-binding glycans by the algorithm. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S8, panel c (see online supplementary material at http://www.liebertpub.com).

Cited by

Global comparisons of lectin-glycan interactions using a database of analyzed glycan array data.
Kletter D, Singh S, Bern M, Haab BB. Mol Cell Proteomics. 2013 Apr;12(4):1026-35. doi: 10.1074/mcp.M112.026641. Epub 2013 Feb 11. PMID: 23399549. Free PMC article.
Using graph convolutional neural networks to learn a representation for glycans.
Burkholz R, Quackenbush J, Bojar D. Cell Rep. 2021 Jun 15;35(11):109251. doi: 10.1016/j.celrep.2021.109251. PMID: 34133929. Free PMC article.
Identifying glycan motifs using a novel subtree mining approach.
Coff L, Chan J, Ramsland PA, Guy AJ. BMC Bioinformatics. 2020 Feb 4;21(1):42. doi: 10.1186/s12859-020-3374-4. PMID: 32019496. Free PMC article.
Glycomics and glycoproteomics of viruses: Mass spectrometry applications and insights toward structure-function relationships.
Cipollo JF, Parsons LM. Mass Spectrom Rev. 2020 Jul;39(4):371-409. doi: 10.1002/mas.21629. Epub 2020 Apr 29. PMID: 32350911. Free PMC article. Review.
GlycoPattern: a web platform for glycan array mining.
Agravat SB, Saltz JH, Cummings RD, Smith DF. Bioinformatics. 2014 Dec 1;30(23):3417-8. doi: 10.1093/bioinformatics/btu559. Epub 2014 Aug 20. PMID: 25143288. Free PMC article.
