OMICS. 2012 Oct;16(10):497-512.
doi: 10.1089/omi.2012.0013. Epub 2012 Aug 9.

Automated motif discovery from glycan array data

Sharath R Cholleti et al. OMICS. 2012 Oct.

Abstract

Assessing interactions of a glycan-binding protein (GBP) or lectin with glycans on a microarray generates large datasets, making it difficult to identify a glycan structural motif or determinant associated with the highest apparent binding strength of the GBP. We have developed a computational method, termed GlycanMotifMiner, that uses the relative binding of a GBP with glycans within a glycan microarray to automatically reveal the glycan structural motifs recognized by a GBP. We implemented the software with a web-based graphical interface for users to explore and visualize the discovered motifs. The utility of GlycanMotifMiner was determined using five plant lectins, SNA, HPA, PNA, Con A, and UEA-I. Data from the analyses of the lectins at different protein concentrations were processed to rank the glycans based on their relative binding strengths. The motifs, defined as glycan substructures that exist in a large number of the bound glycans and few non-bound glycans, were then discovered by our algorithm and displayed in a web-based graphical user interface (http://glycanmotifminer.emory.edu). The information is used in defining the glycan-binding specificity of GBPs. The results were compared to the known glycan specificities of these lectins generated by manual methods. A more complex analysis was also carried out using glycan microarray data obtained for a recombinant form of human galectin-8. Results for all of these lectins show that GlycanMotifMiner identified the major motifs known in the literature along with some unexpected novel binding motifs.
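The ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the published implementation: the function names, the use of a simple per-concentration rank averaged across concentrations, and the `max_rank` cutoff that separates bound from non-bound glycans are all assumptions made for illustration, and the signal values are invented.

```python
from statistics import mean

def average_ranks(signals_by_concentration):
    """Rank glycans within each concentration (1 = strongest signal),
    then average each glycan's ranks across concentrations."""
    ranks = {}
    for signals in signals_by_concentration:
        ordered = sorted(signals, key=signals.get, reverse=True)
        for position, glycan_id in enumerate(ordered, start=1):
            ranks.setdefault(glycan_id, []).append(position)
    return {glycan_id: mean(r) for glycan_id, r in ranks.items()}

def split_bound(avg_rank, max_rank):
    """Hypothetical cutoff: glycans with average rank <= max_rank count as bound."""
    bound = {g for g, r in avg_rank.items() if r <= max_rank}
    return bound, set(avg_rank) - bound

# Invented signals for three glycans at two lectin concentrations.
high_conc = {"g1": 900, "g2": 100, "g3": 500}
low_conc = {"g1": 800, "g2": 50, "g3": 700}
avg = average_ranks([high_conc, low_conc])
bound, non_bound = split_bound(avg, max_rank=2)
```

Ranking within each concentration before averaging makes the comparison relative, so a glycan is not favored merely because one concentration produced larger raw signals overall.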

Figures

FIG. 1.
GLYMMR uses frequent subtree mining to discover the glycan-binding motifs of glycan-binding proteins (GBPs). Nodes are monosaccharides represented by the symbols defined at the bottom of the figure, and their edges are represented with the α or β linkage to the linkage position on the neighboring monosaccharide. A subtree is a node or a set of nodes (highlighted in light blue). Subtrees are expanded (steps 1–3) until they become infrequent, thus generating a set of possible motifs of different sizes (Fuc, fucose; Gal, galactose; GlcNAc, N-acetylglucosamine; Man, mannose; Glc, glucose).
FIG. 2.
Summary of GLYMMR algorithm. GLYMMR uses a repetitive interrogation of increasingly larger subtrees to discover motifs.
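The idea of growing subtrees until they become infrequent can be sketched in Python as follows. This is a brute-force illustration suitable only for small trees, not the GLYMMR implementation: the nested-tuple glycan encoding (linkage stored on the child residue, with `a`/`b` standing in for α/β), the function names, and the `min_support` and `max_size` parameters are assumptions. It enumerates every connected subtree up to `max_size` residues, counts in how many glycans each occurs, and then walks the sizes upward, stopping at the first size with no frequent subtree.

```python
from collections import Counter
from itertools import product

# A glycan is a nested tuple: (residue_with_linkage, (child, child, ...)).

def rooted_patterns(node, max_size):
    """Return {(canonical_pattern, size)} for connected subtrees rooted at node."""
    label, children = node
    options = []
    if max_size > 1:
        # Each child is either skipped (None) or contributes one rooted pattern.
        options = [[None] + list(rooted_patterns(c, max_size - 1)) for c in children]
    results = set()
    for choice in product(*options):
        kept = [c for c in choice if c is not None]
        size = 1 + sum(s for _, s in kept)
        if size <= max_size:
            # Sorting the children makes the pattern independent of branch order.
            results.add(((label, tuple(sorted(p for p, _ in kept))), size))
    return results

def all_patterns(glycan, max_size):
    """Collect patterns rooted at every residue of one glycan."""
    found, stack = set(), [glycan]
    while stack:
        node = stack.pop()
        found |= rooted_patterns(node, max_size)
        stack.extend(node[1])
    return found

def mine_frequent_subtrees(glycans, min_support, max_size=6):
    support = Counter()
    for glycan in glycans:
        support.update(all_patterns(glycan, max_size))  # one count per glycan
    frequent = {}
    for size in range(1, max_size + 1):
        level = {p: n for (p, s), n in support.items()
                 if s == size and n >= min_support}
        if not level:
            break  # nothing frequent at this size, so nothing larger can be
        frequent.update(level)
    return frequent

g1 = ("GlcNAc", (("Gal b1-4", (("Neu5Ac a2-6", ()),)),))
g2 = ("Glc", (("Gal b1-4", (("Neu5Ac a2-6", ()),)),))
g3 = ("GlcNAc", (("Gal b1-4", ()),))
frequent = mine_frequent_subtrees([g1, g2, g3], min_support=2)
```

Stopping at the first empty level is safe because removing a terminal residue from a frequent subtree always leaves a smaller connected subtree that is at least as frequent.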
FIG. 3.
Display of motifs for SNA. The structures of motifs (a–f) discovered for SNA over three concentrations are indicated using symbols defined in Figure 1, with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated. The glycan ID number indicating the position on v4.0 of the CFG glycan microarray (Supplementary Table S1; see online supplementary material at http://www.liebertpub.com) is shown for motifs found as glycans on the array, each with its corresponding average ranking calculated by the algorithm in parentheses. Motifs discovered by the algorithm that are not found as glycans on the array have no glycan ID or ranking and are designated NA (not applicable). The number of bound glycans containing motif is determined by the algorithm and indicates the number of bound glycans found on the glycan array that contain the corresponding motif, while the number of non-bound glycans containing motif indicates the number of glycans found on the glycan array that contain the motif but are considered non-binding glycans by the algorithm. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S1 (see online supplementary material at http://www.liebertpub.com).
FIG. 4.
Motif f of SNA discovered in Figure 3 occurs in bound and non-binding glycans on v4.0 of the microarray. Motif f (Neu5Acα2-6Gal) is found in 22 bound glycans and in only 4 non-binding glycans. The 4 non-binding glycans that contain motif f are indicated using symbols defined in Figure 1, with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated, and with the glycan ID number indicating the positions on v4.0 of the CFG glycan microarray (Supplementary Table S1; see online supplementary material at http://www.liebertpub.com), and their corresponding average rankings. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S2 (see online supplementary material at http://www.liebertpub.com).
FIG. 5.
Discriminatory capability of GLYMMR. The subsets of all glycans are displayed based on their structural attributes relative to their recognition by SNA. Twenty-two glycans of the total 442 on v4.0 of the CFG microarray are recognized as binding glycans by SNA, and all of those glycans contain the sequence Neu5Acα2-6Galβ1-4GlcNAc.
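The counts reported for SNA in Figures 4 and 5 (motif f in 22 bound and 4 non-binding glycans, 22 bound glycans in total, 442 glycans on the array) can be turned into simple summary ratios. The Python sketch below is illustrative only: the function name and the three ratios are assumptions chosen for this example and are not measures reported in the paper.

```python
def motif_summary(bound_with_motif, nonbound_with_motif, total_bound, total_glycans):
    """Summarize how well a motif separates bound from non-bound glycans."""
    total_nonbound = total_glycans - total_bound
    with_motif = bound_with_motif + nonbound_with_motif
    return {
        # Share of all bound glycans that carry the motif.
        "coverage_of_bound": bound_with_motif / total_bound,
        # Share of motif-carrying glycans that are bound.
        "fraction_of_motif_glycans_bound": bound_with_motif / with_motif,
        # Share of non-bound glycans that nevertheless carry the motif.
        "fraction_of_nonbound_with_motif": nonbound_with_motif / total_nonbound,
    }

# Counts for SNA motif f (Neu5Acα2-6Gal) taken from the captions of Figures 4 and 5.
summary = motif_summary(bound_with_motif=22, nonbound_with_motif=4,
                        total_bound=22, total_glycans=442)
```

With these counts, the motif is present in every bound glycan, 22 of the 26 motif-carrying glycans are bound, and 4 of the 420 non-bound glycans carry the motif.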
FIG. 6.
Display of motifs for HPA. The structures of motifs (a–c) discovered for HPA over three concentrations are indicated using symbols defined in Figure 1, with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated. The glycan ID number indicating the position on v4.0 of the CFG glycan microarray (Supplementary Table S2; see online supplementary material at http://www.liebertpub.com) is shown for the motif found as a glycan on the array with its corresponding average rank in parentheses. Motifs discovered by the algorithm that are not found as glycans on the array have no ID number or ranking and are designated NA (not applicable). The number of bound glycans containing motif is determined by the algorithm and indicates the number of bound glycans found on the glycan array that contain the corresponding motif, while the number of non-bound glycans containing motif indicates the number of glycans found on the glycan array that contain the motif but are considered non-binding glycans by the algorithm. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S3 (see online supplementary material at http://www.liebertpub.com).
FIG. 7.
Display of motifs for PNA. The structures of motifs (a–c) discovered for PNA over three concentrations are indicated using symbols defined in Figure 1, with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated. The glycan ID number indicating the position on v4.0 of the CFG glycan microarray (Supplementary Table S3; see online supplementary material at http://www.liebertpub.com) is shown for the motif found as a glycan on the array with its corresponding average rank in parentheses. Motifs discovered by the algorithm that are not found as glycans on the array have no ID number or ranking and are designated NA (not applicable). The number of bound glycans containing motif is determined by the algorithm and indicates the number of bound glycans found on the glycan array that contain the corresponding motif, while the number of non-bound glycans containing motif indicates the number of glycans found on the glycan array that contain the motif but are considered non-binding glycans by the algorithm. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S4 (see online supplementary material at http://www.liebertpub.com).
FIG. 8.
Display of motifs for concanavalin A (Con A) based on three concentrations of the lectin. The structures of motifs discovered for Con A (a–d) are indicated using symbols defined in Figure 1, with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated. The glycan ID number indicating the position on v4.0 of the CFG glycan microarray (Supplementary Table S4b; see online supplementary material at http://www.liebertpub.com) is shown for the motif found as a glycan on the array with its corresponding average rank in parentheses. Motifs discovered by the algorithm that are not found as glycans on the array have no ID number or ranking and are designated NA (not applicable). The number of bound glycans containing motif is determined by the algorithm and indicates the number of bound glycans found on the glycan array that contain the corresponding motif, while the number of non-bound glycans containing motif indicates the number of glycans found on the glycan array that contain the motif but are considered non-binding glycans by the algorithm. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S5 (see online supplementary material at http://www.liebertpub.com).
FIG. 9.
Display of motifs for Con A based on two concentrations of the lectin. The structures of motifs discovered for Con A (a–d) are indicated using symbols defined in Figure 1, with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated. The glycan ID number indicating the position on v4.0 of the CFG glycan microarray (Supplementary Table S4a; see online supplementary material at http://www.liebertpub.com) is shown for the motif found as a glycan on the array with its corresponding average rank in parentheses. Motifs discovered by the algorithm that are not found as glycans on the array have no ID number or ranking and are designated NA (not applicable). The number of bound glycans containing motif is determined by the algorithm and indicates the number of bound glycans found on the glycan array that contain the corresponding motif, while the number of non-bound glycans containing motif indicates the number of glycans found on the glycan array that contain the motif but are considered non-binding glycans by the algorithm. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S6 (see online supplementary material at http://www.liebertpub.com).
FIG. 10.
Display of motifs for UEA-I. The structures of motifs (a–c) discovered for UEA-I over three concentrations are indicated using symbols defined in Figure 1, with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated. The glycan ID number indicating the position on v4.0 of the CFG glycan microarray (Supplementary Table S5; see online supplementary material at http://www.liebertpub.com) is shown for the motif found as a glycan on the array with its corresponding average rank in parentheses. Motifs discovered by the algorithm that are not found as glycans on the array have no ID number or ranking and are designated NA (not applicable). The number of bound glycans containing motif is determined by the algorithm and indicates the number of bound glycans found on the glycan array that contain the corresponding motif, while the number of non-bound glycans containing motif indicates the number of glycans found on the glycan array that contain the motif but are considered non-binding glycans by the algorithm. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S7 (see online supplementary material at http://www.liebertpub.com).
FIG. 11.
Display of motifs for human recombinant Gal-8. The structures of motifs (a–g) discovered for Gal-8 over two concentrations are indicated using symbols defined in Figure 1, with α and β anomeric carbons and linkage positions to the adjacent monosaccharides indicated. The glycan ID numbers indicating the position on v4.2 of the CFG glycan microarray (Supplementary Table S6; see online supplementary material at http://www.liebertpub.com) are shown for motifs found as glycans on the array with their corresponding average ranks in parentheses. Motifs discovered by the algorithm that are not found as glycans on the array are designated NA (not applicable). The number of bound glycans containing motif is determined by the algorithm and indicates the number of bound glycans found on the glycan array that contain the corresponding motif, while the number of non-bound glycans containing motif indicates the number of glycans found on the glycan array that contain the motif but are considered non-binding glycans by the algorithm. The display of the graphical user interface used to generate this summary is shown in Supplementary Figure S8, panel c (see online supplementary material at http://www.liebertpub.com).
