iScience. 2022 Sep 19;25(10):105163.
doi: 10.1016/j.isci.2022.105163. eCollection 2022 Oct 21.

Deep learning explains the biology of branched glycans from single-cell sequencing data

Rui Qin et al. iScience. 2022.

Abstract

Glycosylation is ubiquitous and often dysregulated in disease. However, the regulation and functional significance of the various types of glycosylation at the cellular level are hard to unravel experimentally. Multi-omics single-cell measurements such as SUGAR-seq, which quantifies transcriptomes and cell surface glycans, help address this issue. Using SUGAR-seq data, we pioneered a deep learning model that predicts the glycan phenotypes of cells (mouse T lymphocytes) from transcripts, taking as an example the prediction of β1,6GlcNAc-branching across T cell subtypes (test set F1 score: 0.9351). Model interpretation via SHAP (SHapley Additive exPlanations) identified highly predictive genes, some of which are known to impact (i) branched glycan levels and (ii) the biology of branched glycans. These genes included physiologically relevant low-abundance genes that were not captured by conventional differential expression analysis. Our work shows that interpretable deep learning models are promising for uncovering novel functions and regulatory mechanisms of glycans from integrated transcriptomic and glycomic datasets.
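
SHAP attributes a single prediction to the input features using Shapley values from cooperative game theory. The sketch below is illustrative only: it computes exact Shapley values by enumerating every feature coalition for a hypothetical three-gene logistic model, replacing features outside a coalition with a baseline value. The model, its weights, the gene labels, and the baseline are all assumptions, not the authors' network or settings; exact enumeration scales as 2^n, so a model over thousands of genes needs the approximations implemented in the SHAP library instead.

```python
from itertools import combinations
from math import comb, exp


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature outside a coalition takes its baseline value, so the
    attributions sum to model(x) - model(baseline) (efficiency).
    Cost grows as 2**n, so this is only practical for small n.
    """
    n = len(x)

    def value(coalition):
        # Coalition members keep the cell's value; the rest use the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = 1.0 / (n * comb(n - 1, size))
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model: probability of the PHA-L-high class from three
# expression values (gene_a, gene_b, gene_c); weights are illustrative only.
WEIGHTS = [1.5, -0.8, 0.3]
BIAS = -0.2


def toy_model(z):
    logit = BIAS + sum(w * v for w, v in zip(WEIGHTS, z))
    return 1.0 / (1.0 + exp(-logit))


cell = [2.0, 0.5, 1.0]      # hypothetical expression profile of one cell
baseline = [1.0, 1.0, 1.0]  # hypothetical reference profile (e.g. mean expression)
phi = shapley_values(toy_model, cell, baseline)
print([round(p, 4) for p in phi])
print(round(sum(phi), 6), round(toy_model(cell) - toy_model(baseline), 6))
```

The two printed totals agree because Shapley values are additive. For a purely linear model the same function returns w_i (x_i - baseline_i) for each feature, which is a quick sanity check; gene_c gets zero here because its value equals the baseline.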

Keywords: Artificial intelligence; Bioinformatics; Biomolecules.

Conflict of interest statement

The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Different mouse T cells show distinct cell surface glycosylation patterns based on single-cell RNA- and lectin-seq. (A) Graphical summary of the workflow. (B) Composition of the tumor-infiltrating lymphocyte (TIL) and lymph node (LN) datasets. Left: numbers of cells and genes in each processed dataset. Right: cell type composition as percentages in each dataset. (C) UMAP clustering of cells in each dataset. (D) Boxplots of processed PHA-L data (β1,6-branched glycan abundance) by cell type in each dataset. (E) Histograms of processed PHA-L data in each dataset. Cell type annotations: Tex, terminally exhausted T cell; Tpex, precursor exhausted T cell; Tfh, follicular helper T cell; Th1, T helper 1 cell; Treg, regulatory T cell.
Figure 2
The deep learning model trained on the TIL dataset is highly accurate in predicting glycan classes. (A) Graphical description of the neural network structure. (B) ROC curve (upper) and precision-recall curve (lower) of the model on the test set. (C) Histogram of model output (probability of the PHA-L-high class) on the test set. (D) Prediction accuracies by cell type in the test set.
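
The F1 score quoted in the abstract and the curves in panel (B) come from comparing predicted class probabilities with true labels. A minimal sketch of those metrics follows: precision, recall, and F1 at a probability threshold, plus ROC AUC through its rank (Mann-Whitney) formulation. Treating PHA-L-high as the positive class, the 0.5 threshold, and the toy labels are assumptions; the exact evaluation settings used in the paper are not stated in this section.

```python
def precision_recall_f1(y_true, p_high, threshold=0.5):
    """Precision, recall and F1 for the positive class (PHA-L-high = 1)."""
    y_pred = [1 if p >= threshold else 0 for p in p_high]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """ROC AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy test-set labels (1 = PHA-L-high) and model outputs; not data from the paper.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(precision_recall_f1(labels, probs))
print(roc_auc(labels, probs))
```

F1 depends on the chosen threshold, whereas ROC AUC summarizes ranking quality over all thresholds, which is why the two are usually reported together.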
Figure 3
Model interpretation identifies genes important for predicting TIL cell surface glycosylation. (A) Histogram (left) and scatterplot (right) of the median absolute SHAP values for all genes. (B) SHAP values of the top 30 genes ranked by median absolute SHAP value. (C) SHAP values of the top 30 glycogenes ranked by median absolute SHAP value. (D) Gene Ontology pathway enrichment analysis using the SHAP genes. (E) STRING protein interaction network analysis of the top 10% SHAP genes. Only high-confidence (strong evidence) interactions are shown, and thicker edges denote higher confidence. Genes/proteins without high-confidence interactions with any other genes/proteins are not displayed.
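
Panels (A) to (C) rank genes by the median, across cells, of the absolute SHAP value. The sketch below shows that ranking and the selection of a top fraction such as the top 10% used in panel (E), assuming per-cell SHAP values are already available as a mapping from gene to a list of values. The gene names and numbers are hypothetical, and alphabetical tie-breaking is an arbitrary choice for this sketch, not something the paper specifies.

```python
from math import ceil
from statistics import median


def rank_genes(shap_by_gene):
    """Return (genes sorted by median |SHAP| across cells, highest first; score per gene).

    shap_by_gene maps a gene name to its per-cell SHAP values.
    Ties are broken alphabetically (an arbitrary choice for this sketch).
    """
    scores = {g: median([abs(v) for v in vals]) for g, vals in shap_by_gene.items()}
    return sorted(scores, key=lambda g: (-scores[g], g)), scores


def top_fraction(ranked, fraction):
    """Top `fraction` of a ranked gene list (always at least one gene)."""
    return ranked[: max(1, ceil(len(ranked) * fraction))]


# Hypothetical per-cell SHAP values for three genes in three cells.
shap_by_gene = {
    "gene_a": [0.30, -0.50, 0.10],
    "gene_b": [0.05, -0.02, 0.01],
    "gene_c": [-0.20, -0.25, 0.30],
}
ranked, scores = rank_genes(shap_by_gene)
print(ranked)
print(top_fraction(ranked, 0.10))
```

Taking absolute values first means a gene that pushes predictions strongly in either direction ranks high, and the median is less sensitive than the mean to a few cells with extreme attributions.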
Figure 4
Most of the highly predictive genes are shared across cell types in the TIL dataset. (A) Heatmap of the percentage rankings (by median absolute SHAP value) of genes in each cell type compared to all cell types combined. Cell types are clustered by Euclidean distance. (B) Genes uniquely ranked high (median absolute SHAP value among the top 2%) in each cell type. Values in tiles are the rankings of genes (by median absolute SHAP value) in the corresponding cell types. The highest ranking of each gene among all cell types is boxed in red.
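
Panel (B) relies on percentage rankings, with relative ranking defined as ranking/number of all genes × 100% (the convention stated in the Figure 5 caption). The sketch below computes those values and picks out genes that fall within a cutoff in exactly one cell type. Reading "uniquely high ranked" as "in the top 2% of that cell type and of no other" is an assumption, and the gene rankings in the example are hypothetical.

```python
def percent_rank(ranked_genes):
    """Relative ranking of each gene: rank / number of genes * 100 (rank 1 = best)."""
    n = len(ranked_genes)
    return {g: (i + 1) * 100 / n for i, g in enumerate(ranked_genes)}


def uniquely_high(ranked_by_cell_type, cutoff=2.0):
    """Genes within the top `cutoff` percent in one cell type and in no other.

    ranked_by_cell_type maps a cell type to its genes ordered by median |SHAP|.
    """
    top = {
        ct: {g for g, pct in percent_rank(ranked).items() if pct <= cutoff}
        for ct, ranked in ranked_by_cell_type.items()
    }
    result = {}
    for ct, genes in top.items():
        # Union of the top sets of every other cell type (empty if there are none).
        elsewhere = set().union(*(t for other, t in top.items() if other != ct))
        result[ct] = sorted(genes - elsewhere)
    return result


# Hypothetical rankings of 100 genes in two cell types.
rest = [f"g{i}" for i in range(3, 100)]
rankings = {
    "Tex": ["g0", "g1", "g2"] + rest,
    "Treg": ["g0", "g2", "g1"] + rest,
}
print(percent_rank(rankings["Tex"])["g1"])
print(uniquely_high(rankings))
```

Here g0 is in the top 2% of both cell types, so it is shared rather than unique, mirroring the caption's point that most highly predictive genes are common across cell types.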
Figure 5
Predictive genes identified by SHAP tend to be involved in the biology of MGAT5/β1,6-branched glycans. These genes encode proteins that: (A) bear PHA-L-binding, β1,6-branched N-glycans that can be important for their functions; (B) regulate the expression of MGAT5/β1,6-branched glycans; (C) are regulated by β1,6-glycan branching; (D) have immunosuppressive functions that may be synergistic with β1,6-branched N-glycans, which are also immunosuppressive. Gene names, their rankings (by median absolute SHAP value), and relative rankings (ranking/number of all genes × 100%, in parentheses) are shown.
Figure 6
SHAP analysis identifies glycogenes that impact β1,6-branched N-glycan levels or PHA-L binding. (A) Partial biosynthetic route of α2,6-sialylated, β1,6-branched glycans and the involvement in this process of B4GALT1 and ST6GAL1, both identified by SHAP analysis. α2,6-Sialylation abrogates PHA-L binding to branched N-glycans. (B) Possible interplay between UDP-GlcNAc, O-GlcNAcylation, and N-glycan branching, and the involvement of OGT in this process. UDP: uridine diphosphate.
