iScience. 2023 Jul 4;26(8):107257.
doi: 10.1016/j.isci.2023.107257. eCollection 2023 Aug 18.

Impact of infection on proteome-wide glycosylation revealed by distinct signatures for bacterial and viral pathogens

Esther Willems et al. iScience. 2023.

Abstract

Mechanisms of infection and pathogenesis have predominantly been studied based on differential gene or protein expression. Less is known about posttranslational modifications, which are essential for protein functional diversity. We applied an innovative glycoproteomics method to study the systemic proteome-wide glycosylation in response to infection. The protein site-specific glycosylation was characterized in plasma derived from well-defined controls and patients. We found 3862 unique features, of which we identified 463 distinct intact glycopeptides, that could be mapped to more than 30 different proteins. Statistical analyses were used to derive a glycopeptide signature that enabled significant differentiation between patients with a bacterial or viral infection. Furthermore, supported by a machine learning algorithm, we demonstrated the ability to identify the causative pathogens based on the distinctive host blood plasma glycopeptide signatures. These results illustrate that glycoproteomics holds enormous potential as an innovative approach to improve the interpretation of relevant biological changes in response to infection.

Keywords: Glycobiology; Glycomics; Health sciences; Immunology.

Conflict of interest statement

The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Study group characteristics. Pie charts representing the distribution of age, sex, and type of infection in the patient and control groups, respectively. Additional clinical characteristics for the healthy individuals and patients are listed in Tables S1 and S2.
Figure 2
Overview of the data. (A) Charge deconvoluted consensus feature map, with a color gradient scale to visualize relative differences of all detected features between the bacterial and viral infection group. Differentials that are higher in the bacterial infection group are indicated by a red marker, whereas differentials that are higher in the viral infection group are indicated by a blue marker. Differentials that were only detected in one of the two compared groups are indicated by a triangle (▽). Significant univariate features determined by ANOVA are indicated by a black border (◯). (B) A representative MS/MS spectrum of a glycopeptide with typical oxonium B-ion masses, high-mass glycan fragment Y-ions, and peptide fragment b- and y-ions. (C) A Pearson correlation matrix of the healthy group (n = 42) and the patient group (bacterial infection n = 53 and viral infection n = 38) with pooled Pearson’s correlation coefficient of r = 0.76. An example of a linear regression plot (log intensity scale) is shown for the correlation of 3,682 features (blue dots) between two healthy individuals (r = 0.88) and a healthy individual versus an individual with a bacterial infection (r = 0.41). (D) Venn diagram of the total number of unique features (n = 3,682), of which 14% of peptide fractions were matched with the protein database and 28% of the N-glycan fractions were matched with the glycan database. Combined data resulted in 463 identified N-glycopeptides.
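The sample-to-sample comparison described in panel C can be sketched as a Pearson correlation computed on log-transformed feature intensities. This is only a minimal illustration: the function names are hypothetical, the intensities are made up, and dropping features that are undetected (zero) in either sample is an assumption, not a step stated in the caption.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def log_intensity_correlation(sample_a, sample_b):
    """Correlate two samples on a log10 intensity scale.

    Assumption: features with a non-positive intensity in either
    sample are skipped, because their logarithm is undefined.
    """
    pairs = [
        (math.log10(a), math.log10(b))
        for a, b in zip(sample_a, sample_b)
        if a > 0 and b > 0
    ]
    if len(pairs) < 2:
        raise ValueError("fewer than two features detected in both samples")
    xs, ys = zip(*pairs)
    return pearson_r(xs, ys)

# Made-up intensities for four features in two samples.
sample_a = [1e4, 1e5, 1e6, 0.0]
sample_b = [2e4, 2e5, 2e6, 5e3]
r = log_intensity_correlation(sample_a, sample_b)
```

In this toy input the second sample is a constant multiple of the first for every shared feature, so the log intensities differ by a constant offset and the correlation is 1 up to floating-point error.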
Figure 3
Overview of the dataset glycome. (A) The number of N-glycosylation sites that were identified per protein. (B) The distribution (%) of each amino acid in the aligned sequence motif with either an N-X-T (65%) or N-X-S (35%) motif of all identified unique N-glycosylation sites (n = 55). (C) The number of identified N-glycans per N-glycosylation site. (D) The distribution of all 1,044 identified N-glycans over the defined classes: complex, hybrid or high mannose. The number of antennas is indicated for each class, subdivided into fucosylation grade and the percentage of missing sugars on the antenna(s).
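The N-X-T and N-X-S motifs in panel B refer to the N-glycosylation sequon: an asparagine followed by any residue except proline, then a serine or threonine. A minimal sketch of locating such sequons in a protein sequence and tallying the two motif types follows. The function names and the toy sequence are hypothetical and are not taken from the study.

```python
import re

# Sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead keeps overlapping sequons (e.g. "NNTS") from being missed.
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_sequons(sequence):
    """Return (1-based Asn position, motif type) for each sequon."""
    hits = []
    for match in SEQUON.finditer(sequence):
        i = match.start()
        motif = "N-X-T" if sequence[i + 2] == "T" else "N-X-S"
        hits.append((i + 1, motif))
    return hits

def motif_fractions(hits):
    """Fraction of sequons of each motif type."""
    if not hits:
        return {}
    counts = {}
    for _, motif in hits:
        counts[motif] = counts.get(motif, 0) + 1
    return {motif: n / len(hits) for motif, n in counts.items()}

# Toy sequence: N9 is followed by proline and is therefore not a sequon.
toy_sequence = "MANGTAAANPSLLNKSQQNNTS"
hits = find_sequons(toy_sequence)
fractions = motif_fractions(hits)
```

A sequon only marks a potential site; whether it is actually glycosylated has to be established experimentally, as was done in the study by mass spectrometry.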
Figure 4
Data reduction and differential analysis indicate that the dataset contains multiple distinct signatures for infections. (A) Average linkage hierarchical clustering of 3,682 unique features for the individual samples, including the control (n = 42), bacterial (n = 53), and viral (n = 38) samples. (B) Principal component analysis (PCA) score plot of all samples, based on 2,121 univariate significant features (selected using ANOVA, p < 4.5e-6 Bonferroni corrected), showing a complete separation between the control group and the whole patient group on PC1 (36% explained variance) and an almost full separation between the bacterial and viral group on PC2 (6.2% explained variance). (C) PLS-DA classification result, based on a multivariate model of 447 features, for either a bacterial or viral infection (x axis), for each patient sample (y axis). Classification results for each of the 21 independent classification prediction models are indicated by open circles (◯), whereas the final PLS-DA model is indicated by a closed circle.
Figure 5
Separation between bacterial and viral class samples based on fully elucidated glycopeptide features. (A) PCA score plot of 96 identified highly significant features (ANOVA, 99.9% confidence, Bonferroni corrected p < 4.24E-7). (B) The gene-ontology (GO) hierarchical clustering tree for biological processes summarizing the correlation among the top 30 significant pathways and the corresponding p-value. Pathways with many shared genes/proteins are clustered together. Larger dots indicate more significant p-values. (C) Schematic linear representation of α1-acid glycoprotein 1 (AGP or A1AG1) with its five N-glycosylation sites (N33, N56, N72, N93, N103), of which three sites were identified in the dataset. The corresponding cumulative bar graphs show for each N-glycosylation site the measured glycan distribution (%) (y axis), comparing the bacterial [B], viral [V], and healthy control group [C] (x axis), using ANOVA with Bonferroni correction, with ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001. (D) ROC curve for all patients (n = 91) based on the clinical C-reactive protein (CRP) levels and the probability score of the model for the glycosylation signature at glycosylation site N93 (A1AG1_N93 – glycan signature). The boxplots below show all individuals for the viral and bacterial group, including mean and standard deviation, based on the CRP levels (left) and the probability score for the glycan signature at site N93 of A1AG1 (right). (E) ROC curve for patients with ambiguous CRP levels (>20 and <100 mg/L, n = 26) showing the clinical CRP and the probability score of the model for the glycosylation signature at glycosylation site N93. The boxplots below show all individuals for the viral and bacterial group, including mean and standard deviation, for the CRP concentration (left) and the probability score for the glycosylation signature at site N93 of A1AG1 (right).
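Panels D and E compare two scores (CRP level and the N93 glycan-signature probability) by their ROC curves. One common summary of such a curve is the area under it (AUC), which equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes the AUC directly from that definition. The scores and labels are invented, and treating bacterial infection as the positive class is an assumption made for illustration only.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class, 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have equal length")
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Invented probability scores; 1 = bacterial, 0 = viral (assumed coding).
example_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
example_labels = [1, 1, 0, 1, 0]
auc = roc_auc(example_scores, example_labels)
```

An AUC of 0.5 corresponds to a score that does not discriminate between the classes, and 1.0 to perfect separation. The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of this size.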
Figure 6
Prediction of the type of pathogen. By means of a machine learning genetic algorithm random forest (GA-RF) analysis, each (control and patient) sample was classified using 2,000 randomly selected training and validation sets from the total features dataset (n = 3,682). (A) Representative spider plot of the classification scores for a patient who has been clinically diagnosed with N. meningitidis serogroup B (patient #102–1332). (B) Representative spider plot of the classification scores for a patient who has been clinically diagnosed with respiratory syncytial virus (RSV) (patient #107–1113). (C) The confusion matrix summarizes the main classification label (columns) for each sample in the clinically diagnosed pathogen class (rows). The numbers indicate the number of (control or patient) samples classified for each pathogen class label, with the correct pathogen classification in bold type.
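The layout of the confusion matrix in panel C (clinically diagnosed class in rows, predicted class in columns) can be illustrated with a small tally. This sketch does not reproduce the GA-RF classifier; it only shows how such a matrix is built from paired labels. The function names and the five example samples are hypothetical.

```python
from collections import defaultdict

def confusion_matrix(diagnosed, predicted):
    """Count samples per (diagnosed class, predicted class) pair.

    Returns a nested dict: rows are diagnosed classes,
    columns are predicted classes.
    """
    if len(diagnosed) != len(predicted):
        raise ValueError("label lists must have equal length")
    matrix = defaultdict(lambda: defaultdict(int))
    for true_label, predicted_label in zip(diagnosed, predicted):
        matrix[true_label][predicted_label] += 1
    return {row: dict(cols) for row, cols in matrix.items()}

def accuracy(matrix):
    """Fraction of samples on the diagonal (correct classifications)."""
    total = sum(sum(cols.values()) for cols in matrix.values())
    correct = sum(cols.get(row, 0) for row, cols in matrix.items())
    return correct / total

# Invented labels for five samples.
diagnosed = ["control", "control", "RSV", "RSV", "N. meningitidis"]
predicted = ["control", "control", "RSV", "control", "N. meningitidis"]
matrix = confusion_matrix(diagnosed, predicted)
overall = accuracy(matrix)
```

The diagonal entries, where the row and column labels agree, correspond to the correct classifications that the caption describes as set in bold type.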
