Sci Rep. 2025 Oct 22;15(1):36843.
doi: 10.1038/s41598-025-20684-5.

Mass spectrometry combined with machine learning identifies novel protein signatures as demonstrated with multisystem inflammatory syndrome in children

Jeisac Guzmán Rivera et al. Sci Rep.

Abstract

Rapid and accurate diagnosis of emerging inflammatory illnesses is challenging due to overlapping clinical features with existing conditions. We demonstrate an approach that integrates proteomic analysis with machine learning to identify diagnostic protein signatures, using the example of SARS-CoV-2-induced multisystem inflammatory syndrome in children (MIS-C). We used plasma samples collected from subjects diagnosed with MIS-C and compared them first to controls with asymptomatic/mild SARS-CoV-2 infection and then to controls with pneumonia or Kawasaki disease. We used mass spectrometry to identify proteins and support vector machine (SVM) algorithm-based classification schemes to identify protein signatures. Diagnostic accuracy was assessed by calculating sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC), and corrected for overfitting by cross-validation. Proteomic analysis of a training dataset containing MIS-C (N = 17) and asymptomatic/mild SARS-CoV-2 infected control samples (N = 20) identified 643 proteins, of which 101 were differentially expressed. Plasma proteins associated with inflammation increased, and those associated with metabolism and coagulation decreased in MIS-C relative to controls. The SVM algorithm identified a three-protein model (ORM1, AZGP1, SERPINA3) that achieved 90.0% specificity, 88.2% sensitivity, and 93.5% AUC, distinguishing MIS-C from controls in the training set. Performance was retained in the validation dataset utilizing MIS-C (N = 19) and asymptomatic/mild SARS-CoV-2 infected control samples (N = 10) (90.0% specificity, 84.2% sensitivity, 87.4% AUC). We next replicated our approach to compare MIS-C with similarly presenting syndromes, such as pneumonia (N = 17) and Kawasaki disease (N = 13), and found a distinct three-protein signature (VWF, FCGBP, and SERPINA3) that accurately distinguished MIS-C from the other conditions (97.5% specificity, 89.5% sensitivity, 95.6% AUC). A software tool was also developed that may be used to evaluate other protein signatures using our data. These results demonstrate that the use of mass spectrometry to identify candidate plasma proteins followed by machine learning, specifically SVM, is an efficient strategy for identifying and evaluating biomarker signatures for disease classification.
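The accuracy metrics named in the abstract (sensitivity, specificity, and AUC) can be illustrated with a minimal sketch in plain Python. This is not the authors' software tool. The function names are hypothetical, and the per-sample predictions are invented: they assume 15 of 17 MIS-C samples and 18 of 20 controls were classified correctly, counts chosen only because they are consistent with the reported training-set percentages (88.2% and 90.0%). The abstract does not report these counts.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, 1 = case, 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Assumed illustration: 17 cases (15 called correctly) and 20 controls
# (18 called correctly). These counts are inferred, not reported.
y_true = [1] * 17 + [0] * 20
y_pred = [1] * 15 + [0] * 2 + [0] * 18 + [1] * 2

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")

# Toy scores for the AUC: three of the four case/control pairs are ranked correctly.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(f"toy AUC = {toy_auc:.2f}")
```

In a cross-validated setting, the same functions would be applied to predictions made on held-out folds rather than on the samples used to fit the classifier.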

Keywords: Biomarkers; Hyperinflammatory illnesses; Long COVID; Support vector machine.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests. Study approval: All study activities were approved by the Rutgers Institutional Review Board (Pro2020002961) and all methods were performed in accordance with the relevant guidelines. All participants, including parents and/or legal guardians, provided informed consent prior to engaging in study activities.

Figures

Fig. 1
Support vector machine (SVM) model building and cross-validation. The figure shows the steps of SVM model development used in this work and the sample sets utilized at each step.
Fig. 2
Volcano plot of differentially abundant proteins between MIS-C and mild/asymptomatic SARS-CoV-2. Proteins in the volcano plot are shown as red circles (Holm corrected p value ≤ 0.05), green circles (Holm corrected p value ≤ 0.05 and ≥ twofold change) and orange circles (Holm corrected p value > 0.05 and ≥ twofold change).
Fig. 3
Pathway enrichment analysis of differentially abundant proteins. Pathway analysis was conducted on lists of proteins differentially expressed in MIS-C vs. mild/asymptomatic SARS-CoV-2 infection (FDR < 0.05 and p value < 0.05) using the Reactome and Gene Ontology (GO) biological process databases. The top 20 pathways ranked by p value are shown for differentially increased proteins in MIS-C (panel A, Reactome; panel B, GO) and differentially decreased proteins in MIS-C (panel C, Reactome; panel D, GO).
Fig. 4
External validation of an SVM model. The figure shows a receiver operating characteristic (ROC) curve visualizing the performance of three proteins (ORM1, SERPINA3, and AZGP1) applied to the validation dataset (MIS-C vs. mild/asymptomatic SARS-CoV-2).
Fig. 5
Differentially abundant proteins between MIS-C, pneumonia, Kawasaki disease, and mild/asymptomatic SARS-CoV-2. Volcano plots of differentially abundant proteins between (A) MIS-C and pneumonia, (B) MIS-C and Kawasaki disease, and (C) MIS-C and mild/asymptomatic SARS-CoV-2 infection. Green and red circles are proteins with a Holm corrected p value ≤ 0.05. Green and orange circles are proteins with ≥ twofold change. (D) UpSet plot showing the proteins shared between all pairwise comparisons. MK, MIS-C vs. Kawasaki disease; MP, MIS-C vs. pneumonia; MA, MIS-C vs. mild/asymptomatic SARS-CoV-2 infection.
Fig. 6
Multi-disease comparison and SVM model. (A) Volcano plot of differentially abundant proteins between MIS-C and Kawasaki disease (K), pneumonia (P), and mild/asymptomatic SARS-CoV-2 (M/A). Proteins are depicted as green, red, or orange circles, based on Holm adjusted p value and fold-change, as in Fig. 2. (B) ROC curve visualizing the performance of a 3-protein signature (VWF, FCGBP, and SERPINA3). K, Kawasaki disease; P, pneumonia; M/A, mild/asymptomatic SARS-CoV-2 infection.
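
The color scheme used in the volcano plot captions (Holm corrected p value ≤ 0.05, twofold change) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline. The function names are hypothetical, and two points are assumptions: that "twofold change" applies in either direction (|log2 fold change| ≥ 1), and that red marks proteins that are significant but below the fold-change threshold.

```python
from typing import List, Sequence


def holm_adjust(pvalues: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p value (k = rank + 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def volcano_color(adj_p: float, log2_fc: float,
                  alpha: float = 0.05, min_abs_log2_fc: float = 1.0) -> str:
    """Map one protein to the caption's color categories (assumed interpretation)."""
    significant = adj_p <= alpha
    large_change = abs(log2_fc) >= min_abs_log2_fc  # twofold in either direction
    if significant and large_change:
        return "green"
    if significant:
        return "red"
    if large_change:
        return "orange"
    return "none"  # neither threshold met; not highlighted


# Invented raw p values and log2 fold changes for three hypothetical proteins.
raw_p = [0.01, 0.04, 0.03]
log2_fc = [1.5, 0.4, -1.2]
adj_p = holm_adjust(raw_p)
colors = [volcano_color(p, fc) for p, fc in zip(adj_p, log2_fc)]
print(adj_p, colors)
```

With these invented inputs, only the first protein stays significant after correction, so the third is plotted as orange despite its raw p value of 0.03.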
