Nat Commun. 2025 Dec 5;16(1):10943.
doi: 10.1038/s41467-025-66994-0.

Machine learning-supported framework for the classification of mpox infection and MVA immunization from multiplexed serology data

Rebecca Surtees et al. Nat Commun. 2025.

Abstract

The 2022 global mpox outbreak highlighted the risk of zoonotic diseases establishing sustained transmission in human populations and underscored the need for accurate serological tools to monitor orthopoxvirus exposure. However, cross-reactive antibodies induced by Modified Vaccinia Ankara (MVA) vaccination make it difficult to discriminate between monkeypox virus (MPXV) infection and vaccination-induced immunity. Here we present a machine learning (ML)-assisted bead-based serological multiplex assay that distinguishes MPXV infection from MVA vaccination and pre-immune sera by targeting antibody responses to 15 poxviral antigens. Of the six algorithms tested, the Gradient Boosting Classifier (GBC) achieves the highest performance (F1 = 0.83) in sera from the 2022 outbreak and from a follow-up epidemiological cohort of at-risk men who have sex with men (MSM; n = 1,260). In an independent validation cohort (n = 143), GBC (F1 = 0.70) robustly detects MPXV infections, including breakthrough cases, with 88% specificity and 92% sensitivity. Integrating ML with high-dimensional serology enables accurate cross-sectional classification of orthopoxvirus immune status and provides a scalable framework for mpox serosurveillance and outbreak preparedness.
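The abstract reports performance as F1, sensitivity, and specificity. As a minimal sketch, assuming a three-class confusion matrix (rows = true serogroup, columns = predicted serogroup), macro-averaged F1 (the averaging scheme is not stated in the abstract), and MPXV-versus-rest for sensitivity and specificity, these metrics can be computed as follows. The counts and function names are hypothetical and do not reproduce the study's results.

```python
# Minimal sketch: F1, sensitivity and specificity from a 3-class confusion matrix.
# Assumptions (not from the source): rows are true labels, columns are predictions,
# F1 is macro-averaged, and "MPXV vs. rest" defines sensitivity/specificity.
from typing import Dict, List

CLASSES = ["Pre", "MVA", "MPXV"]


def per_class_f1(cm: List[List[int]]) -> Dict[str, float]:
    """Return the F1 score of each class from a square confusion matrix."""
    scores = {}
    for i, name in enumerate(CLASSES):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(len(cm))) - tp  # predicted i, truly another class
        fn = sum(cm[i]) - tp                             # truly i, predicted another class
        denom = 2 * tp + fp + fn
        scores[name] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(cm)
    return sum(scores.values()) / len(scores)


def one_vs_rest(cm: List[List[int]], positive: str = "MPXV") -> Dict[str, float]:
    """Sensitivity and specificity with one class positive and all others negative."""
    p = CLASSES.index(positive)
    total = sum(sum(row) for row in cm)
    tp = cm[p][p]
    fn = sum(cm[p]) - tp
    fp = sum(cm[r][p] for r in range(len(cm))) - tp
    tn = total - tp - fn - fp
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}


# Hypothetical counts, for illustration only.
cm = [
    [40, 5, 5],   # true Pre
    [4, 30, 6],   # true MVA
    [3, 2, 45],   # true MPXV
]
print(round(macro_f1(cm), 3))
print(one_vs_rest(cm))
```

Macro averaging weights the three serogroups equally, which matters when group sizes are as unbalanced as in these cohorts; a support-weighted F1 would favor the largest group.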

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Outline of the antigen panel implemented in the multiplex assay, the sample cohorts, and the machine learning workflow for differentiating pre-immune, MVA-immunized, and MPXV-infected sera.
a Antigen panel and schematic representation of the bead-based multiplex immunoassay used to quantify antigen-specific IgG and IgM responses. This antigen panel was applied to all cohorts. b Serological cohorts used to train the different ML models. The acute cohort comprises pre-immune sera (collected pre-pandemic), MVA sera (collected after MVA immunization), and MPXV sera (collected less than 33 days after confirmed infection). The epidemiological cohort consists of sera from a high-risk MSM population with self-reported MPXV infection and/or MVA immunization status. c Machine learning workflow. Normalized quantitative measurements of cohort samples were used to train six classifiers on antigen-specific IgG or combined IgG/IgM responses: RF, FRBC, GBC, LDA, and two hybrid models combining LDA with RF or FRBC. Models trained on IgG data alone or on combined IgG/IgM data were evaluated within each cohort using 5-fold cross-validation repeated three times (15 runs). Additionally, classifiers trained on the acute and epidemiological cohorts were cross-evaluated on each other's data to assess dataset-specific performance. d Independent validation. A separate validation cohort was used to evaluate model generalizability. Final ensemble predictions were generated by majority voting across all 15 trained models per algorithm (RF, LDA, GBC) and input type (IgG or IgG/IgM), trained on the combined acute and epidemiological cohorts. Color coding: orange = pre-immune group, blue = MVA group, green = MPXV group. Abbreviations: VACV, vaccinia virus; MPXV, monkeypox virus; MSM, men who have sex with men; HSA, human serum albumin; HEp-2, human epithelial type 2 cell extract; Anti-hu IgG/IgM PE, anti-human IgG/IgM phycoerythrin (PE)-conjugated detection antibody; RF, Random Forest; FRBC, Fuzzy Rule-Based Classification; GBC, Gradient Boosting Classifier; LDA, Linear Discriminant Analysis; ML, machine learning. Created in BioRender. Stern, D. (2025) https://BioRender.com/kk4g6v5.
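The workflow evaluates models with 5-fold cross-validation repeated three times (15 runs) and builds ensemble predictions by majority voting across the 15 trained models. The sketch below, in dependency-free Python, shows one way to generate such splits and combine votes. The stratification by serogroup, the per-repeat seeding, the tie-breaking rule, and the function names are assumptions for illustration, not the authors' implementation; scikit-learn's `RepeatedStratifiedKFold` offers an equivalent splitter.

```python
# Minimal sketch of repeated stratified k-fold splitting and majority voting.
# Assumptions: folds are stratified by serogroup, each repeat uses its own seeded
# shuffle, and vote ties go to the label first encountered across the models.
import random
from collections import Counter, defaultdict
from typing import Iterator, List, Sequence, Tuple


def repeated_stratified_kfold(
    labels: Sequence[str], n_splits: int = 5, n_repeats: int = 3, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield n_splits * n_repeats (train, test) index pairs with balanced class ratios."""
    for rep in range(n_repeats):
        rng = random.Random(seed + rep)  # hypothetical seeding scheme
        folds: List[List[int]] = [[] for _ in range(n_splits)]
        by_class = defaultdict(list)
        for idx, lab in enumerate(labels):
            by_class[lab].append(idx)
        offset = 0
        for lab in sorted(by_class):
            idxs = by_class[lab][:]
            rng.shuffle(idxs)
            for j, idx in enumerate(idxs):
                folds[(j + offset) % n_splits].append(idx)  # deal samples round-robin
            offset += len(idxs)  # next class continues where this one stopped
        for k in range(n_splits):
            test = sorted(folds[k])
            test_set = set(test)
            train = [i for i in range(len(labels)) if i not in test_set]
            yield train, test


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model predictions (models x samples) into one label per sample."""
    combined = []
    for s in range(len(predictions[0])):
        votes = Counter(model[s] for model in predictions)
        # most_common orders equal counts by first encounter, i.e. earliest model wins ties
        combined.append(votes.most_common(1)[0][0])
    return combined


labels = ["Pre"] * 10 + ["MVA"] * 10 + ["MPXV"] * 10  # hypothetical labels
splits = list(repeated_stratified_kfold(labels))
print(len(splits))  # 15 runs = 5 folds x 3 repeats
print(majority_vote([["MPXV", "Pre"], ["MPXV", "MVA"], ["Pre", "MVA"]]))
```

Stratifying each fold keeps the small MVA and MPXV groups represented in every test split, which keeps per-run F1 estimates from swinging on fold composition alone.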
Fig. 2. Antibody profiles of serological cohorts measured by multiplex assay for IgG and IgM binding.
a, b Spider plots showing normalized IgG (a) and IgM (b) responses (min–max scaled between 0 and 1) to each antigen in the acute (top) and epi (bottom) cohorts, stratified by serostatus group (Pre, MVA, MPXV). Responses were further stratified by presumed childhood smallpox vaccination status: black lines indicate presumed vaccinated individuals (age ≥50 years), orange lines presumed unvaccinated individuals (age <40 years). c, d Box plots of min–max normalized IgG responses to selected antigens in the acute (c) and epi (d) cohorts for each serostatus group (Pre, MVA, and MPXV). Upper panels (Yes) show individuals presumed vaccinated against smallpox in childhood; lower panels (No) show individuals presumed unvaccinated. e, f Ratios (dimensionless) of IgG binding to homologous MPXV and VACV antigen pairs (A35/A33, B6/B5, E8/D8) in the combined acute and epi cohorts, stratified by serostatus group and presumed smallpox vaccination status (e: naïve; f: childhood smallpox vaccinated). Statistical significance between serogroups was assessed with a two-sided t test; no adjustments were made. Group sizes (vaccinated and unvaccinated individuals, respectively): acute cohort, Pre: 45 and 44; MVA: 10 and 16; MPXV: 41 and 161; epi cohort, Pre: 57 and 222; MVA: 73 and 291; MPXV: 7 and 41. Boxes represent the interquartile range (IQR; 25th–75th percentile), with the line inside indicating the median. Whiskers extend to the most extreme data points within 1.5 × IQR from the hinges. Individual data points are shown using the beeswarm package in R. Color coding (c–f): orange = pre-immune group, blue = MVA group, green = MPXV group. Abbreviations: epi, epidemiological; AU, arbitrary units; Pre, pre-immune; VACV, vaccinia virus; MPXV, monkeypox virus; MVA, Modified Vaccinia Ankara; Min.-Max. norm., min–max normalized values scaled between 0 and 1 (unitless). Source data are provided as a Source Data file.
Statistically significant differences between serogroups (two-sided t test) are indicated by asterisks (ns, not significant; *p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001). Exact p-values are provided in the Source Data file.
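The legend relies on three simple transformations: min–max scaling of each antigen's responses to [0, 1], MPXV/VACV ratios for the homologous pairs A35/A33, B6/B5, and E8/D8, and the asterisk convention for p-values. A minimal sketch follows. The antigen pair names come from the legend; the input values, the helper names, the choice to compute ratios on raw (unscaled) signals, and mapping a constant column to 0 are assumptions.

```python
# Minimal sketch: per-antigen min-max scaling, MPXV/VACV homolog ratios, and
# p-value-to-asterisk labels. Pair names follow the legend; helpers and inputs
# are hypothetical.
from typing import Dict, List

HOMOLOG_PAIRS = [("A35", "A33"), ("B6", "B5"), ("E8", "D8")]  # (MPXV, VACV)


def min_max(values: List[float]) -> List[float]:
    """Scale one antigen's responses to [0, 1]; a constant column maps to 0 (assumption)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def homolog_ratios(sample: Dict[str, float]) -> Dict[str, float]:
    """MPXV antigen signal divided by its VACV homolog's signal (dimensionless)."""
    return {f"{m}/{v}": sample[m] / sample[v] for m, v in HOMOLOG_PAIRS}


def stars(p: float) -> str:
    """Map a p-value to the asterisk convention used in the figure."""
    for threshold, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < threshold:
            return label
    return "ns"


print(min_max([100.0, 550.0, 1000.0]))
print(homolog_ratios({"A35": 800.0, "A33": 400.0, "B6": 300.0, "B5": 600.0,
                      "E8": 500.0, "D8": 500.0}))
print(stars(0.003))
```

A ratio above 1 indicates stronger binding to the MPXV antigen than to its VACV homolog, which is the signal expected to separate infection from MVA-induced immunity.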
Fig. 3. Performance of machine learning-based serogroup classification.
a Classification performance (F1 score; mean ± SD) across different combinations of training and test datasets (acute, epi, and combined), obtained from 5-fold cross-validation repeated three times (n = 15 runs). Results are shown for each machine learning algorithm tested, stratified by input type (IgG only vs. IgG/IgM). b Circular plots showing classification outcomes for the combined cohort using GBC, LDA, and RF on IgG/IgM data. Outer segments represent the true serogroup (number of sera shown); inner traces indicate model predictions (misclassifications shown as percentages and colored by the predicted serogroup), further stratified into presumed childhood smallpox-vaccinated and presumed naïve individuals. c Confusion matrices for ensemble predictions by GBC, LDA, and RF on the combined serological cohort, using the IgG/IgM data on which the ML algorithms were established. Abbreviations: epi, epidemiological; pre, pre-immune; VACV, vaccinia virus; MPXV, monkeypox virus; MVA, Modified Vaccinia Ankara; LDA, Linear Discriminant Analysis; GBC, Gradient Boosting Classifier; RF, Random Forest; FRBC, Fuzzy Rule-Based Classification; ML, machine learning. Source data are provided as a Source Data file.
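Panel a summarizes F1 over 15 cross-validation runs as mean ± SD, and panels b and c express predictions per true serogroup as counts and percentages. A minimal sketch of both summaries follows, assuming the sample standard deviation (n − 1) and the label order Pre, MVA, MPXV; the scores, labels, and function names are hypothetical.

```python
# Minimal sketch: mean ± SD of per-run F1 scores, a confusion matrix built from
# label lists, and row-wise percentages per true serogroup.
# Assumptions: sample SD (n - 1) and the Pre/MVA/MPXV label order.
import statistics
from typing import Dict, List, Sequence, Tuple

CLASSES = ["Pre", "MVA", "MPXV"]


def summarize_runs(f1_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) over cross-validation runs."""
    return statistics.mean(f1_scores), statistics.stdev(f1_scores)


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str]) -> List[List[int]]:
    """Count predictions; rows are true serogroups, columns predicted serogroups."""
    idx = {c: i for i, c in enumerate(CLASSES)}
    cm = [[0] * len(CLASSES) for _ in CLASSES]
    for t, p in zip(y_true, y_pred):
        cm[idx[t]][idx[p]] += 1
    return cm


def row_percentages(cm: List[List[int]]) -> Dict[str, Dict[str, float]]:
    """Percentage of each true serogroup assigned to each predicted serogroup."""
    out = {}
    for i, true_name in enumerate(CLASSES):
        total = sum(cm[i])
        out[true_name] = {
            pred_name: (100.0 * cm[i][j] / total if total else 0.0)
            for j, pred_name in enumerate(CLASSES)
        }
    return out


# Hypothetical labels and scores, for illustration only.
y_true = ["Pre", "Pre", "MVA", "MVA", "MPXV", "MPXV", "MPXV", "MPXV"]
y_pred = ["Pre", "MVA", "MVA", "MVA", "MPXV", "MPXV", "MPXV", "Pre"]
cm = confusion_matrix(y_true, y_pred)
print(cm)
print(row_percentages(cm)["MPXV"])
print(summarize_runs([0.80, 0.82, 0.84]))
```

Off-diagonal percentages in each row correspond to the misclassification shares drawn in the circular plots.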
Fig. 4. Machine learning-based classification of orthopoxvirus serogroups in an independent validation cohort.
a Heatmap of Z-score scaled (centered and standardized per antigen) IgG multiplex measurements (n = 143), hierarchically clustered using the Ward D2 algorithm. Annotations show the IgG-based serostatus inferred from Delta-VACV binding; the reported number of MVA vaccine doses; classifications into each serogroup (Pre, MVA, MPXV) across 15 models (RF, LDA, and GBC, trained on IgG or IgG+IgM); and ground-truth serogroup labels. Prediction confidence (pred. conf.) is indicated by color shading. b Confusion matrices showing classification results from LDA, RF, and GBC. c Confusion matrix for ensemble predictions from LDA and GBC, stratified by Delta-VACV serostatus (LDA applied to seronegative sera, GBC to seropositive sera). Color coding: orange = pre-immune group, blue = MVA group, green = MPXV group. Abbreviations: Pre, pre-immune; VACV, vaccinia virus; MPXV, monkeypox virus; MVA, Modified Vaccinia Ankara; LDA, Linear Discriminant Analysis; GBC, Gradient Boosting Classifier; RF, Random Forest.
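Panel a scales each antigen to Z-scores before clustering, and panel c routes each serum to a classifier by its Delta-VACV serostatus. A minimal sketch of both steps follows. It assumes the sample standard deviation (matching R's `scale()` convention), a boolean serostatus per serum, and ensemble labels already computed for LDA and GBC; the function names are hypothetical, and Ward D2 clustering itself is not reproduced here.

```python
# Minimal sketch: per-antigen Z-scores and the serostatus-dependent classifier
# choice described for panel c. Assumptions: sample SD (n - 1), a boolean
# Delta-VACV serostatus per serum, and hypothetical helper names.
import statistics
from typing import List


def zscore_columns(matrix: List[List[float]]) -> List[List[float]]:
    """Center and scale each column (antigen) of a samples x antigens matrix."""
    cols = list(zip(*matrix))
    stats = [(statistics.mean(c), statistics.stdev(c)) for c in cols]
    return [
        # a constant antigen (SD = 0) is mapped to 0 rather than dividing by zero
        [(x - mu) / sd if sd else 0.0 for x, (mu, sd) in zip(row, stats)]
        for row in matrix
    ]


def route_prediction(delta_vacv_positive: bool, lda_label: str, gbc_label: str) -> str:
    """Use the GBC label for Delta-VACV-seropositive sera and the LDA label otherwise."""
    return gbc_label if delta_vacv_positive else lda_label


print(zscore_columns([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]))
print(route_prediction(True, lda_label="Pre", gbc_label="MPXV"))
print(route_prediction(False, lda_label="Pre", gbc_label="MPXV"))
```

Per-antigen scaling keeps high-signal antigens from dominating the clustering distance, so the heatmap reflects relative response patterns rather than absolute signal levels.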
