Microbiol Spectr. 2024 May 2;12(5):e0406823.
doi: 10.1128/spectrum.04068-23. Epub 2024 Mar 18.

Application of MALDI-TOF MS and machine learning for the detection of SARS-CoV-2 and non-SARS-CoV-2 respiratory infections

Sergey Yegorov et al. Microbiol Spectr.

Abstract

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) could aid the diagnosis of acute respiratory infections (ARIs) owing to its affordability and high-throughput capacity. MALDI-TOF MS has been proposed for use on commonly available respiratory samples, without specialized sample preparation, making this technology especially attractive for implementation in low-resource regions. Here, we assessed the utility of MALDI-TOF MS in differentiating severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) vs non-COVID acute respiratory infections (NCARIs) in a clinical lab setting in Kazakhstan. Nasopharyngeal swabs were collected from inpatients and outpatients with respiratory symptoms and from asymptomatic controls (ACs) in 2020-2022. PCR was used to differentiate SARS-CoV-2+ and NCARI cases. MALDI-TOF MS spectra were obtained for a total of 252 samples (115 SARS-CoV-2+, 98 NCARIs, and 39 ACs) without specialized sample preparation. In our first sub-analysis, we followed a published protocol for peak preprocessing and machine learning (ML), trained on publicly available spectra from South American SARS-CoV-2+ and NCARI samples. In our second sub-analysis, we trained ML models on a peak intensity matrix representative of both South American (SA) and Kazakhstan (Kaz) samples. Applying the established MALDI-TOF MS pipeline "as is" resulted in a high detection rate for SARS-CoV-2+ samples (91.0%), but low accuracy for NCARIs (48.0%) and ACs (67.0%) by the top-performing random forest model. After re-training of the ML algorithms on the SA-Kaz peak intensity matrix, the accuracy of detection by the top-performing support vector machine with radial basis function kernel model was 88.0%, 95.0%, and 78.0% for the Kazakhstan SARS-CoV-2+, NCARI, and AC subjects, respectively, with a SARS-CoV-2 vs rest receiver operating characteristic area under the curve of 0.983 [0.958, 0.987]; a high differentiation accuracy was maintained for the South American SARS-CoV-2 and NCARIs. MALDI-TOF MS/ML is a feasible approach for the differentiation of ARI without specialized sample preparation. The implementation of MALDI-TOF MS/ML in a real clinical lab setting will necessitate continuous optimization to keep up with the rapidly evolving landscape of ARI.

IMPORTANCE: In this proof-of-concept study, the authors used matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) and machine learning (ML) to identify and distinguish acute respiratory infections (ARI) caused by SARS-CoV-2 versus other pathogens in low-resource clinical settings, without the need for specialized sample preparation. The ML models were trained on a varied collection of MALDI-TOF MS spectra from studies conducted in Kazakhstan and South America. Initially, the MALDI-TOF MS/ML pipeline, trained exclusively on South American samples, exhibited diminished effectiveness in recognizing non-SARS-CoV-2 infections from Kazakhstan. Incorporation of spectral signatures from Kazakhstan substantially increased the accuracy of detection. These results underscore the potential of employing MALDI-TOF MS/ML in resource-constrained settings to augment current approaches for detecting and differentiating ARI.
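The abstract reports two kinds of metrics: accuracy within each sub-group (SARS-CoV-2+, NCARI, AC) and a "SARS-CoV-2 vs rest" ROC AUC. The sketch below illustrates how such numbers can be computed, using only the Python standard library. The labels, predictions, and scores are invented toy values, not study data, and the function names `per_group_accuracy` and `one_vs_rest_auc` are hypothetical; the study's own software is not described in this record.

```python
from collections import defaultdict


def per_group_accuracy(true_labels, predicted_labels):
    """Fraction of samples in each true sub-group that received the correct label."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {group: correct[group] / total[group] for group in total}


def one_vs_rest_auc(true_labels, scores, positive):
    """ROC AUC for `positive` vs all other groups.

    Uses the rank identity: AUC equals the probability that a randomly chosen
    positive sample scores higher than a randomly chosen non-positive sample,
    with ties counted as one half.
    """
    pos = [s for t, s in zip(true_labels, scores) if t == positive]
    neg = [s for t, s in zip(true_labels, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one non-positive sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration only).
truth = ["SARS-CoV-2+"] * 4 + ["NCARI"] * 3 + ["AC"] * 3
predicted = [
    "SARS-CoV-2+", "SARS-CoV-2+", "SARS-CoV-2+", "NCARI",  # true SARS-CoV-2+
    "NCARI", "NCARI", "SARS-CoV-2+",                       # true NCARI
    "AC", "AC", "NCARI",                                   # true AC
]
# Hypothetical model score for the SARS-CoV-2+ class, one per sample.
sars_score = [0.95, 0.90, 0.80, 0.40, 0.10, 0.30, 0.60, 0.05, 0.20, 0.15]

acc = per_group_accuracy(truth, predicted)
auc = one_vs_rest_auc(truth, sars_score, positive="SARS-CoV-2+")
print(acc)
print(round(auc, 3))
```

Reporting accuracy per sub-group rather than one pooled figure is what exposes the pattern described above: a model can detect SARS-CoV-2+ samples well while still misclassifying most NCARIs.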

Keywords: COVID-19; MALDI-TOF MS; SARS-CoV-2; acute respiratory infection; machine learning.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig 1
Overall study workflow and description of the analyses. AC: asymptomatic controls; MALDI-TOF MS: matrix-assisted laser desorption/ionization time-of-flight mass spectrometry; ML: machine learning; NCARI: non-COVID acute respiratory infections; NPS: nasopharyngeal swab.
Fig 2
MALDI-TOF MS peak data generated using nasopharyngeal swabs and processed following the MALDI-TOF MS/ML pipeline developed by Nachtigall and colleagues (4). (A–C) Representative MALDI-TOF MS spectra from a symptomatic SARS-CoV-2+ sample (A), a symptomatic non-SARS-CoV-2 sample (B), and a healthy control sample (C) from Kazakhstan. The central line indicates the median value of the spectra, while the shaded region on either side represents the interquartile interval. Insets depict a range from 3,000 to 5,500 m/z encompassing 70% (62/88) of the identified peaks. (D) PCA of the combined data set incorporating MALDI-TOF MS data from both Kazakhstan and South America (2020 SARS-CoV-2+ and symptomatic SARS-CoV-2-negative) (3). (E) Dendrogram of the mass spectra stratified by sub-group from the combined data set based on the peak intensity matrix for Analysis I.
Fig 3
Classification accuracy of the MALDI-TOF MS/ML algorithms assessed on the data from Kazakhstan and South America. (A) Accuracy metrics for each of the seven ML models trained on the South American MALDI-TOF MS data (Analysis I in the current study) for the differentiation of study sub-groups. (B) Receiver operating characteristic (ROC) curves of the top-performing RF and SVM-L algorithms (Analysis I). (C) Accuracy metrics for each of the seven ML models trained on the combined South America-Kazakhstan data set (Analysis II in the current study) for the differentiation of study sub-groups. (D) ROC curves for the top-performing SVM-R and DT algorithms (Analysis II).
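Both analyses operate on a "peak intensity matrix": one row per sample and one column per m/z feature. As a rough illustration of the idea only, the sketch below aligns per-sample peak lists onto a shared set of reference m/z positions. This is not the published pipeline's implementation. The function name `build_intensity_matrix`, the sample IDs, the reference m/z values, the 2.0 m/z tolerance, and the choice to fill unmatched features with zero are all assumptions made for the example.

```python
def build_intensity_matrix(samples, reference_mz, tolerance=2.0):
    """Align per-sample peak lists onto shared reference m/z positions.

    samples: dict mapping a sample ID to a list of (mz, intensity) tuples.
    reference_mz: list of m/z positions that define the matrix columns.
    tolerance: maximum |mz - reference| for a peak to count as a match
               (an assumed value, chosen only for this illustration).

    Returns a dict mapping each sample ID to a row of intensities. When several
    peaks match one reference position the largest is kept; when none match,
    the feature is filled with 0.0 (also an assumption).
    """
    matrix = {}
    for sample_id, peaks in samples.items():
        row = []
        for ref in reference_mz:
            matched = [inten for mz, inten in peaks if abs(mz - ref) <= tolerance]
            row.append(max(matched) if matched else 0.0)
        matrix[sample_id] = row
    return matrix


# Hypothetical reference positions and peak lists (invented, not study data).
reference_mz = [3372.0, 4964.0, 5380.0]
samples = {
    "kaz_001": [(3371.2, 0.8), (4965.1, 0.3)],
    "sa_001": [(3373.5, 0.5), (5379.0, 0.9), (7000.0, 0.4)],
}

matrix = build_intensity_matrix(samples, reference_mz)
for sample_id, row in matrix.items():
    print(sample_id, row)
```

Building the columns from a shared reference, so that spectra from both cohorts land in the same feature space, is the property a combined South America-Kazakhstan matrix would need before any model can be trained on it.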

References

    1. Yegorov S, Goremykina M, Ivanova R, Good SV, Babenko D, Shevtsov A, MacDonald KS, Zhunussov Y, COVID-19 Genomics and Semey COVID-19 Epidemiology Research Groups. 2021. Epidemiology, clinical characteristics, and virologic features of COVID-19 patients in Kazakhstan: a nation-wide retrospective cohort study. Lancet Reg Health Eur 4:100096. doi:10.1016/j.lanepe.2021.100096
    2. Hanson KE, Azar MM, Banerjee R, Chou A, Colgrove RC, Ginocchio CC, Hayden MK, Holodiny M, Jain S, Koo S, Levy J, Timbrook TT, Caliendo AM. 2020. Molecular testing for acute respiratory tract infections: clinical and diagnostic recommendations from the IDSA’s diagnostics committee. Clin Infect Dis 71:2744–2751. doi:10.1093/cid/ciaa508
    3. Spick M, Lewis HM, Wilde MJ, Hopley C, Huggett J, Bailey MJ. 2022. Systematic review with meta-analysis of diagnostic test accuracy for COVID-19 by mass spectrometry. Metabolism 126:154922. doi:10.1016/j.metabol.2021.154922
    4. Nachtigall FM, Pereira A, Trofymchuk OS, Santos LS. 2020. Detection of SARS-CoV-2 in nasal swabs using MALDI-MS. Nat Biotechnol 38:1168–1173. doi:10.1038/s41587-020-0644-7
    5. Deulofeu M, García-Cuesta E, Peña-Méndez EM, Conde JE, Jiménez-Romero O, Verdú E, Serrando MT, Salvadó V, Boadas-Vaello P. 2021. Detection of SARS-CoV-2 infection in human nasopharyngeal samples by combining MALDI-TOF MS and artificial intelligence. Front Med 8:661358. doi:10.3389/fmed.2021.661358
