EBioMedicine. 2021 Sep:71:103546.
doi: 10.1016/j.ebiom.2021.103546. Epub 2021 Aug 19.

Nasopharyngeal metabolomics and machine learning approach for the diagnosis of influenza

Catherine A Hogan et al. EBioMedicine. 2021 Sep.

Abstract

Background: Respiratory virus infections are significant causes of morbidity and mortality, and may induce host metabolite alterations by infecting respiratory epithelial cells. We investigated the use of liquid chromatography quadrupole time-of-flight mass spectrometry (LC/Q-TOF) combined with machine learning for the diagnosis of influenza infection.

Methods: We analyzed nasopharyngeal swab samples by LC/Q-TOF to identify distinct metabolic signatures for diagnosis of acute illness. Machine learning models were trained for classification, followed by Shapley additive explanation (SHAP) analysis to assess feature importance and support biomarker discovery.
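To make the idea behind Shapley-based feature attribution concrete, here is a minimal sketch that computes exact Shapley values for a single sample by enumerating feature subsets. It is not the study's pipeline and does not use the `shap` library: the scoring function `toy_model`, the three feature values, and the all-zero baseline are hypothetical, and brute-force enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample, by enumerating feature subsets.

    Features outside a subset are replaced by their baseline value
    (a simplifying assumption; other value functions are possible).
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a subset of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                # Marginal contribution of feature i when added to subset s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(f):
    # Hypothetical score with one interaction term; not a fitted classifier.
    return 2.0 * f[0] - 1.0 * f[1] + 0.5 * f[0] * f[2]


sample = [1.0, 3.0, 2.0]      # made-up feature values
baseline = [0.0, 0.0, 0.0]    # made-up reference point
phi = shapley_values(toy_model, sample, baseline)
```

The attributions sum to the difference between the model output for the sample and for the baseline, and the interaction term is split evenly between the two features that share it.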

Findings: A total of 236 samples were tested in the discovery phase by LC/Q-TOF, including 118 positive samples (40 influenza A 2009 H1N1, 39 influenza H3 and 39 influenza B) as well as 118 age and sex-matched negative controls with acute respiratory illness. Analysis showed an area under the receiver operating characteristic curve (AUC) of 1.00 (95% confidence interval [95% CI] 0.99, 1.00), sensitivity of 1.00 (95% CI 0.86, 1.00) and specificity of 0.96 (95% CI 0.81, 0.99). The metabolite most strongly associated with differential classification was pyroglutamic acid. Independent validation of a biomarker signature based on the top 20 differentiating ion features was performed in a prospective cohort of 96 symptomatic individuals including 48 positive samples (24 influenza A 2009 H1N1, 5 influenza H3 and 19 influenza B) and 48 negative samples. Testing performed using a clinically-applicable targeted approach, liquid chromatography triple quadrupole mass spectrometry, showed an AUC of 1.00 (95% CI 0.998, 1.00), sensitivity of 0.94 (95% CI 0.83, 0.98), and specificity of 1.00 (95% CI 0.93, 1.00). Limitations include lack of sample suitability assessment, and need to validate these findings in additional patient populations.
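The abstract does not say how the confidence intervals for sensitivity and specificity were computed. As an illustration only, the sketch below applies the Wilson score interval, which is one common choice for proportions. The counts are assumptions back-calculated from the validation cohort figures: 45 of 48 positives detected (0.94) and 48 of 48 negatives correctly classified (1.00).

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Assumed counts for the 96-sample validation cohort (not stated in the abstract).
sens_lo, sens_hi = wilson_interval(45, 48)   # sensitivity = 45/48
spec_lo, spec_hi = wilson_interval(48, 48)   # specificity = 48/48
```

Under these assumed counts, the bounds round to the validation intervals quoted above, which is consistent with, but does not prove, the use of a Wilson-type interval.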

Interpretation: This metabolomic approach has potential for diagnostic applications in infectious disease testing, including for other respiratory viruses, and may eventually be adapted for point-of-care testing.

Funding: None.

Keywords: Diagnosis; Host response; Influenza; Mass spectrometry; Metabolomics.

Conflict of interest statement

Declaration of Competing Interest: A provisional patent covering the metabolomics approach combined with machine learning to recognize a medical condition has been filed (C.A.H., P.R., A.T.L., T.M.C., B.P.). The authors declare no other competing interests.

Figures

Fig 1
Conceptual diagram of the study from data collection to interpretation. The phases of data collection, model development, and interpretation are illustrated. LC/Q-TOF: liquid chromatography quadrupole time-of-flight; LC-MS/MS: liquid chromatography-tandem mass spectrometry; RF: random forests; ROC: receiver operating characteristic curve; SHAP: SHapley Additive exPlanation; SRM: selected reaction monitoring.
Fig 2
Area under the receiver operating characteristic curve test performance of the biomarker discovery set. ROC curves comparing the performance of the machine learning models (RF, LGBM) with the traditional linear models (Lasso, Ridge) on the test set; bracketed values are 95% confidence intervals for the AUC, calculated from a normal fit of the curves. AUC: area under the receiver operating characteristic curve; LGBM: light gradient boosting machine; RF: random forests; ROC: receiver operating characteristic curve.
Fig 3
Feature importance analysis by SHapley Additive exPlanation (SHAP) values. Top 20 ion features by percentage importance using the SHAP method. Ion features are identified by mass-to-charge ratio @ retention time, and colors indicate the association between feature value and positive influenza classification. For example, low values of 84.0447@0.81 are indicative of positive classification, while the relative value of 106.0865@10.34 does not have a clear interpretation, despite being an important feature.
Fig 4
Area under the receiver operating characteristic curve test performance of the validation set. The ROC curve demonstrates the LGBM model performance on the 96-sample validation test set.
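
The Fig 2 caption mentions AUC confidence intervals from a normal fit but gives no further detail. The sketch below shows one way such an interval could be obtained: the AUC is computed as the Mann-Whitney pairwise win rate, and a normal-approximation interval is built from the Hanley-McNeil standard error. The choice of Hanley-McNeil is an assumption, not the authors' stated procedure, and the scores are made-up toy values.

```python
from math import sqrt
from statistics import NormalDist


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


def auc_normal_ci(auc, n_pos, n_neg, confidence=0.95):
    """Normal-approximation CI using the Hanley-McNeil standard error."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc * auc / (1 + auc)
    var = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc * auc)
        + (n_neg - 1) * (q2 - auc * auc)
    ) / (n_pos * n_neg)
    se = sqrt(max(var, 0.0))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Clip to the valid AUC range.
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Toy classifier scores (hypothetical, not study data).
pos = [0.9, 0.8, 0.7, 0.4]
neg = [0.6, 0.3, 0.2, 0.1]
auc = auc_mann_whitney(pos, neg)
ci_lo, ci_hi = auc_normal_ci(auc, len(pos), len(neg))
```

A normal approximation becomes unreliable when the AUC is close to 1.00, as it is in this study, which is why the bounds are clipped to the valid range here.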
