Rapid geographical source attribution of Salmonella enterica serovar Enteritidis genomes using hierarchical machine learning
- PMID: 37042517
- PMCID: PMC10147375
- DOI: 10.7554/eLife.84167
Abstract
Salmonella enterica serovar Enteritidis is one of the most frequent causes of salmonellosis globally and is commonly transmitted from animals to humans through the consumption of contaminated foodstuffs. In the UK and many other countries in the Global North, a significant proportion of cases are caused by the consumption of imported food products or are contracted during foreign travel, making rapid identification of the geographical source of new infections a requirement for robust public health outbreak investigations. Herein, we detail the development and application of a hierarchical machine learning model to rapidly identify and trace the geographical source of S. Enteritidis infections from whole genome sequencing data. A total of 2313 S. Enteritidis genomes, collected by the UKHSA between 2014 and 2019, were used to train a 'local classifier per node' hierarchical classifier to attribute isolates to four continents, 11 sub-regions, and 38 countries (53 classes). The highest classification accuracy was achieved at the continental level, followed by the sub-regional and country levels (macro F1: 0.954, 0.718, and 0.661, respectively). A number of countries commonly visited by UK travelers were predicted with high accuracy (hF1: >0.9). Longitudinal analysis and validation with publicly accessible international samples indicated that predictions were robust to prospective external datasets. The hierarchical machine learning framework provided granular geographical source prediction directly from sequencing reads in <4 min per sample, facilitating rapid outbreak resolution and real-time genomic epidemiology. The results suggest that additional application to a broader range of pathogens and other geographically structured problems, such as antimicrobial resistance prediction, is warranted.
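To make the 'local classifier per node' idea and the hF1 score more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy continent > sub-region > country hierarchy with made-up labels, a hypothetical `score` callable standing in for the classifier trained at each parent node, and the commonly used ancestor-set definition of hierarchical precision and recall; the abstract does not state the exact definition used in the paper.

```python
# Toy hierarchy (child -> parent). Labels are illustrative only and are
# not the 53 classes used in the paper.
PARENT = {
    "Europe": None,
    "Africa": None,
    "Southern Europe": "Europe",
    "Northern Africa": "Africa",
    "Spain": "Southern Europe",
    "Italy": "Southern Europe",
    "Egypt": "Northern Africa",
}


def children(node):
    """Return the direct children of a node (None is the root)."""
    return [child for child, parent in PARENT.items() if parent == node]


def predict_top_down(sample, score):
    """Walk the hierarchy from the root, choosing the best child at each node.

    `score(parent, child, sample)` is a hypothetical stand-in for the local
    classifier attached to `parent`; it returns a number, higher is better.
    """
    node, path = None, []
    while children(node):
        parent = node
        node = max(children(parent), key=lambda c: score(parent, c, sample))
        path.append(node)
    return path


def ancestors(label):
    """Return the label together with all of its ancestors."""
    found = set()
    while label is not None:
        found.add(label)
        label = PARENT[label]
    return found


def hierarchical_f1(true_labels, pred_labels):
    """hF1 from ancestor-set overlap (assumed definition, see lead-in)."""
    overlap = pred_total = true_total = 0
    for true, pred in zip(true_labels, pred_labels):
        true_set, pred_set = ancestors(true), ancestors(pred)
        overlap += len(true_set & pred_set)
        pred_total += len(pred_set)
        true_total += len(true_set)
    if overlap == 0:
        return 0.0
    h_precision = overlap / pred_total
    h_recall = overlap / true_total
    return 2 * h_precision * h_recall / (h_precision + h_recall)


# Usage: a fake scorer that simply looks up per-label scores in the sample.
toy_sample = {"Europe": 0.9, "Africa": 0.1, "Spain": 0.7, "Italy": 0.3}
toy_score = lambda parent, child, sample: sample.get(child, 0.0)
predicted_path = predict_top_down(toy_sample, toy_score)
```

Under this definition, a prediction that lands in the wrong country but the right sub-region still earns partial credit, which is one reason a hierarchical score can be more informative than a flat per-class F1 for this kind of problem.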
Keywords: Salmonella; epidemiology; gastroenteritis; genomics; global health; infectious disease; machine learning; microbiology; public health.
© 2023, Bayliss et al.
Conflict of interest statement
SB, RL, CJ, MC, TD, LC: No competing interests declared.
Update of
- doi: 10.1101/2022.08.23.22279111