PLoS Comput Biol. 2018 Dec 14;14(12):e1006258.
doi: 10.1371/journal.pcbi.1006258. eCollection 2018 Dec.

Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data

Danesh Moradigaravand et al. PLoS Comput Biol.

Abstract

The emergence of microbial antibiotic resistance is a global health threat. In clinical settings, the key to controlling the spread of resistant strains is accurate and rapid detection. As traditional culture-based methods are time-consuming, genetic approaches have recently been developed for this task. The detection of antibiotic resistance is typically made by measuring a few known determinants previously identified from genome sequencing, and thus requires prior knowledge of the underlying biological mechanisms. To overcome this limitation, we employed machine learning models to predict resistance to 11 compounds across four classes of antibiotics from existing and novel whole genome sequences of 1936 E. coli strains. We considered a range of methods, and examined population structure, isolation year, gene content, and polymorphism information as predictors. Gradient boosted decision trees consistently outperformed alternative models, with an average accuracy of 0.91 on held-out data (range 0.81-0.97). While the best models most frequently employed gene content, an average accuracy score of 0.79 could be obtained using population structure information alone. Single nucleotide variation data were less useful, and significantly improved prediction for only two antibiotics, including ciprofloxacin. These results demonstrate that antibiotic resistance in E. coli can be accurately predicted from whole genome sequences without a priori knowledge of mechanisms, and that both genomic and epidemiological data can be informative. This paves the way to integrating machine learning approaches into diagnostic tools in the clinic.
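To make the "gene content" predictor concrete, the following is a minimal sketch of how per-strain gene sets might be encoded as a binary presence/absence matrix before being passed to a classifier. The function name, the toy strain labels, and the gene lists are hypothetical and chosen only for illustration; the abstract does not describe the authors' actual encoding pipeline.

```python
from typing import Dict, List, Set, Tuple


def presence_absence_matrix(
    gene_content: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Encode per-strain gene sets as a binary strain-by-gene matrix.

    Rows follow the sorted strain names and columns follow the sorted
    union of all genes (the pan-genome of the toy input).
    """
    strains = sorted(gene_content)
    genes = sorted(set().union(*gene_content.values()))
    matrix = [
        [1 if gene in gene_content[strain] else 0 for gene in genes]
        for strain in strains
    ]
    return strains, genes, matrix


# Hypothetical toy input: strain names and gene lists are illustrative only.
toy = {
    "strain_A": {"gyrA", "blaTEM", "tetA"},
    "strain_B": {"gyrA", "tetA"},
    "strain_C": {"gyrA"},
}
strains, genes, X = presence_absence_matrix(toy)
```

Each row of `X` could then serve as the feature vector for one strain, with the measured resistance phenotype for a given antibiotic as the label.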

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Prediction performance of the best tuned models.
Accuracy and F1 score (harmonic mean of precision and recall; y-axis) for resistant (top panel) and susceptible (middle panel) phenotypes for four predictive models (red: gradient boosted decision trees; green: logistic regression; teal: random forests; purple: deep learning) across eleven antibiotics (x-axis). For every drug, the best model of each class was identified based on its accuracy for predicting resistance; each best model used one of the possible combinations of gene presence, population structure, and year of isolation (lower panel; black: feature used; white: feature not used).
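The two metrics named in the caption can be computed per phenotype by treating that phenotype as the positive class. The sketch below assumes binary labels "R" (resistant) and "S" (susceptible); the function name and the toy label vectors are hypothetical and are not data from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels for eight isolates.
y_true = ["R", "R", "R", "S", "S", "S", "S", "S"]
y_pred = ["R", "R", "S", "S", "S", "S", "S", "R"]

resistant = binary_metrics(y_true, y_pred, positive="R")
susceptible = binary_metrics(y_true, y_pred, positive="S")
```

Accuracy is the same whichever phenotype is treated as positive, while F1 generally differs between the two, which is why the figure reports the phenotypes in separate panels.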
Fig 2. Population structure and phenotypic distribution of the input data.
A) Phylogenetic distribution of clusters identified in the population for SNP distance cut-off values of 2, 143, 5054 and 14489 (outer circles) relative to the phylogenetic tree. B) Phylogenetic distribution of correct calls (true positives, true negatives) and errors (false positives, false negatives) when predicting cephalothin (CIP) resistance with the best performing gradient boosted model. The accuracy for resistance was 0.91. C) Phylogenetic distribution of the most important identified population structure feature, clustering with SNP cut-off of 129 (outer ring), compared with the phylogenetic distribution of resistance phenotype (inner ring; blue: susceptible; light red: intermediate; red: resistant) on the test dataset. Only clusters with more than one member are shown.
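One way to derive population structure features from SNP distance cut-offs is to link any two isolates whose pairwise distance is at or below the cut-off and take the connected components as clusters. The sketch below assumes this single-linkage rule, which the caption does not confirm as the authors' method; the function name and the toy distance table are hypothetical, and only the four cut-off values are taken from panel A.

```python
from typing import Dict, List, Tuple


def cluster_by_snp_cutoff(
    n_isolates: int, distances: Dict[Tuple[int, int], int], cutoff: int
) -> List[int]:
    """Single-linkage clusters: isolates are joined when distance <= cutoff.

    Returns one cluster label per isolate (the smallest index in its cluster).
    """
    parent = list(range(n_isolates))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), dist in distances.items():
        if dist <= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Keep the smaller index as the representative of the cluster.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    return [find(i) for i in range(n_isolates)]


# Hypothetical pairwise SNP distances between four isolates.
toy_distances = {
    (0, 1): 1, (0, 2): 100, (1, 2): 120,
    (0, 3): 6000, (1, 3): 6000, (2, 3): 5900,
}

# Cut-off values from panel A; each yields one categorical feature per isolate.
features = {
    cutoff: cluster_by_snp_cutoff(4, toy_distances, cutoff)
    for cutoff in (2, 143, 5054, 14489)
}
```

Small cut-offs separate only near-identical isolates, while large cut-offs merge whole lineages, so a set of cut-offs describes the population at several resolutions.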
