Bioinformatics. 2022 Jan 3;38(2):325-334.
doi: 10.1093/bioinformatics/btab681.

Prediction of antimicrobial resistance based on whole-genome sequencing and machine learning

Yunxiao Ren et al. Bioinformatics.

Abstract

Motivation: Antimicrobial resistance (AMR) is one of the biggest global problems threatening human and animal health. Rapid and accurate AMR diagnostic methods are therefore urgently needed. However, traditional antimicrobial susceptibility testing (AST) is time-consuming, low-throughput and viable only for cultivable bacteria. Machine learning methods may pave the way for automated AMR prediction based on bacterial genomic data. A comparison of different machine learning methods for AMR prediction, using different encodings of whole-genome sequencing data and no prior knowledge, has not yet been carried out.

Results: In this study, we evaluated logistic regression (LR), support vector machine (SVM), random forest (RF) and convolutional neural network (CNN) models for the prediction of AMR for the antibiotics ciprofloxacin (CIP), cefotaxime (CTX), ceftazidime (CTZ) and gentamicin (GEN). We demonstrate that these models can effectively predict AMR from whole-genome sequencing data using label encoding, one-hot encoding and frequency matrix chaos game representation (FCGR) encoding. We trained the models on a large AMR dataset and evaluated them on an independent public dataset. In general, RFs and CNNs performed better than LR and SVM, with AUCs of up to 0.96. Furthermore, we identified mutations that are associated with AMR for each antibiotic.
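To make the three encodings concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the integer codes in `LABEL_CODES`, the corner layout used for the chaos game representation, the choice of `k` and the function names are all hypothetical, and the study's own code lives in the repository listed under Availability.

```python
# Hypothetical integer codes; the abstract does not state the mapping used.
LABEL_CODES = {"N": 0, "A": 1, "C": 2, "G": 3, "T": 4}
BASES = "ACGT"


def label_encode(seq):
    """Map each base to a single integer; unknown symbols fall back to 0."""
    return [LABEL_CODES.get(b, 0) for b in seq.upper()]


def one_hot_encode(seq):
    """Map each base to a 4-element indicator vector (all zeros for N)."""
    return [[1 if b == base else 0 for base in BASES] for b in seq.upper()]


def fcgr(seq, k=2):
    """Count k-mers into a 2^k x 2^k grid arranged by chaos game representation."""
    # Assumed corner layout as (x, y): A=(0,0), C=(0,1), G=(1,1), T=(1,0).
    corner = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(b not in corner for b in kmer):
            continue  # skip k-mers containing N or other symbols
        x = y = 0
        # The last base selects the coarsest quadrant; earlier bases refine it.
        for j, b in enumerate(kmer):
            cx, cy = corner[b]
            x += cx << j
            y += cy << j
        grid[y][x] += 1
    return grid
```

Label encoding yields one integer per position, one-hot encoding yields four binary features per position, and FCGR yields a fixed-size matrix regardless of sequence length, which is what makes it a natural image-like input for a CNN.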

Availability and implementation: Source code for data preparation and model training is available on GitHub (https://github.com/YunxiaoRen/ML-iAMR).

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Workflow of the study. WGS data from Giessen and the public data from Moradigaravand et al. (2018) were processed, and single nucleotide polymorphisms (SNPs) were called. The SNP data were encoded by label encoding, one-hot encoding and FCGR encoding for subsequent machine learning. The Giessen dataset was used to train and validate the four machine learning algorithms using cross-validation. The public data were used for the final evaluation of the models. Finally, we analyzed the association of SNPs and SNP-adjacent genes with AMR using ensemble feature selection (EFS). Created with BioRender.com.
Fig. 2.
ROC curves for the models with label encoding, one-hot encoding and FCGR encoding on the Giessen data. First row: ROC curves for CIP with label encoding (A), one-hot encoding (B) and FCGR encoding (C). Second row: ROC curves for CTX with label encoding (D), one-hot encoding (E) and FCGR encoding (F). Third row: ROC curves for CTZ with label encoding (G), one-hot encoding (H) and FCGR encoding (I). Fourth row: ROC curves for GEN with label encoding (J), one-hot encoding (K) and FCGR encoding (L).
Fig. 3.
ROC curves for the models with label encoding, one-hot encoding and FCGR encoding on the public data. First row: ROC curves for CIP with label encoding (A), one-hot encoding (B) and FCGR encoding (C). Second row: ROC curves for CTX with label encoding (D), one-hot encoding (E) and FCGR encoding (F). Third row: ROC curves for CTZ with label encoding (G), one-hot encoding (H) and FCGR encoding (I). Fourth row: ROC curves for GEN with label encoding (J), one-hot encoding (K) and FCGR encoding (L).
Fig. 4.
EFS analysis for each antibiotic for both datasets. The four left panels show the ten most important SNPs identified for CIP (A), CTX (C), CTZ (E) and GEN (G) in the Giessen dataset. The right panels show the corresponding SNPs from the public dataset.
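
The ROC curves in Figs. 2 and 3 are summarized by the area under the curve (AUC), which equals the probability that a randomly chosen resistant isolate receives a higher score than a randomly chosen susceptible one. The sketch below computes it directly from that definition. The function name and the 1 = resistant, 0 = susceptible labeling are assumptions for illustration, not taken from the study's code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (resistant, susceptible) pairs ranked correctly.

    labels: 1 for resistant, 0 for susceptible (assumed convention).
    scores: model outputs, higher meaning more likely resistant.
    Ties between a resistant and a susceptible isolate count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one isolate of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

This pairwise form is quadratic in the number of isolates, which is fine for a few thousand genomes; a rank-based version would be the usual choice for larger sets.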

