EMBO Mol Med. 2020 Mar 6;12(3):e10264.

doi: 10.15252/emmm.201910264. Epub 2020 Feb 12.

Predicting antimicrobial resistance in Pseudomonas aeruginosa with machine learning-enabled molecular diagnostics

Ariane Khaledi^#^{1,2}, Aaron Weimann^#^{2,3,4}, Monika Schniederjans^#^{1,2}, Ehsaneddin Asgari^#^{3,5}, Tzu-Hao Kuo³, Antonio Oliver⁶, Gabriel Cabot⁶, Axel Kola⁷, Petra Gastmeier⁷, Michael Hogardt⁸, Daniel Jonas⁹, Mohammad Rk Mofrad^{5,10}, Andreas Bremges^{3,4}, Alice C McHardy^{3,4}, Susanne Häussler^{1,2}

Affiliations

¹ Department of Molecular Bacteriology, Helmholtz Centre for Infection Research, Braunschweig, Germany.
² Molecular Bacteriology Group, TWINCORE-Centre for Experimental and Clinical Infection Research, Hannover, Germany.
³ Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany.
⁴ German Center for Infection Research (DZIF), Braunschweig, Germany.
⁵ Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, USA.
⁶ Servicio de Microbiología y Unidad de Investigación Hospital Universitario Son Espases, Instituto de Investigación Sanitaria Illes Balears (IdISPa), Palma de Mallorca, Spain.
⁷ Institute of Hygiene and Environmental Medicine, Charité - Universitätsmedizin Berlin, Berlin, Germany.
⁸ Institute of Medical Microbiology and Infection Control, University Hospital Frankfurt, Frankfurt/Main, Germany.
⁹ Faculty of Medicine, Institute for Infection Prevention and Hospital Epidemiology, Medical Center-University of Freiburg, Freiburg, Germany.
¹⁰ Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Lab, Berkeley, CA, USA.

^# Contributed equally.

PMID: 32048461
PMCID: PMC7059009
DOI: 10.15252/emmm.201910264

Ariane Khaledi et al. EMBO Mol Med. 2020.

Abstract

Limited therapy options due to antibiotic resistance underscore the need for optimization of current diagnostics. In some bacterial species, antimicrobial resistance can be unambiguously predicted based on their genome sequence. In this study, we sequenced the genomes and transcriptomes of 414 drug-resistant clinical Pseudomonas aeruginosa isolates. By training machine learning classifiers on information about the presence or absence of genes, their sequence variation, and expression profiles, we generated predictive models and identified biomarkers of resistance to four commonly administered antimicrobial drugs. Using these data types alone or in combination resulted in high (0.8-0.9) or very high (> 0.9) sensitivity and predictive values. For all drugs except for ciprofloxacin, gene expression information improved diagnostic performance. Our results pave the way for the development of a molecular resistance profiling tool that reliably predicts antimicrobial susceptibility based on genomic and transcriptomic markers. The implementation of a molecular susceptibility test system in routine microbiology diagnostics holds promise to provide earlier and more detailed information on antibiotic resistance profiles of bacterial pathogens and thus could change how physicians treat bacterial infections.

Keywords: antibiotic resistance; biomarkers; clinical isolates; machine learning; molecular diagnostics.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

**Figure 1. Geographic and phylogenetic distribution of 414 clinical *Pseudomonas aeruginosa* isolates used in this study**
(A) Geographic sampling site distribution, where circle size is proportional to the number of isolates from a particular location.
(B) Phylogenetic tree of the clinical isolates and seven reference strains (blue dots). A PA7‐like outgroup clade including two clinical isolates is not shown. Abundant high‐risk clones are indicated by green bars. Scale bar: 0.04.
(C) Antimicrobial susceptibility profiles against the four commonly administered antibiotics tobramycin (TOB), ceftazidime (CAZ), ciprofloxacin (CIP), and meropenem (MEM) determined by agar dilution according to Clinical & Laboratory Standards Institute Guidelines (CLSI, 2018).

**Figure 2. Training and validating a diagnostic classifier for antimicrobial susceptibility prediction for four different drugs based on genomic (GPA/SNPs) and transcriptomic profiles (EXPR)**
The best data type combination was determined using 80% of the data in standard and phylogenetically informed cross‐validation (cv) and further validated on the remaining 20% of the data.
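The evaluation scheme in this caption can be illustrated with a minimal Python sketch, assuming a plain random 80/20 split and interleaved 10‐fold partitions of the training portion. The function names `split_isolates` and `k_fold`, the seed, and the placeholder isolate identifiers are illustrative assumptions and are not taken from the authors' pipeline.

```python
import random


def split_isolates(isolate_ids, train_fraction=0.8, seed=0):
    """Shuffle isolates and split them into training and validation sets."""
    shuffled = list(isolate_ids)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold(items, k=10):
    """Yield (train, test) pairs; every item is held out exactly once."""
    for i in range(k):
        test = items[i::k]
        held_out = set(test)
        train = [x for x in items if x not in held_out]
        yield train, test


# Placeholder identifiers for the 414 isolates (hypothetical names).
isolates = [f"isolate_{i:03d}" for i in range(414)]
train_set, validation_set = split_isolates(isolates)
folds = list(k_fold(train_set, k=10))
```

With 414 isolates this leaves 331 for cross‐validation and 83 for the final validation; model selection would use only the former.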

**Figure 3. Evaluation of AMR classification with a support vector machine (R: resistant; S: susceptible) using different performance metrics and data types (EXPR: gene expression; GPA: gene presence or absence; and SNPs: single nucleotide polymorphisms) or combinations thereof**
Each individual panel depicts the results for one of four different anti‐pseudomonal antibiotics (CAZ, CIP, MEM, and TOB). The solid vertical line in the box plots represents the median, the box limits depict the 25th and 75th percentiles, and the lower and upper hinges include values within ± 1.5 times the interquartile range. Values outside that range were plotted as solid dots.
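The performance metrics named in this record (sensitivity, predictive value, and the F1‐score) can be computed from a confusion matrix as sketched below. This is a minimal illustration that treats resistant (R) as the positive class; the function name `classification_metrics` and the toy labels are assumptions, not data from the study.

```python
def classification_metrics(truth, predicted, positive="R"):
    """Return sensitivity, positive predictive value, and F1 for one class."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),          # share of true R that were found
        "ppv": tp / (tp + fp),                  # share of predicted R that are R
        "f1": 2 * tp / (2 * tp + fp + fn),      # harmonic mean of the two above
    }


# Toy labels (made up): 4 resistant and 4 susceptible isolates.
truth = ["R", "R", "R", "R", "S", "S", "S", "S"]
predicted = ["R", "R", "R", "S", "S", "S", "S", "R"]
metrics = classification_metrics(truth, predicted)
```

Passing `positive="S"` yields the corresponding values for the susceptible class.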

**Figure 4. The number of features used by the support vector machine classifier (top panels) and corresponding classification performance (bottom panels) varies with the hyperparameter C**
The C parameter is inversely related to the number of markers being included in the model, i.e., lower values for the C parameter yield models with fewer features. The SVM resistance/susceptibility classifier was evaluated in five repeats of 10‐fold nested cross‐validation. Each panel depicts the results for a different drug (CAZ, CIP, MEM, and TOB) based on the best data type combination (GPA+EXPR/SNPs). The model with the fewest features within one standard deviation of the maximal performance was selected as the most suitable diagnostic classification model (red) (Dataset EV5). The solid vertical line in the box plots represents the median, the box limits depict the 25th and 75th percentiles, and the lower and upper hinges include values within ± 1.5 times the interquartile range. Values outside that range were plotted as solid dots.
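The selection rule described in this caption can be sketched as follows. The sketch assumes that "within one standard deviation" refers to the standard deviation of the best‐performing model's cross‐validation scores; the caption does not specify this, and the function name `select_sparsest_model` and all numbers are made up for illustration.

```python
from statistics import mean, stdev


def select_sparsest_model(results):
    """Pick the model with the fewest features whose mean score lies within
    one standard deviation of the best mean score (assumed definition)."""
    best = max(results, key=lambda r: mean(r["scores"]))
    threshold = mean(best["scores"]) - stdev(best["scores"])
    eligible = [r for r in results if mean(r["scores"]) >= threshold]
    return min(eligible, key=lambda r: r["n_features"])


# Hypothetical cross-validation results for three values of C.
results = [
    {"C": 0.01, "n_features": 5, "scores": [0.70, 0.72, 0.68, 0.71, 0.69]},
    {"C": 0.1, "n_features": 40, "scores": [0.88, 0.90, 0.86, 0.89, 0.87]},
    {"C": 1.0, "n_features": 300, "scores": [0.89, 0.91, 0.87, 0.90, 0.88]},
]
chosen = select_sparsest_model(results)
```

In this toy example the model with C = 0.1 is chosen: it scores marginally below the best model but needs 40 rather than 300 features.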

**Figure 5. Performance of the support vector machine (SVM) classifier for antimicrobial resistance and susceptibility prediction for different data types, different drugs, and different evaluation schemes**
The SVM performance was summarized by the F1‐score and is shown for standard cross‐validation (standard_cv, blue) and cross‐validation using phylogenetically related blocks of isolates (block_cv, red) based on the training dataset (80% of the isolates) and for the validation dataset (green; 20% of the isolates). EXPR: gene expression; GPA: gene presence and absence with indel information; SNPs: single nucleotide polymorphisms. The solid vertical line in the box plots represents the median, the box limits depict the 25th and 75th percentiles, and the lower and upper hinges include values within ± 1.5 times the interquartile range. Values outside that range were plotted as solid dots.
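Block cross‐validation differs from standard cross‐validation in that related isolates are never split between training and test folds. A minimal sketch of that idea follows, assuming each isolate already carries a phylogenetic block label; the greedy fold assignment, the function name `block_folds`, and the clade labels are illustrative assumptions, not the authors' procedure.

```python
from collections import defaultdict


def block_folds(groups, n_folds):
    """Yield (train_idx, test_idx) so that no group spans train and test."""
    members = defaultdict(list)
    for idx, group in enumerate(groups):
        members[group].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Place the largest groups first, each into the currently smallest fold.
    for group in sorted(members, key=lambda g: -len(members[g])):
        min(folds, key=len).extend(members[group])
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
        yield train, test


# Hypothetical block labels for 12 isolates.
groups = ["clade_A"] * 4 + ["clade_B"] * 3 + ["clade_C"] * 3 + ["clade_D"] * 2
folds = list(block_folds(groups, n_folds=3))
```

Because whole clades are held out together, a classifier cannot score well merely by recognizing close relatives of its training isolates, which is why block_cv tends to give more conservative estimates.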

**Figure 6. Classification performance improves and plateaus with the number of training samples used**
A support vector machine‐based resistance/susceptibility classifier was trained on differently sized and randomly drawn subsamples from our isolate collection and evaluated in five repeats of a 10‐fold nested cross‐validation. Each panel depicts the results for a different drug (CAZ, CIP, MEM, and TOB) based on the best data type combination (GPA+EXPR/SNPs). The solid vertical line in the box plots represents the median, the box limits depict the 25th and 75th percentiles, and the lower and upper hinges include values within ± 1.5 times the interquartile range. Values outside that range were plotted as solid dots.
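The subsampling experiment can be outlined as a loop over subsample sizes. In the sketch below the `evaluate` callback stands in for training and cross‐validating a classifier on the drawn subset; `toy_evaluate` is a made‐up saturating function used only so that the example runs, and all names and sizes are assumptions.

```python
import random
from statistics import median


def learning_curve(sample_ids, sizes, evaluate, repeats=5, seed=0):
    """Return the median score of `evaluate` over random subsamples per size."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        scores = [evaluate(rng.sample(sample_ids, n)) for _ in range(repeats)]
        curve[n] = median(scores)
    return curve


def toy_evaluate(subset):
    """Placeholder score that rises and flattens with subset size."""
    return len(subset) / (len(subset) + 20)


sample_ids = list(range(331))  # hypothetical training-set indices
curve = learning_curve(sample_ids, sizes=[50, 100, 200, 300], evaluate=toy_evaluate)
```

Replacing `toy_evaluate` with a function that fits and scores a real classifier would produce the kind of curve shown in the figure.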

**Figure EV1. Correlation of susceptibility profiling by MIC testing and resistance prediction using diagnostic classifiers for ciprofloxacin**
The phylogenetic tree is based on the 414 *Pseudomonas aeruginosa* isolates used in this study. The branches leading toward two deeply branching clades were collapsed (CH4433 and ESP077, and CH4684, CH5206, and CH5387). The inner ring depicts the susceptibility of each isolate to ciprofloxacin; green, susceptible; red, resistant; rose, intermediate resistant. The outer rings show the susceptibility as assigned by the diagnostic classifier (intermediate resistant samples were not assigned).

**Figure EV2. Correlation of susceptibility profiling by MIC testing and resistance prediction using diagnostic classifiers for meropenem**
The phylogenetic tree is based on the 414 *Pseudomonas aeruginosa* isolates used in this study. The branches leading toward two deeply branching clades were collapsed (CH4433 and ESP077, and CH4684, CH5206, and CH5387). The inner ring depicts the susceptibility of each isolate to meropenem; green, susceptible; red, resistant; rose, intermediate resistant. The outer rings show the susceptibility as assigned by the diagnostic classifier (intermediate resistant samples were not assigned).

**Figure EV3. Correlation of susceptibility profiling by MIC testing and resistance prediction using diagnostic classifiers for tobramycin**
The phylogenetic tree is based on the 414 *Pseudomonas aeruginosa* isolates used in this study. The branches leading toward two deeply branching clades were collapsed (CH4433 and ESP077, and CH4684, CH5206, and CH5387). The inner ring depicts the susceptibility of each isolate to tobramycin; green, susceptible; red, resistant; rose, intermediate resistant. The outer rings show the susceptibility as assigned by the diagnostic classifier (intermediate resistant samples were not assigned).

**Figure EV4. Correlation of susceptibility profiling by MIC testing and resistance prediction using diagnostic classifiers for ceftazidime**
The phylogenetic tree is based on the 414 *Pseudomonas aeruginosa* isolates used in this study. The branches leading toward two deeply branching clades were collapsed (CH4433 and ESP077, and CH4684, CH5206, and CH5387). The inner ring depicts the susceptibility of each isolate to ceftazidime; green, susceptible; red, resistant; rose, intermediate resistant. The outer rings show the susceptibility as assigned by the diagnostic classifier (intermediate resistant samples were not assigned).

**Figure 7. Number of samples misclassified and correctly predicted by the support vector machine resistance and susceptibility classifier (SVM) grouped by their minimum inhibitory concentration**
Each panel depicts the results for a different anti‐pseudomonal drug (CAZ: ceftazidime; CIP: ciprofloxacin; MEM: meropenem; TOB: tobramycin) for the best data type combination (GPA+EXPR/SNPs). Misclassified and correctly classified samples for the training dataset (80%) were inferred in a 10‐fold cross‐validation. An SVM trained on the training dataset was used to predict resistance/susceptibility of the test samples (20%). The number of misclassified samples in the training (80%) and test set was aggregated.
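The aggregation shown in this figure amounts to counting correct and incorrect predictions per MIC value. A minimal sketch, with the function name `tally_by_mic` and the toy values as assumptions:

```python
from collections import Counter


def tally_by_mic(mics, truth, predicted):
    """Count correct and misclassified samples for each MIC value."""
    counts = Counter()
    for mic, t, p in zip(mics, truth, predicted):
        outcome = "correct" if t == p else "misclassified"
        counts[(mic, outcome)] += 1
    return counts


# Toy data (made up): MIC in mg/L, true and predicted class per isolate.
mics = [1, 1, 4, 4, 4, 32]
truth = ["S", "S", "R", "R", "R", "R"]
predicted = ["S", "S", "S", "R", "S", "R"]
counts = tally_by_mic(mics, truth, predicted)
```

Grouping in this way makes it visible whether errors concentrate at MIC values close to the clinical breakpoint.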

**Figure EV5. Resistance overlap and correlation between different drugs using the Kullback–Leibler divergence of resistance profiles for different pairs of drugs**
(A) Venn diagram showing the overlap of resistances among the collected clinical isolates.
(B) To further investigate co‐resistance among the drugs, we also calculated the Kullback–Leibler divergence (KL divergence) between resistance patterns. We use the KL divergence as a non‐symmetric measure of the differences between the resistance patterns of different drugs. A non‐symmetric measure allows us to distinguish what resistance to drug A implies for resistance to drug B (drug A, drug B) from the reverse case (drug B, drug A). We normalized the KL values by dividing all values by the maximum in the table. In this analysis, the divergence indicates the extent to which information on a particular drug resistance could *not* be used to predict the simultaneous appearance of a second resistance. Thus, the higher the divergence value, the less information is available to predict a particular resistance pattern. The results imply, for example, that TOB resistance comes with a higher probability of simultaneous MEM resistance, but not the other way around.
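A normalized, non‐symmetric KL table of this kind could be computed as sketched below. The caption does not state how resistance patterns were turned into probability distributions, so the sketch assumes one plausible choice: each drug's binary resistance vector over isolates is smoothed and normalized to sum to one. The function names, the smoothing constant, and the toy profiles are all assumptions.

```python
import math


def resistance_distribution(profile, eps=1e-3):
    """Turn a 0/1 resistance vector into a smoothed probability distribution."""
    smoothed = [r + eps for r in profile]
    total = sum(smoothed)
    return [v / total for v in smoothed]


def kl_divergence(p, q):
    """KL(p || q); not symmetric in p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def normalized_kl_table(profiles):
    """KL for every ordered drug pair, divided by the largest value."""
    dists = {d: resistance_distribution(v) for d, v in profiles.items()}
    table = {(a, b): kl_divergence(dists[a], dists[b])
             for a in dists for b in dists if a != b}
    largest = max(table.values())
    return {pair: value / largest for pair, value in table.items()}


# Toy profiles (made up): every TOB-resistant isolate is also MEM-resistant.
profiles = {"TOB": [1, 1, 0, 0, 0, 0], "MEM": [1, 1, 1, 1, 0, 0]}
table = normalized_kl_table(profiles)
```

In the toy data the (TOB, MEM) entry is smaller than the (MEM, TOB) entry, mirroring the asymmetry described in the caption: TOB resistance is informative about MEM resistance, while the reverse holds less well. The sketch does not guard against the degenerate case where all profiles are identical and the maximum divergence is zero.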

Comment in

Lean, mean, learning machines.
Wheeler NE, Sánchez-Busó L, Argimón S, Jeffrey B. Nat Rev Microbiol. 2020 May;18(5):266. doi: 10.1038/s41579-020-0357-4. PMID: 32203299.

Grants and funding

Ministerio de Economía, Industria y Competitividad/International
