EMBO Mol Med. 2020 Mar 6;12(3):e10264.
doi: 10.15252/emmm.201910264. Epub 2020 Feb 12.

Predicting antimicrobial resistance in Pseudomonas aeruginosa with machine learning-enabled molecular diagnostics



Ariane Khaledi et al. EMBO Mol Med.

Abstract

Limited therapy options due to antibiotic resistance underscore the need to optimize current diagnostics. In some bacterial species, antimicrobial resistance can be unambiguously predicted from the genome sequence. In this study, we sequenced the genomes and transcriptomes of 414 drug-resistant clinical Pseudomonas aeruginosa isolates. By training machine learning classifiers on information about the presence or absence of genes, their sequence variation, and their expression profiles, we generated predictive models and identified biomarkers of resistance to four commonly administered antimicrobial drugs. Using these data types alone or in combination resulted in high (0.8-0.9) or very high (> 0.9) sensitivity and predictive values. For all drugs except ciprofloxacin, gene expression information improved diagnostic performance. Our results pave the way for the development of a molecular resistance profiling tool that reliably predicts antimicrobial susceptibility based on genomic and transcriptomic markers. The implementation of a molecular susceptibility test system in routine microbiology diagnostics holds promise to provide earlier and more detailed information on the antibiotic resistance profiles of bacterial pathogens and could thus change how physicians treat bacterial infections.
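The abstract reports sensitivity and predictive values. As a minimal sketch, assuming that "resistant" (R) is treated as the positive class, these metrics can be computed from paired true and predicted labels as follows. The function name diagnostic_metrics is hypothetical and not taken from the study.

```python
from typing import Dict, Sequence


def diagnostic_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "R"
) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV for a binary R/S classifier.

    Treating "R" (resistant) as the positive class is an illustrative
    assumption, not necessarily the convention used in the study.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # resistant isolate correctly flagged
            else:
                fp += 1  # susceptible isolate wrongly flagged as resistant
        else:
            if truth == positive:
                fn += 1  # resistant isolate missed
            else:
                tn += 1  # susceptible isolate correctly cleared

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a class is absent
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),  # P(truly R | predicted R)
        "npv": ratio(tn, tn + fn),  # P(truly S | predicted S)
    }
```

For five isolates with true labels R, R, R, S, S and predictions R, R, S, S, R, this yields a sensitivity and PPV of 2/3 and a specificity and NPV of 1/2.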

Keywords: antibiotic resistance; biomarkers; clinical isolates; machine learning; molecular diagnostics.


Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Figure 1. Geographic and phylogenetic distribution of 414 clinical Pseudomonas aeruginosa isolates used in this study
  1. A

    Geographic sampling site distribution, where circle size is proportional to the number of isolates from a particular location.

  2. B

    Phylogenetic tree of the clinical isolates and seven reference strains (blue dots). A PA7‐like outgroup clade including two clinical isolates is not shown. Abundant high‐risk clones are indicated by green bars. Scale bar: 0.04.

  3. C

    Antimicrobial susceptibility profiles for the four commonly administered antibiotics tobramycin (TOB), ceftazidime (CAZ), ciprofloxacin (CIP), and meropenem (MEM), determined by agar dilution according to the Clinical and Laboratory Standards Institute guidelines (CLSI, 2018).
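Panel C assigns each isolate a susceptibility category based on its minimum inhibitory concentration (MIC). The following minimal sketch of that mapping assumes two breakpoints per drug: susceptible at or below s_max, resistant at or above r_min, and intermediate in between. The breakpoint values used below are placeholders, not CLSI values; the real ones must be taken from the applicable CLSI tables. The helper names are hypothetical.

```python
from typing import Optional


def categorize_mic(mic: float, s_max: float, r_min: float) -> str:
    """Map an MIC (mg/L) to "S", "I" or "R" given two breakpoints.

    s_max: highest MIC still called susceptible (S if mic <= s_max)
    r_min: lowest MIC called resistant (R if mic >= r_min)
    Anything strictly between the two is intermediate (I).
    """
    if s_max >= r_min:
        raise ValueError("s_max must be below r_min")
    if mic <= s_max:
        return "S"
    if mic >= r_min:
        return "R"
    return "I"


def binary_label(mic: float, s_max: float, r_min: float) -> Optional[str]:
    """Return "S" or "R", or None for intermediate isolates (assumed left unlabeled)."""
    category = categorize_mic(mic, s_max, r_min)
    return None if category == "I" else category
```

With placeholder breakpoints s_max = 1 and r_min = 4 mg/L, MICs of 0.25, 1, 2, 4, and 16 map to S, S, I, R, and R. Returning None for intermediate isolates echoes Figures EV1-EV4, where intermediate samples receive no classifier assignment. This section does not say how they were handled during training.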

Figure 2. Training and validating a diagnostic classifier for antimicrobial susceptibility prediction for four different drugs based on genomic (GPA/SNPs) and transcriptomic profiles (EXPR)
The best data type combination was determined using 80% of the data in standard and phylogenetically informed cross‐validation (cv) and further validated on the remaining 20% of the data.
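This evaluation scheme sets aside 20% of the isolates for validation and runs cross-validation on the remaining 80%. Below is a minimal sketch of the standard (non-phylogenetic) part. It assumes an unstratified random split and shuffled round-robin fold assignment. The helper names and default seeds are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Tuple


def holdout_split(ids: Iterable, test_fraction: float = 0.2, seed: int = 0) -> Tuple[List, List]:
    """Randomly set aside a test fraction; the rest is used for model selection."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs for standard k-fold cross-validation."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    # Deal the shuffled indices into k folds of near-equal size
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, val
```

With 414 isolates, holdout_split keeps 331 for training and 83 for validation. kfold_indices(331, k=10) then yields ten disjoint validation folds that cover every training isolate exactly once. A real pipeline would fit the classifier inside each fold and might stratify by resistance class, which this sketch omits.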
Figure 3. Evaluation of AMR classification with a support vector machine (R: resistant; S: susceptible) using different performance metrics and data types (EXPR: gene expression; GPA: gene presence or absence; and SNPs: single nucleotide polymorphisms) or combinations thereof
Each individual panel depicts the results for one of four different anti‐pseudomonal antibiotics (CAZ, CIP, MEM, and TOB). The solid vertical line in the box plots represents the median, the box limits depict the 25th and 75th percentiles, and the whiskers extend to the most extreme values within 1.5 times the interquartile range of the box limits. Values outside that range were plotted as solid dots.
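The box-plot convention used in these captions (median line, quartile box, whiskers at 1.5 times the interquartile range, outliers as dots) can be reproduced with the Python standard library. This is a sketch: plotting libraries may use a slightly different quantile rule than statistics.quantiles with method="inclusive", so their quartiles can differ marginally.

```python
import statistics
from typing import Dict, List, Sequence, Union


def boxplot_stats(values: Sequence[float]) -> Dict[str, Union[float, List[float]]]:
    """Tukey-style box-plot summary: median, quartiles, whiskers and outliers."""
    data = sorted(values)
    # Linearly interpolated quartiles (needs at least two values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        # Whiskers stop at the most extreme observations inside the fences
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Points beyond the fences are drawn individually as dots
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }
```

For seven hypothetical F1-scores (0.80, 0.82, 0.85, 0.86, 0.88, 0.90, 0.50), the quartiles are 0.81, 0.85, and 0.87, the whiskers run from 0.80 to 0.90, and 0.50 is drawn as an outlier.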
Figure 4. The number of features used by the support vector machine classifier (top panels) and corresponding classification performance (bottom panels) varies with the hyperparameter C
The number of markers included in the model increases with the C parameter, i.e., lower values of C yield models with fewer features. The SVM resistance/susceptibility classifier was evaluated in five repeats of 10‐fold nested cross‐validation. Each panel depicts the results for a different drug (CAZ, CIP, MEM, and TOB) based on the best data type combination (GPA+EXPR/SNPs). The model with the fewest features within one standard deviation of the maximal performance was selected as the most suitable diagnostic classification model (red) (Dataset EV5). The solid vertical line in the box plots represents the median, the box limits depict the 25th and 75th percentiles, and the whiskers extend to the most extreme values within 1.5 times the interquartile range of the box limits. Values outside that range were plotted as solid dots.
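The selection rule in this caption (fewest features among the models within one standard deviation of the best performance) can be sketched as follows. The sketch makes two assumptions, because the caption does not pin these details down. First, the standard deviation is that of the best-scoring C value across cross-validation runs. Second, each C value is summarized by a single feature count. The function name and the example numbers are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def select_sparsest_within_one_sd(results: Dict[float, Tuple[int, List[float]]]) -> float:
    """Return the C value of the sparsest model scoring within one SD of the best.

    results maps C -> (number of features, list of cross-validation scores).
    Assumption: the SD is taken from the best-scoring C value.
    """
    summary = {
        c: (n_feat, statistics.mean(scores), statistics.stdev(scores))
        for c, (n_feat, scores) in results.items()
    }
    best_c = max(summary, key=lambda c: summary[c][1])
    _, best_mean, best_sd = summary[best_c]
    threshold = best_mean - best_sd
    eligible = [c for c, (_, mean, _) in summary.items() if mean >= threshold]
    # Among models that are "good enough", prefer the one with the fewest features
    return min(eligible, key=lambda c: summary[c][0])
```

Consider toy results in which C = 0.01, 0.1, 1.0, and 10.0 use 12, 40, 150, and 400 features and have mean scores of 0.72, 0.88, 0.90, and 0.87. Here, C = 1.0 is best (SD 0.04), so any model with a mean of at least 0.86 qualifies, and C = 0.1 is returned as the sparsest qualifying model.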
Figure 5. Performance of the support vector machine (SVM) classifier for antimicrobial resistance and susceptibility prediction for different data types, different drugs, and different evaluation schemes
The SVM performance was summarized by the F1‐score and is shown for standard cross‐validation (standard_cv, blue) and cross‐validation using phylogenetically related blocks of isolates (block_cv, red) on the training dataset (80% of the isolates), and for the validation dataset (green; 20% of the isolates). EXPR: gene expression; GPA: gene presence or absence with indel information; SNPs: single nucleotide polymorphisms. The solid vertical line in the box plots represents the median, the box limits depict the 25th and 75th percentiles, and the whiskers extend to the most extreme values within 1.5 times the interquartile range of the box limits. Values outside that range were plotted as solid dots.
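Block cross-validation keeps phylogenetically related isolates together, so that close relatives cannot appear on both sides of a training/validation split. Such splits would tend to inflate performance estimates. The following minimal sketch assumes that each isolate already carries a clade or block label (how blocks were derived from the tree is not described here). It also assumes that blocks are dealt to folds greedily, largest first, to balance fold sizes. The F1-score used in Figure 5 is included as the harmonic mean of precision (PPV) and recall (sensitivity). All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def block_kfold(clades: Sequence[Hashable], k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Group-aware k-fold: all isolates sharing a clade label land in the same fold."""
    blocks: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, clade in enumerate(clades):
        blocks[clade].append(idx)
    if len(blocks) < k:
        raise ValueError("need at least k distinct blocks")
    folds: List[List[int]] = [[] for _ in range(k)]
    # Largest blocks first, each placed into the currently smallest fold
    for clade in sorted(blocks, key=lambda c: (-len(blocks[c]), str(c))):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(blocks[clade])
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), i.e. harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else float("nan")
```

Unlike the shuffled k-fold split, no clade label occurs in both the training and the validation indices of any fold, so the validation score reflects performance on unseen lineages.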
Figure 6. Classification performance improves and plateaus with the number of training samples used
A support vector machine‐based resistance/susceptibility classifier was trained on differently sized and randomly drawn subsamples from our isolate collection and evaluated in five repeats of a 10‐fold nested cross‐validation. Each panel depicts the results for a different drug (CAZ, CIP, MEM, and TOB) based on the best data type combination (GPA+EXPR/SNPs). The solid vertical line in the box plots represents the median, the box limits depict the 25th and 75th percentiles, and the whiskers extend to the most extreme values within 1.5 times the interquartile range of the box limits. Values outside that range were plotted as solid dots.
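This learning-curve experiment draws random subsamples of different sizes from the isolate collection. Below is a minimal sketch of the sampling step. It assumes sampling without replacement and a fixed number of draws per size. The sizes in the example are illustrative rather than those used in the study. Each subsample would then be passed through the nested cross-validation mentioned in the caption, which is not shown here.

```python
import random
from typing import Dict, Iterable, List, Sequence


def learning_curve_subsamples(
    ids: Iterable, sizes: Sequence[int], repeats: int = 5, seed: int = 0
) -> Dict[int, List[List]]:
    """Draw `repeats` random subsamples (without replacement) for each requested size."""
    rng = random.Random(seed)
    pool = list(ids)
    draws: Dict[int, List[List]] = {}
    for size in sizes:
        if size > len(pool):
            raise ValueError(f"subsample size {size} exceeds {len(pool)} isolates")
        draws[size] = [rng.sample(pool, size) for _ in range(repeats)]
    return draws
```

For example, learning_curve_subsamples(range(414), [50, 100, 200], repeats=3) returns three independent draws for each size. The plateau reported in Figure 6 would show up as performance that stops rising between the larger sizes.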
Figure EV1. Correlation of susceptibility profiling by MIC testing and resistance prediction using diagnostic classifiers for ciprofloxacin
The phylogenetic tree is based on the 414 Pseudomonas aeruginosa isolates used in this study. The branches leading toward two deeply branching clades were collapsed (CH4433 and ESP077, and CH4684, CH5206, and CH5387). The inner ring depicts the susceptibility of each isolate to ciprofloxacin; green, susceptible; red, resistant; rose, intermediate resistant. The outer rings show the susceptibility as assigned by the diagnostic classifier (intermediate resistant samples were not assigned).
Figure EV2. Correlation of susceptibility profiling by MIC testing and resistance prediction using diagnostic classifiers for meropenem
The phylogenetic tree is based on the 414 Pseudomonas aeruginosa isolates used in this study. The branches leading toward two deeply branching clades were collapsed (CH4433 and ESP077, and CH4684, CH5206, and CH5387). The inner ring depicts the susceptibility of each isolate to meropenem; green, susceptible; red, resistant; rose, intermediate resistant. The outer rings show the susceptibility as assigned by the diagnostic classifier (intermediate resistant samples were not assigned).
Figure EV3. Correlation of susceptibility profiling by MIC testing and resistance prediction using diagnostic classifiers for tobramycin
The phylogenetic tree is based on the 414 Pseudomonas aeruginosa isolates used in this study. The branches leading toward two deeply branching clades were collapsed (CH4433 and ESP077, and CH4684, CH5206, and CH5387). The inner ring depicts the susceptibility of each isolate to tobramycin; green, susceptible; red, resistant; rose, intermediate resistant. The outer rings show the susceptibility as assigned by the diagnostic classifier (intermediate resistant samples were not assigned).
Figure EV4. Correlation of susceptibility profiling by MIC testing and resistance prediction using diagnostic classifiers for ceftazidime
The phylogenetic tree is based on the 414 Pseudomonas aeruginosa isolates used in this study. The branches leading toward two deeply branching clades were collapsed (CH4433 and ESP077, and CH4684, CH5206, and CH5387). The inner ring depicts the susceptibility of each isolate to ceftazidime; green, susceptible; red, resistant; rose, intermediate resistant. The outer rings show the susceptibility as assigned by the diagnostic classifier (intermediate resistant samples were not assigned).
Figure 7. Number of samples misclassified and correctly predicted by the support vector machine resistance and susceptibility classifier (SVM) grouped by their minimum inhibitory concentration
Each panel depicts the results for a different anti‐pseudomonal drug (CAZ: ceftazidime; CIP: ciprofloxacin; MEM: meropenem; TOB: tobramycin) for the best data type combination (GPA+EXPR/SNPs). Misclassified and correctly classified samples in the training dataset (80%) were inferred by 10‐fold cross‐validation. An SVM trained on the training dataset was then used to predict the resistance/susceptibility of the test samples (20%). The numbers of misclassified samples in the training (80%) and test (20%) sets were aggregated.
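Figure 7 groups correctly and incorrectly classified isolates by MIC and pools the cross-validation predictions on the training set with the test-set predictions. Below is a minimal sketch of that bookkeeping. It assumes that predictions are already available as R/S labels alongside each isolate's MIC. The function names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def tally_by_mic(
    mics: Sequence[float], y_true: Sequence[str], y_pred: Sequence[str]
) -> Tuple[Counter, Counter]:
    """Count correctly and incorrectly classified isolates per MIC value."""
    correct, misclassified = Counter(), Counter()
    for mic, truth, pred in zip(mics, y_true, y_pred):
        if truth == pred:
            correct[mic] += 1
        else:
            misclassified[mic] += 1
    return correct, misclassified


def aggregate(*tallies: Tuple[Counter, Counter]) -> Tuple[Counter, Counter]:
    """Sum several (correct, misclassified) tallies, e.g. training-set CV plus test set."""
    correct, misclassified = Counter(), Counter()
    for c, m in tallies:
        correct.update(c)
        misclassified.update(m)
    return correct, misclassified
```

Pooling both sources, as the caption describes, gives one count per MIC bin that can be plotted directly. Misclassifications concentrated near the breakpoint MICs would be a typical pattern to look for.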
Figure EV5. Resistance overlap and correlation between different drugs using the Kullback–Leibler divergence of resistance profiles for different pairs of drugs
  1. A

    Venn diagram shows the overlap of resistances among the collected clinical isolates.

  2. B

    To further investigate co‐resistance among the drugs, we also calculated the Kullback–Leibler (KL) divergence between resistance patterns. We use the KL divergence as a non‐symmetric measure of the differences between the resistance patterns of different drugs. A non‐symmetric measure allows us to distinguish what resistance to drug A implies for resistance to drug B (drug A, drug B) from the reverse case (drug B, drug A). We normalized the KL values by dividing all values by the maximum in the table. In this analysis, the divergence indicates the extent to which information on resistance to one drug cannot be used to predict simultaneous resistance to a second drug: the higher the divergence, the less information is available to predict that resistance pattern. For example, the results imply that TOB resistance comes with a higher probability of simultaneous MEM resistance, but not the other way around.
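The caption does not specify how resistance patterns were turned into probability distributions, so the following is only a minimal sketch under an explicit assumption. Each drug's 0/1 resistance vector across isolates is normalized into a distribution over isolates, using a small hypothetical pseudocount eps. D_KL(P_a || P_b) is then computed for every ordered pair of drugs and divided by the largest value. The sketch illustrates the asymmetry and the max-normalization described in the caption, not the study's exact computation.

```python
import math
from itertools import permutations
from typing import Dict, List, Sequence, Tuple


def pattern_distribution(resistant: Sequence[int], eps: float = 1e-3) -> List[float]:
    """Turn a 0/1 resistance vector over isolates into a smoothed probability distribution.

    The pseudocount eps keeps every probability positive, so the KL divergence stays finite.
    """
    weights = [r + eps for r in resistant]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) in nats; in general D_KL(P || Q) != D_KL(Q || P)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def normalized_kl_table(
    patterns: Dict[str, Sequence[int]], eps: float = 1e-3
) -> Dict[Tuple[str, str], float]:
    """D_KL(P_a || P_b) for every ordered drug pair (a, b), divided by the maximum."""
    dists = {drug: pattern_distribution(vec, eps) for drug, vec in patterns.items()}
    raw = {(a, b): kl_divergence(dists[a], dists[b]) for a, b in permutations(dists, 2)}
    top = max(raw.values()) or 1.0  # avoid dividing by zero if all patterns are identical
    return {pair: value / top for pair, value in raw.items()}
```

In a toy call such as normalized_kl_table({"TOB": [1, 1, 0, 0, 0, 0], "MEM": [1, 1, 1, 1, 0, 0]}), the TOB-resistant isolates are a subset of the MEM-resistant ones. Here, the (TOB, MEM) value comes out well below the (MEM, TOB) value, which is the same direction of reading as in the caption: a low divergence means that resistance to the first drug is informative about the second.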

Comment in

  • Wheeler NE, Sánchez-Busó L, Argimón S, Jeffrey B. Lean, mean, learning machines. Nat Rev Microbiol. 2020 May;18(5):266. doi: 10.1038/s41579-020-0357-4. PMID: 32203299.

