Microb Genom. 2023 Dec;9(12):001151.
doi: 10.1099/mgen.0.001151.

Discordance between different bioinformatic methods for identifying resistance genes from short-read genomic data, with a focus on Escherichia coli

Timothy J Davies et al. Microb Genom. 2023 Dec.

Abstract

Several bioinformatics genotyping algorithms are now commonly used to characterize antimicrobial resistance (AMR) gene profiles in whole-genome sequencing (WGS) data, with a view to understanding AMR epidemiology and developing resistance prediction workflows using WGS in clinical settings. Accurately evaluating AMR in Enterobacterales, particularly Escherichia coli, is of major importance because E. coli is a common pathogen. However, robust comparisons of different genotyping approaches on relevant simulated and large real-life WGS datasets are lacking. Here, we used both simulated datasets and a large set of real E. coli WGS data (n=1818 isolates) to systematically investigate genotyping methods in greater detail. Simulated constructs and real sequences were processed using four different bioinformatic programs (ABRicate, ARIBA, KmerResistance and SRST2, run with the ResFinder database) and their outputs compared. For simulation tests where 3079 AMR gene variants were inserted into random sequence constructs, KmerResistance was correct for 3076 (99.9 %) simulations, ABRicate for 3054 (99.2 %), ARIBA for 2783 (90.4 %) and SRST2 for 2108 (68.5 %). For simulation tests where two closely related gene variants were inserted into random sequence constructs, KmerResistance identified the correct alleles in 35 338/46 318 (76.3 %) simulations, ABRicate in 11 842/46 318 (25.6 %), ARIBA in 1679/46 318 (3.6 %) and SRST2 in 2000/46 318 (4.3 %). In real data, across all methods, 1392/1818 (76 %) isolates had discrepant allele calls for at least one gene. In addition to highlighting areas for improvement in challenging scenarios (e.g. identification of AMR genes at <10× coverage, or of multiple closely related AMR genes present in the same sample), our evaluations identified some more systematic errors that could be readily solved, such as repeated misclassification (i.e. naming) of genes as shorter variants of the same gene present within the reference resistance gene database. Such naming errors accounted for at least 2530/4321 (59 %) of the discrepancies seen in real data. Moreover, many of the remaining discrepancies were likely 'artefactual', with differences in reporting cut-offs accounting for at least 1430/4321 (33 %) of discrepancies. Whilst we found that comparing outputs generated by running multiple algorithms on the same dataset could identify and resolve these algorithmic artefacts, the results of our evaluations emphasize the need for developing new and more robust genotyping algorithms to further improve accuracy and performance.
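The core idea of running several tools on the same isolates and flagging disagreements can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each tool's report has already been parsed into a set of allele names per isolate, and the function names (`discrepant_isolates`, `discrepant_alleles`) and example calls are hypothetical. An isolate is treated as discrepant if any method reports an allele set that differs from the others; classifying the cause of a discrepancy (e.g. naming versus reporting cut-off) is not attempted here.

```python
"""Sketch: flag isolates whose AMR allele calls disagree between tools.

Assumption (hypothetical): each tool's output has already been parsed into
a set of allele names per isolate; parsing and example data are illustrative.
"""
from typing import Dict, Set

# calls[isolate][method] -> set of allele names reported by that method
Calls = Dict[str, Dict[str, Set[str]]]


def discrepant_isolates(calls: Calls) -> Set[str]:
    """Return isolates for which at least two methods report different allele sets."""
    flagged = set()
    for isolate, by_method in calls.items():
        allele_sets = list(by_method.values())
        # Any set differing from the first one means the methods disagree.
        if any(s != allele_sets[0] for s in allele_sets[1:]):
            flagged.add(isolate)
    return flagged


def discrepant_alleles(by_method: Dict[str, Set[str]]) -> Set[str]:
    """Alleles reported by some, but not all, methods for a single isolate."""
    reported_by_any = set().union(*by_method.values())
    reported_by_all = set.intersection(*by_method.values())
    return reported_by_any - reported_by_all


if __name__ == "__main__":
    # Hypothetical calls for two isolates; allele names are only examples.
    same = {"blaTEM-1B", "aph(6)-Id"}
    calls = {
        "iso1": {m: set(same) for m in ("ABRicate", "ARIBA", "KmerResistance", "SRST2")},
        "iso2": {
            "ABRicate": {"blaCTX-M-15"},
            "ARIBA": {"blaCTX-M-15"},
            "KmerResistance": {"blaCTX-M-15"},
            "SRST2": {"blaCTX-M-3"},
        },
    }
    flagged = discrepant_isolates(calls)
    print(f"{len(flagged)}/{len(calls)} isolates discrepant")  # 1/2 isolates discrepant
    print(sorted(discrepant_alleles(calls["iso2"])))  # ['blaCTX-M-15', 'blaCTX-M-3']
```

Comparing whole allele sets keeps the check strict: a single differing allele name is enough to flag an isolate, which is the level at which naming and cut-off artefacts would surface.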

Keywords: Escherichia coli; antimicrobial resistance genotyping; genomics; resistance prediction.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1.
Proportion of correct genotype calls for single AMR gene variants in simulated constructs by coverage depth and bioinformatics method.
Fig. 2.
Identification of known single AMR gene variants in simulated contexts by bioinformatic method. Only cases in which one or more methods were incorrect are shown (n=1081). *, genes were variably correctly identified across 10 repeats.
Fig. 3.
Gene identification concordance vs allele identification concordance. (a) The number of isolates containing at least one allele of the named gene families (x-axis), stratified by method. (b) The proportion of times a given gene was identified concordantly by all four methods. (c) Pairwise agreement between the different methods across all isolates.
Fig. 4.
Genotype calls produced by a single method only, stratified by antibiotic class.
Fig. 5.
Genotyping agreement across all four bioinformatics algorithms, stratified by gene. Colours on the left indicate which methods agreed with one another; circles of the same colour denote agreement. Colours in the main panel indicate the cause of the discrepancy, as denoted in the figure key. Cells were coloured if the discrepancies in >90 % of isolates were attributable to a given cause. Cells with <10 isolates were not investigated.
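Two of the summaries described in the figure captions, pairwise agreement between methods (Fig. 3c) and genotype calls produced by a single method only, stratified by antibiotic class (Fig. 4), could be computed roughly as below. This is a hedged sketch, not the authors' code: it assumes calls are parsed as `calls[isolate][method] -> set of alleles`, that every isolate has an entry for every method, that "agreement" means identical allele sets, and that the allele-to-antibiotic-class mapping is supplied by the user. The function names and example data are hypothetical.

```python
"""Sketch: pairwise method agreement (cf. Fig. 3c) and single-method calls (cf. Fig. 4).

Assumptions (hypothetical): calls[isolate][method] -> set of alleles, every
isolate has an entry for every method, "agreement" means identical allele
sets, and the allele-to-antibiotic-class mapping is supplied by the user.
"""
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple

Calls = Dict[str, Dict[str, Set[str]]]


def pairwise_agreement(calls: Calls) -> Dict[Tuple[str, str], float]:
    """Fraction of isolates for which each pair of methods reports identical allele sets."""
    methods = sorted(next(iter(calls.values())))  # method names from the first isolate
    agreement = {}
    for a, b in combinations(methods, 2):
        same = sum(1 for by_method in calls.values() if by_method[a] == by_method[b])
        agreement[(a, b)] = same / len(calls)
    return agreement


def single_method_calls(calls: Calls, drug_class: Dict[str, str]) -> Counter:
    """Count alleles reported by exactly one method, keyed by (method, antibiotic class)."""
    counts: Counter = Counter()
    for by_method in calls.values():
        for method, alleles in by_method.items():
            # Union of everything the other methods reported for this isolate.
            others = set().union(*(s for m, s in by_method.items() if m != method))
            for allele in alleles - others:
                counts[(method, drug_class.get(allele, "unknown"))] += 1
    return counts


if __name__ == "__main__":
    methods = ("ABRicate", "ARIBA", "KmerResistance", "SRST2")
    calls = {
        "iso1": {m: {"blaTEM-1B"} for m in methods},
        "iso2": {**{m: {"blaCTX-M-15"} for m in methods[:3]}, "SRST2": {"blaCTX-M-3"}},
    }
    classes = {"blaTEM-1B": "beta-lactam", "blaCTX-M-15": "beta-lactam", "blaCTX-M-3": "beta-lactam"}
    print(pairwise_agreement(calls)[("ABRicate", "SRST2")])  # 0.5
    print(single_method_calls(calls, classes))  # Counter({('SRST2', 'beta-lactam'): 1})
```

Alleles missing from the class mapping are counted under `"unknown"` rather than dropped, so gaps in the mapping remain visible in the tally.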
