Discordance between different bioinformatic methods for identifying resistance genes from short-read genomic data, with a focus on Escherichia coli
- PMID: 38100178
- PMCID: PMC10763500
- DOI: 10.1099/mgen.0.001151
Abstract
Several bioinformatics genotyping algorithms are now commonly used to characterize antimicrobial resistance (AMR) gene profiles in whole-genome sequencing (WGS) data, with a view to understanding AMR epidemiology and developing resistance prediction workflows using WGS in clinical settings. Accurately evaluating AMR in Enterobacterales, particularly Escherichia coli, is of major importance because E. coli is a common pathogen. However, robust comparisons of different genotyping approaches on relevant simulated and large real-life WGS datasets are lacking. Here, we used both simulated datasets and a large set of real E. coli WGS data (n=1818 isolates) to systematically investigate genotyping methods in greater detail. Simulated constructs and real sequences were processed using four different bioinformatic programs (ABRicate, ARIBA, KmerResistance and SRST2, each run with the ResFinder database), and their outputs were compared. For simulation tests where 3079 AMR gene variants were inserted into random sequence constructs, KmerResistance was correct for 3076 (99.9 %) simulations, ABRicate for 3054 (99.2 %), ARIBA for 2783 (90.4 %) and SRST2 for 2108 (68.5 %). For simulation tests where two closely related gene variants were inserted into random sequence constructs, KmerResistance identified the correct alleles in 35 338/46 318 (76.3 %) simulations, ABRicate in 11 842/46 318 (25.6 %), ARIBA in 1679/46 318 (3.6 %) and SRST2 in 2000/46 318 (4.3 %). In real data, across all methods, 1392/1818 (76 %) isolates had discrepant allele calls for at least one gene. In addition to highlighting areas for improvement in challenging scenarios (e.g. identifying AMR genes at <10× coverage, or identifying multiple closely related AMR genes present in the same sample), our evaluations identified some more systematic errors that could be readily resolved, such as repeated misclassification (i.e. misnaming) of genes as shorter variants of the same gene present within the reference resistance gene database. Such naming errors accounted for at least 2530/4321 (59 %) of the discrepancies seen in real data. Moreover, many of the remaining discrepancies were likely 'artefactual', with differences in reporting cut-offs accounting for at least 1430/4321 (33 %) of discrepancies. Whilst we found that comparing outputs generated by running multiple algorithms on the same dataset could identify and resolve these algorithmic artefacts, the results of our evaluations emphasize the need for developing new and more robust genotyping algorithms to further improve accuracy and performance.
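The cross-tool comparison described in the abstract (flagging isolates where tools disagree, then separating out discrepancies where one call is merely a shorter database variant of another) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input layout (a per-tool mapping of isolate IDs to sets of allele names), the function names, the toy gene names and sequences, and the rule that a "shorter variant" is an exact substring of another reference sequence are all hypothetical.

```python
from itertools import combinations

# Hypothetical input layout: tool name -> isolate ID -> set of allele names called.
Calls = dict[str, dict[str, frozenset[str]]]


def discordant_isolates(calls: Calls) -> set[str]:
    """Return isolates for which at least two tools report different allele sets."""
    isolates = set().union(*(per_tool.keys() for per_tool in calls.values()))
    discordant = set()
    for isolate in isolates:
        # A tool with no entry for an isolate is treated as calling nothing.
        profiles = {calls[tool].get(isolate, frozenset()) for tool in calls}
        if len(profiles) > 1:
            discordant.add(isolate)
    return discordant


def is_shorter_variant(short: str, long: str, db: dict[str, str]) -> bool:
    """Assumed proxy for a naming error: the reference sequence of `short` is a
    strictly shorter exact substring of the reference sequence of `long`."""
    s, l = db[short], db[long]
    return len(s) < len(l) and s in l


def naming_discrepancies(calls: Calls, isolate: str, db: dict[str, str]) -> set[tuple[str, str]]:
    """(shorter, longer) allele pairs, differing between any two tools for one
    isolate, that the shorter-variant rule explains."""
    pairs = set()
    for t1, t2 in combinations(calls, 2):
        a = calls[t1].get(isolate, frozenset())
        b = calls[t2].get(isolate, frozenset())
        # Only compare alleles that one tool called and the other did not.
        for x in a - b:
            for y in b - a:
                if is_shorter_variant(x, y, db):
                    pairs.add((x, y))
                elif is_shorter_variant(y, x, db):
                    pairs.add((y, x))
    return pairs


# Toy reference "database" and calls (illustrative names and sequences only).
db = {"geneA_full": "ATGAAACCC", "geneA_short": "ATGAAA", "geneB": "GGGTTT"}
calls = {
    "ABRicate": {"iso1": frozenset({"geneA_full", "geneB"}), "iso2": frozenset({"geneB"})},
    "KmerResistance": {"iso1": frozenset({"geneA_short", "geneB"}), "iso2": frozenset({"geneB"})},
}
print(discordant_isolates(calls))                # {'iso1'}
print(naming_discrepancies(calls, "iso1", db))   # {('geneA_short', 'geneA_full')}
```

Here `iso1` is discordant only because one tool reports the shorter database entry, so the pair is attributed to a naming error rather than a true difference in gene content; discrepancies not explained this way would need other checks, such as comparing each tool's identity and coverage cut-offs.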
Keywords: Escherichia coli; antimicrobial resistance genotyping; genomics; resistance prediction.
Conflict of interest statement
The authors declare that there are no conflicts of interest.