Genome Informatics and Machine Learning-Based Identification of Antimicrobial Resistance-Encoding Features and Virulence Attributes in Escherichia coli Genomes Representing Globally Prevalent Lineages, Including High-Risk Clonal Complexes
- PMID: 35164570
- PMCID: PMC8844930
- DOI: 10.1128/mbio.03796-21
Abstract
Escherichia coli, a ubiquitous commensal/pathogenic member of the Enterobacteriaceae family, accounts for a high infection burden, morbidity, and mortality throughout the world. With multidrug resistance (MDR) emerging on a massive scale, E. coli has been listed as one of the Global Antimicrobial Resistance and Use Surveillance System (GLASS) priority pathogens. Understanding the resistance mechanisms and underlying genomic features appears to be of utmost importance for tackling the further spread of these multidrug-resistant superbugs. While a few of the globally prevalent sequence types (STs) of E. coli, such as ST131, ST69, ST405, and ST648, have previously been reported to be highly virulent and to harbor MDR, it remains unclear whether certain ST lineages have a greater propensity to acquire MDR. In this study, large-scale comparative genomics of a total of 5,653 E. coli genomes from 19 ST lineages revealed ST-wide prevalence patterns of genomic features, such as antimicrobial resistance (AMR)-encoding genes/mutations, virulence genes, integrons, and transposons. Interpretation of the importance of these features using a Random Forest Classifier trained with 11,988 genomic features from whole-genome sequence data identified ST-specific or phylogroup-specific signature proteins, mostly belonging to different protein superfamilies, including the toxin-antitoxin systems. Our study provides a comprehensive, genome-level understanding of a myriad of genomic features, ST-specific proteins, and resistance mechanisms across different lineages of E. coli; this could be of significant downstream importance in understanding the mechanisms of AMR, in clinical discovery, in epidemiology, and in devising control strategies.
IMPORTANCE With the leap in whole-genome data being generated, the application of relevant methods to mine biologically significant information from microbial genomes is of utmost importance to public health genomics. Machine learning methods have been used not only to mine, curate, or classify the data but also to identify the relevant features that could be linked to a particular class/target. This is perhaps one of the first studies to attempt to classify a large repertoire of E. coli genome data sets (5,653 genomes) belonging to 19 different STs (including well-studied as well as understudied STs) using machine learning approaches. Important features identified by these approaches have revealed ST-specific signature proteins, which could be further studied to predict possible associations with the phenotypic profiles, thereby providing a better understanding of virulence and the resistance mechanisms among different clonal lineages of E. coli.
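The idea of ranking genomic features by their importance to a Random Forest classifier can be illustrated with a minimal sketch. This is not the authors' pipeline: it is a small hand-written forest over a synthetic gene presence/absence matrix, using mean decrease in Gini impurity as the importance score. The feature names (`signature_*`, `background_*`), the toy prevalence values, and all function names are hypothetical and chosen only for illustration; a real analysis would normally rely on an established library implementation and real annotation data.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(rows, labels, rng, importance, n_total, depth=0, max_depth=4):
    """Recursively split on binary features; credit the chosen feature
    with its sample-weighted Gini decrease."""
    parent = gini(labels)
    if parent == 0.0 or depth == max_depth:
        return
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # random feature subset per split
    best = None
    for f in rng.sample(range(n_features), k):
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            continue
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        gain = parent - child
        if best is None or gain > best[0]:
            best = (gain, f)
    if best is None or best[0] <= 0:
        return
    gain, f = best
    importance[f] += gain * len(labels) / n_total
    for value in (0, 1):
        idx = [i for i, x in enumerate(rows) if x[f] == value]
        grow_tree([rows[i] for i in idx], [labels[i] for i in idx],
                  rng, importance, n_total, depth + 1, max_depth)

def forest_importance(rows, labels, n_trees=100, seed=0):
    """Average impurity-based importances over bootstrapped trees,
    normalised to sum to 1."""
    rng = random.Random(seed)
    n = len(rows)
    importance = [0.0] * len(rows[0])
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        grow_tree([rows[i] for i in idx], [labels[i] for i in idx],
                  rng, importance, n)
    total = sum(importance)
    return [v / total for v in importance] if total else importance

# Synthetic data (assumption): 40 genomes per ST, one "signature" feature per
# ST present in ~95% of that ST and ~5% of the others, plus 6 background
# features present in ~50% of all genomes regardless of ST.
data_rng = random.Random(42)
sts = ["ST131", "ST69", "ST405"]
names = [f"signature_{st}" for st in sts] + [f"background_{i}" for i in range(6)]
rows, labels = [], []
for s, st in enumerate(sts):
    for _ in range(40):
        sig = [int(data_rng.random() < (0.95 if j == s else 0.05)) for j in range(3)]
        noise = [int(data_rng.random() < 0.5) for _ in range(6)]
        rows.append(sig + noise)
        labels.append(st)

importance = forest_importance(rows, labels)
ranked = sorted(zip(names, importance), key=lambda p: p[1], reverse=True)
for name, score in ranked:
    print(f"{name:20s} {score:.3f}")
```

In this toy setting the lineage-associated features should rise to the top of the ranking, which mirrors, in a much simplified form, how feature importance can point to candidate ST-specific signatures.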
Keywords: AMR surveillance; Escherichia coli; bacterial evolution; bioinformatics; genomics; machine learning; molecular epidemiology; sequence types; virulence.
Conflict of interest statement
The authors declare no conflict of interest.
Similar articles
- Comparative Genomic Analysis of Globally Dominant ST131 Clone with Other Epidemiologically Successful Extraintestinal Pathogenic Escherichia coli (ExPEC) Lineages. mBio. 2017 Oct 24;8(5):e01596-17. doi: 10.1128/mBio.01596-17. PMID: 29066550 Free PMC article.
- Whole-genome sequences of multidrug-resistant Escherichia coli in South-Kivu Province, Democratic Republic of Congo: characterization of phylogenomic changes, virulence and resistance genes. BMC Infect Dis. 2019 Feb 11;19(1):137. doi: 10.1186/s12879-019-3763-3. PMID: 30744567 Free PMC article.
- Genomic insights into virulence, antimicrobial resistance, and adaptation acumen of Escherichia coli isolated from an urban environment. mBio. 2024 Mar 13;15(3):e0354523. doi: 10.1128/mbio.03545-23. Epub 2024 Feb 20. PMID: 38376265 Free PMC article.
- Pandemic lineages of extraintestinal pathogenic Escherichia coli. Clin Microbiol Infect. 2014 May;20(5):380-90. doi: 10.1111/1469-0691.12646. PMID: 24766445 Review.
- Escherichia coli: An arduous voyage from commensal to Antibiotic-resistance. Microb Pathog. 2025 Jan;198:107173. doi: 10.1016/j.micpath.2024.107173. Epub 2024 Nov 27. PMID: 39608506 Review.
Cited by
- The role of artificial intelligence and machine learning in predicting and combating antimicrobial resistance. Comput Struct Biotechnol J. 2025 Jan 18;27:423-439. doi: 10.1016/j.csbj.2025.01.006. eCollection 2025. PMID: 39906157 Free PMC article. Review.
- Unraveling the evolutionary dynamics of toxin-antitoxin systems in diverse genetic lineages of Escherichia coli including the high-risk clonal complexes. mBio. 2024 Jan 16;15(1):e0302323. doi: 10.1128/mbio.03023-23. Epub 2023 Dec 20. PMID: 38117088 Free PMC article.
- Multi-omics strategy reveals potential role of antimicrobial resistance and virulence factor genes responsible for Simmental diarrheic calves caused by Escherichia coli. mSystems. 2024 Jun 18;9(6):e0134823. doi: 10.1128/msystems.01348-23. Epub 2024 May 14. PMID: 38742910 Free PMC article.
- Predicting Treatment Outcomes in Patients with Low Back Pain Using Gene Signature-Based Machine Learning Models. Pain Ther. 2025 Feb;14(1):359-373. doi: 10.1007/s40122-024-00700-8. Epub 2024 Dec 25. PMID: 39722081 Free PMC article.
- Combination of whole genome sequencing and supervised machine learning provides unambiguous identification of eae-positive Shiga toxin-producing Escherichia coli. Front Microbiol. 2023 May 12;14:1118158. doi: 10.3389/fmicb.2023.1118158. eCollection 2023. PMID: 37250024 Free PMC article.