Bioinformatics. 2018 Jul 1;34(13):i89-i95.

doi: 10.1093/bioinformatics/bty276.

A pan-genome-based machine learning approach for predicting antimicrobial resistance activities of the Escherichia coli strains

Hsuan-Lin Her¹, Yu-Wei Wu²

Affiliations

¹ School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan.
² Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan.

PMID: 29949970
PMCID: PMC6022653
DOI: 10.1093/bioinformatics/bty276

Hsuan-Lin Her et al. Bioinformatics. 2018.

Abstract

Motivation: Antimicrobial resistance (AMR) is a growing problem in both developed and developing countries, and identifying which strains are resistant or susceptible to particular antibiotics is essential to combating antibiotic-resistant pathogens. Whole-genome sequences have been collected for many microbial strains to identify the characteristics that allow certain strains to become resistant to antibiotics; however, a global inspection of the gene content responsible for AMR activities remains to be done.

Results: We propose a pan-genome-based approach to characterize antibiotic-resistant microbial strains and test it on the bacterial model organism Escherichia coli. By identifying core and accessory gene clusters and predicting AMR genes for the E. coli pan-genome, we showed that certain classes of genes are unevenly distributed between the core and accessory parts of the pan-genome, and that only a portion of the identified AMR genes belong to the accessory genome. When machine learning algorithms were applied to predict whether specific strains were resistant to antibiotic drugs, the set of AMR genes within the accessory part of the pan-genome yielded the best prediction accuracy, suggesting that these gene clusters are the most crucial to AMR activities in E. coli. Subsets of AMR genes selected for individual antibiotic drugs by a genetic algorithm (GA) achieved better prediction performance than gene sets established in the literature, hinting that the GA-selected gene sets may warrant further analysis to clarify how E. coli resists antibiotics.
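The core/accessory split described above can be illustrated with a minimal sketch. It assumes the strictest definition (a core cluster is present in every genome), which may differ from the threshold the authors used; the strain names, cluster IDs and the `amr_clusters` set are hypothetical toy data, not values from the paper.

```python
from typing import Dict, Set, Tuple


def split_pan_genome(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, accessory) gene cluster sets.

    `presence` maps each strain to the set of gene clusters found in it.
    Assumption: a cluster is "core" only if it occurs in every strain.
    """
    strain_sets = list(presence.values())
    pan = set().union(*strain_sets)
    core = set.intersection(*strain_sets) if strain_sets else set()
    return core, pan - core


# Hypothetical toy input: three strains, five gene clusters.
presence = {
    "strain_A": {"c1", "c2", "c3"},
    "strain_B": {"c1", "c2", "c4"},
    "strain_C": {"c1", "c3", "c4", "c5"},
}
# Hypothetical set of clusters carrying an AMR annotation.
amr_clusters = {"c1", "c2", "c4"}

core, accessory = split_pan_genome(presence)
amr_in_accessory = amr_clusters & accessory  # AMR clusters outside the core
```

In this toy input only some of the AMR-annotated clusters fall in the accessory set, which mirrors the kind of observation reported in the abstract without reproducing its numbers.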

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

**Fig. 1.**
Growth of the pan-genome size and of the core and accessory gene cluster numbers as the number of *E. coli* genomes increases. The blue, orange and green lines represent the core-, accessory- and pan-genome sizes, respectively.
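A curve of this kind can be computed by adding genomes one at a time and recording the three sizes after each step. The sketch below is an illustration under assumptions: it uses a single fixed genome order (published curves are often averaged over many random orderings), a strict all-genomes core definition, and hypothetical cluster IDs.

```python
from typing import List, Set, Tuple


def growth_curve(genomes: List[Set[str]]) -> List[Tuple[int, int, int]]:
    """Return (core, accessory, pan) sizes after each genome is added."""
    curve = []
    pan: Set[str] = set()
    core: Set[str] = set()
    for i, clusters in enumerate(genomes):
        pan |= clusters  # the pan-genome can only grow
        # the core starts as the first genome and can only shrink
        core = set(clusters) if i == 0 else core & clusters
        curve.append((len(core), len(pan) - len(core), len(pan)))
    return curve


# Hypothetical toy genomes, each a set of gene cluster IDs.
genomes = [
    {"c1", "c2", "c3"},
    {"c1", "c2", "c4"},
    {"c1", "c3", "c4", "c5"},
]
curve = growth_curve(genomes)
```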

**Fig. 2.**
Differences in the COG functional distributions between the core and accessory genomes. COG percentages were estimated by dividing COG counts by the total number of gene clusters in either the core or the accessory genome. Only COGs differing by at least 2-fold between the core and accessory parts were included.
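The percentage and 2-fold filter in this caption amount to a short calculation. The sketch below assumes one COG category label per gene cluster and skips categories missing from either part (where a fold difference is undefined); both choices, and the toy labels, are assumptions rather than details taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def cog_percentages(cogs: List[str]) -> Dict[str, float]:
    """Percentage of gene clusters assigned to each COG category."""
    counts = Counter(cogs)
    total = len(cogs)
    return {cog: 100.0 * n / total for cog, n in counts.items()}


def differing_cogs(
    core: List[str], acc: List[str], min_fold: float = 2.0
) -> Dict[str, Tuple[float, float]]:
    """Keep COGs whose core and accessory percentages differ by >= min_fold."""
    core_pct = cog_percentages(core)
    acc_pct = cog_percentages(acc)
    selected = {}
    # Assumption: only categories present in both parts are compared.
    for cog in set(core_pct) & set(acc_pct):
        high = max(core_pct[cog], acc_pct[cog])
        low = min(core_pct[cog], acc_pct[cog])
        if high / low >= min_fold:
            selected[cog] = (core_pct[cog], acc_pct[cog])
    return selected


# Hypothetical toy labels: one COG category letter per gene cluster.
core_cogs = ["J"] * 4 + ["L"] * 1 + ["K"] * 5
acc_cogs = ["J"] * 1 + ["L"] * 4 + ["K"] * 5
selected = differing_cogs(core_cogs, acc_cogs)
```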

**Fig. 3.**
Prediction accuracies of the AMR activities [in terms of the area under the receiver operating characteristic (ROC) curve (AUC)] based on the presence/absence patterns of (i) all core and accessory gene clusters (core + acc); (ii) all accessory gene clusters (acc); (iii) accessory gene clusters with CARD annotations (acc/card) and (iv) all CARD gene clusters. The boxplots show the distribution of prediction accuracies across the 12 selected drugs (Section 2 and Section 3). The four blocks of boxplots correspond to the four machine learning algorithms used for prediction: AdaBoost, naive Bayes (NB), random forest (RF) and support vector machine (SVM). The dashed red line indicates an AUC of 0.9.
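For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen resistant strain receives a higher score than a randomly chosen susceptible one. The sketch below computes it by direct pairwise comparison; the labels and scores are hypothetical, and in practice the scores would come from a trained classifier rather than being typed in.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (resistant, susceptible) pairs ranked correctly.

    Labels are 1 for resistant and 0 for susceptible; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one strain of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six strains (higher = more likely resistant).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ordered correctly
```

The pairwise form is quadratic in the number of strains, which is fine for illustration; a rank-based implementation would be preferable on large collections.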

**Fig. 4.**
SVM prediction accuracies of the AMR activities [in terms of the area under the receiver operating characteristic curve (AUC)] based on (i) 68 accessory genes with CARD annotations (68 acc/card genes); (ii) gene clusters selected for each drug by the genetic algorithm (GA-selected clusters); (iii) gene clusters identified by Scoary and (iv) gene clusters with CARD annotations identified by Scoary (Scoary with CARD). The boxplot shows the distribution of prediction accuracies across the 12 selected drugs. The dashed red line indicates an AUC of 0.9.
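The idea of GA-based gene cluster selection can be sketched as a search over binary masks, where each bit says whether a cluster is kept. This is not the authors' implementation: the fitness below is a stand-in (AUC of a simple count of selected clusters present, rather than an SVM), and the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), the parameter defaults and the toy matrix are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (resistant, susceptible) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fitness(mask: Sequence[int], X: List[List[int]], y: Sequence[int]) -> float:
    """Stand-in fitness: AUC of the count of selected clusters present."""
    if not any(mask):
        return 0.0  # an empty selection cannot rank strains
    scores = [sum(g for g, keep in zip(row, mask) if keep) for row in X]
    return roc_auc(y, scores)


def ga_select(
    X: List[List[int]], y: Sequence[int], pop_size: int = 20,
    generations: int = 30, mutation_rate: float = 0.1, seed: int = 0,
) -> Tuple[List[int], float]:
    """Search for a cluster subset (binary mask) that maximises the fitness."""
    rng = random.Random(seed)
    n = len(X[0])
    score = lambda m: fitness(m, X, y)
    # Seed the population with the all-clusters mask plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best, score(best)


# Hypothetical presence/absence matrix: rows are strains, columns are clusters.
X = [
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 1, 1, 1],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = resistant, 0 = susceptible

baseline_auc = fitness([1] * 5, X, y)  # all clusters, no selection
best_mask, best_auc = ga_select(X, y)
```

Because the all-clusters mask is in the initial population and the best mask is carried forward each generation, the selected subset can never score below the baseline on the data used for the search. Any real comparison would need to score the selected subset on held-out strains.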


Cited by

Silicon versus Superbug: Assessing Machine Learning's Role in the Fight against Antimicrobial Resistance.
Coxe T, Azad RK. Antibiotics (Basel). 2023 Nov 8;12(11):1604. doi: 10.3390/antibiotics12111604. PMID: 37998806. Free PMC article. Review.
Artificial Intelligence for Antimicrobial Resistance Prediction: Challenges and Opportunities towards Practical Implementation.
Ali T, Ahmed S, Aslam M. Antibiotics (Basel). 2023 Mar 6;12(3):523. doi: 10.3390/antibiotics12030523. PMID: 36978390. Free PMC article. Review.
Assessing computational predictions of antimicrobial resistance phenotypes from microbial genomes.
Hu K, Meyer F, Deng ZL, Asgari E, Kuo TH, Münch PC, McHardy AC. Brief Bioinform. 2024 Mar 27;25(3):bbae206. doi: 10.1093/bib/bbae206. PMID: 38706320. Free PMC article.
A genomic data resource for predicting antimicrobial resistance from laboratory-derived antimicrobial susceptibility phenotypes.
VanOeffelen M, Nguyen M, Aytan-Aktug D, Brettin T, Dietrich EM, Kenyon RW, Machi D, Mao C, Olson R, Pusch GD, Shukla M, Stevens R, Vonstein V, Warren AS, Wattam AR, Yoo H, Davis JJ. Brief Bioinform. 2021 Nov 5;22(6):bbab313. doi: 10.1093/bib/bbab313. PMID: 34379107. Free PMC article.
Artificial intelligence in drug resistance management.
Elalouf A, Elalouf H, Rosenfeld A, Maoz H. 3 Biotech. 2025 May;15(5):126. doi: 10.1007/s13205-025-04282-w. Epub 2025 Apr 14. PMID: 40235844. Free PMC article. Review.


