PlasForest: a homology-based random forest classifier for plasmid detection in genomic datasets
- PMID: 34174810
- PMCID: PMC8236179
- DOI: 10.1186/s12859-021-04270-w
Abstract
Background: Plasmids are mobile genetic elements that often carry accessory genes and act as vectors for horizontal gene transfer between bacterial genomes. Detecting plasmids in large genomic datasets is crucial to analyze their spread and to quantify their role in bacterial adaptation, particularly in the propagation of antibiotic resistance. Bioinformatics methods have been developed to detect plasmids. However, they suffer from either low sensitivity (i.e., most plasmids remain undetected) or low precision (i.e., they misidentify chromosomes as plasmids), and they are generally not suited to identifying plasmids in whole genomes that are not fully assembled (contigs and scaffolds).
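The two failure modes named above correspond to standard binary-classification metrics when "plasmid" is treated as the positive class: sensitivity (recall) is the fraction of true plasmid contigs that get detected, and precision is the fraction of plasmid calls that really are plasmids. The F1 score is their harmonic mean, so it is only high when both are high. The sketch below computes these metrics from contig-level confusion counts; the `Confusion` type and the counts in the usage block are hypothetical illustrations, not PlasForest's code or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Contig-level confusion counts, with 'plasmid' as the positive class."""
    tp: int  # plasmid contigs called plasmid
    fp: int  # chromosome contigs called plasmid
    fn: int  # plasmid contigs called chromosome
    tn: int  # chromosome contigs called chromosome


def sensitivity(c: Confusion) -> float:
    """Fraction of plasmid contigs that are detected (recall)."""
    return c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0


def precision(c: Confusion) -> float:
    """Fraction of plasmid calls that are truly plasmid."""
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0


def f1_score(c: Confusion) -> float:
    """Harmonic mean of precision and sensitivity (0.0 if both are zero)."""
    p, r = precision(c), sensitivity(c)
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    c = Confusion(tp=90, fp=10, fn=10, tn=890)
    print(sensitivity(c), precision(c), f1_score(c))
```

A tool that calls every contig "plasmid" reaches a sensitivity of 1.0 but a low precision, and its F1 score drops accordingly; that is why F1 is a convenient single summary for the trade-off described in the Background.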
Results: We developed PlasForest, a homology-based random forest classifier that identifies bacterial plasmid sequences in partially assembled genomes. Without knowing the taxonomic origin of the samples, PlasForest classifies contigs as plasmids or chromosomes with an F1 score of 0.950. Notably, it detects 77.4% of plasmid contigs shorter than 1 kb with 2.8% false positives, and 99.9% of plasmid contigs longer than 50 kb with 2.2% false positives.
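One way to picture a homology-based classifier: each contig is aligned (for example with BLAST) against a database of known plasmid sequences, and summary statistics of the resulting hits become the features a random forest is trained on. The sketch below derives a few such features for one contig from a list of hits. The `Hit` structure and the particular feature set (number of hits, fraction of the contig covered by hits, best identity, contig length, G+C content) are assumptions chosen for illustration, not PlasForest's exact feature list or implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hit:
    """Hypothetical alignment of a contig against one reference plasmid."""
    qstart: int      # 1-based start on the contig (inclusive)
    qend: int        # 1-based end on the contig (inclusive)
    identity: float  # percent identity of the alignment


def covered_bases(hits: List[Hit]) -> int:
    """Count contig positions covered by at least one hit (union of intervals)."""
    intervals = sorted((min(h.qstart, h.qend), max(h.qstart, h.qend)) for h in hits)
    total, cur_start, cur_end = 0, None, None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Gap before this interval: close the current merged block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def homology_features(seq: str, hits: List[Hit]) -> Dict[str, float]:
    """Illustrative (hypothetical) feature vector for one contig."""
    length = len(seq)
    return {
        "n_hits": float(len(hits)),
        "coverage": covered_bases(hits) / length if length else 0.0,
        "max_identity": max((h.identity for h in hits), default=0.0),
        "length": float(length),
        "gc": gc_content(seq),
    }
```

Stacking these dictionaries for many contigs with known plasmid/chromosome labels gives a training matrix that an off-the-shelf random forest, such as scikit-learn's `RandomForestClassifier`, could be fitted on. Interval union is used for coverage so that overlapping hits to several related plasmids are not double-counted.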
Conclusions: PlasForest outperforms other currently available tools on genomic datasets by being both sensitive and precise. Its performance on metagenomic assemblies is currently well below that of existing k-mer-based methods, and we discuss how homology-based approaches could improve plasmid detection in such datasets.
Keywords: Genomic datasets; Homology; Plasmid identification; Random forest classifier.
Conflict of interest statement
The authors declare that they have no competing interests.
Similar articles
- Plasmer: an Accurate and Sensitive Bacterial Plasmid Prediction Tool Based on Machine Learning of Shared k-mers and Genomic Features. Microbiol Spectr. 2023 Jun 15;11(3):e0464522. doi: 10.1128/spectrum.04645-22. Epub 2023 May 16. PMID: 37191574. Free PMC article.
- Plasmid detection and assembly in genomic and metagenomic data sets. Genome Res. 2019 Jun;29(6):961-968. doi: 10.1101/gr.241299.118. Epub 2019 May 2. PMID: 31048319. Free PMC article.
- PLASMe: a tool to identify PLASMid contigs from short-read assemblies using transformer. Nucleic Acids Res. 2023 Aug 25;51(15):e83. doi: 10.1093/nar/gkad578. PMID: 37427782. Free PMC article.
- Genomics of IncP-1 antibiotic resistance plasmids isolated from wastewater treatment plants provides evidence for a widely accessible drug resistance gene pool. FEMS Microbiol Rev. 2007 Jul;31(4):449-77. doi: 10.1111/j.1574-6976.2007.00074.x. Epub 2007 Jun 6. PMID: 17553065. Review.
- Phage hunters: Computational strategies for finding phages in large-scale 'omics datasets. Virus Res. 2018 Jan 15;244:110-115. doi: 10.1016/j.virusres.2017.10.019. Epub 2017 Nov 1. PMID: 29100906. Review.
Cited by
- Ecology, more than antibiotics consumption, is the major predictor for the global distribution of aminoglycoside-modifying enzymes. Elife. 2023 Feb 14;12:e77015. doi: 10.7554/eLife.77015. PMID: 36785930. Free PMC article.
- The Role of Mobile Genetic Elements in Virulence Factor Carriage from Symptomatic and Asymptomatic Cases of Escherichia coli Bacteriuria. Microbiol Spectr. 2023 Jun 15;11(3):e0471022. doi: 10.1128/spectrum.04710-22. Epub 2023 May 17. PMID: 37195213. Free PMC article.
- PLSDB: advancing a comprehensive database of bacterial plasmids. Nucleic Acids Res. 2022 Jan 7;50(D1):D273-D278. doi: 10.1093/nar/gkab1111. PMID: 34850116. Free PMC article.
- Effect of a probiotic and an antibiotic on the mobilome of the porcine microbiota. Front Genet. 2024 Mar 28;15:1355134. doi: 10.3389/fgene.2024.1355134. eCollection 2024. PMID: 38606356. Free PMC article.
- Global transmission of broad-host-range plasmids derived from the human gut microbiome. Nucleic Acids Res. 2023 Aug 25;51(15):8005-8019. doi: 10.1093/nar/gkad498. PMID: 37283060. Free PMC article.