Machine Learning Suggests That Small Size Helps Broaden Plasmid Host Range

Bing Wang et al.

Genes (Basel). 2023 Nov 5;14(11):2044. doi: 10.3390/genes14112044.

Abstract

Plasmids mediate gene exchange across taxonomic barriers through conjugation and have shaped bacterial evolution for billions of years. While plasmid mobility can be harnessed for genetic engineering and drug-delivery applications, the rapid plasmid-mediated spread of resistance genes has rendered most clinical antibiotics useless. To solve this urgent and growing problem, we must understand how plasmids spread across bacterial communities. Here, we applied machine-learning models to identify features that are important for extending the plasmid host range. We assembled an up-to-date dataset of more than thirty thousand bacterial plasmids, separated them into 1125 clusters, and assigned each cluster a distribution possibility score that takes into account the host distribution at each taxonomic rank and the sampling bias of the existing sequencing data. Using this score and an optimized plasmid feature pool, we built a model stack consisting of DecisionTreeRegressor, EvoTreeRegressor, and LGBMRegressor as base models and LinearRegressor as a meta-learner. Our mathematical modeling revealed that sequence brevity is the most important determinant of plasmid spread, followed by P-loop NTPases, mobility factors, and β-lactamases. Our results, together with other recent findings, suggest that small plasmids may broaden their host range by evading host defenses and by using alternative modes of transfer instead of autonomous conjugation.
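The model stack described above follows the general stacking pattern: base regressors produce predictions, and a linear meta-learner learns how to combine them. The sketch below illustrates that technique in plain Python, assuming the meta-learner is trained on out-of-fold base predictions (common stacking practice, not confirmed by the abstract). The base learners are hypothetical stand-ins (a one-split regression stump and two k-nearest-neighbour regressors), not DecisionTreeRegressor, EvoTreeRegressor, or LGBMRegressor, and the meta-learner is an ordinary least-squares fit with a tiny ridge term for numerical stability.

```python
import random


def fit_stump(X, y):
    """One-split regression stump (hypothetical stand-in for a tree learner)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # all feature values identical: fall back to the mean
        m = sum(y) / len(y)
        return lambda x: m
    _, j, t, ml, mr = best
    return lambda x: ml if x[j] <= t else mr


def fit_knn(X, y, k=3):
    """k-nearest-neighbour regressor (hypothetical stand-in for a base model)."""
    data = list(zip(X, y))

    def predict(x):
        nearest = sorted(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], x)))[:k]
        return sum(v for _, v in nearest) / len(nearest)

    return predict


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def fit_stack(X, y, base_fitters, n_folds=5, seed=0, ridge=1e-8):
    """Linear meta-learner trained on out-of-fold predictions of the base learners."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    oof = [None] * len(X)
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in base_fitters]
        for i in fold:
            oof[i] = [m(X[i]) for m in models]
    # Meta-learner: least squares on [1, p_1, ..., p_k] via the normal equations.
    Z = [[1.0] + row for row in oof]
    p = len(Z[0])
    ZtZ = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    for a in range(1, p):
        ZtZ[a][a] += ridge
    Zty = [sum(z[a] * v for z, v in zip(Z, y)) for a in range(p)]
    w = solve(ZtZ, Zty)
    final = [fit(X, y) for fit in base_fitters]  # refit base learners on all data
    return lambda x: w[0] + sum(wi * m(x) for wi, m in zip(w[1:], final))


if __name__ == "__main__":
    # Toy data: a linear trend plus a step; purely illustrative.
    X = [[i / 10, (i % 7) / 7] for i in range(60)]
    y = [3 * r[0] + (1.0 if r[1] > 0.5 else 0.0) for r in X]
    bases = [fit_stump, lambda A, b: fit_knn(A, b, 3), lambda A, b: fit_knn(A, b, 10)]
    model = fit_stack(X, y, bases)
    print(sum((model(x) - v) ** 2 for x, v in zip(X, y)) / len(y))
```

Each entry of `base_fitters` is assumed to be a function that takes training rows and targets and returns a prediction function, so swapping in real base models would only change that list.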

Keywords: antibiotic resistance genes; clustering; conjugation; horizontal gene transfer; machine learning; plasmid host range; small-size plasmid.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Overview of the dataset. Distributions of plasmid size (A) and GC content (B). Mean and median values are indicated by blue and green dots, respectively. The cutoffs for categorical terms are indicated by orange dots. (C) Distribution of plasmid-borne proteins among COGs (listed on the right). The pie chart shows the proportion of proteins in COGs. The bar chart shows the fraction of proteins assigned to each COG; the fractions for the Z, A, and B categories are given as numbers. The three most prevalent COG categories are highlighted in the table.
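The per-plasmid summaries in Figure 1 (size, GC content, mean, median, and categorical cutoffs) can be sketched as below, assuming sequences are plain nucleotide strings and that ambiguous bases are excluded from GC content. The size cutoffs and category labels are hypothetical placeholders, because the caption does not state the values used.

```python
from statistics import mean, median


def gc_content(seq):
    """Fraction of G or C among unambiguous A/C/G/T bases."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no unambiguous bases")
    return (s.count("G") + s.count("C")) / acgt


# Hypothetical size cutoffs in bp (upper bound, label); NOT the paper's values.
SIZE_CUTOFFS = [(5_000, "smallest"), (50_000, "small"), (200_000, "medium")]


def size_category(length_bp, cutoffs=SIZE_CUTOFFS, top_label="large"):
    """Map a plasmid length to the first category whose upper bound exceeds it."""
    for upper, label in cutoffs:
        if length_bp < upper:
            return label
    return top_label


def summarize(plasmids):
    """plasmids: dict of plasmid id -> sequence string."""
    sizes = [len(s) for s in plasmids.values()]
    gcs = [gc_content(s) for s in plasmids.values()]
    return {
        "size_mean": mean(sizes),
        "size_median": median(sizes),
        "gc_mean": mean(gcs),
        "gc_median": median(gcs),
    }
```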
Figure 2
Plasmid clustering. Each node represents a plasmid. The network includes 30,464 plasmids assigned to 1125 clusters (≥3 members each). Nodes are colored according to the three plasmid mobility types: orange, conjugative; blue, mobilizable; green, non-mobilizable.
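The caption does not specify how the clusters were built. Purely as an illustration, the sketch below assumes that plasmids are linked whenever a pairwise similarity score (hypothetical here) reaches a threshold, takes connected components with a union-find structure, and keeps only components with at least three members, mirroring the ≥3 criterion above.

```python
def edges_above(scores, threshold):
    """scores: dict of (id_a, id_b) -> similarity (hypothetical metric)."""
    return [pair for pair, s in scores.items() if s >= threshold]


def cluster_plasmids(ids, edges, min_size=3):
    """Connected components via union-find, keeping components with >= min_size members."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    clusters = [sorted(g) for g in groups.values() if len(g) >= min_size]
    # Largest clusters first; ties broken by first member id.
    return sorted(clusters, key=lambda g: (-len(g), g[0]))
```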
Figure 3
Defining the plasmid distribution possibility. (A) The sum of distribution possibility (Psum). The distribution level that each Psum value corresponds to is indicated. (B) Phylum distribution for clusters with Psum > 4.7. Only clusters with more than 100 plasmids were considered. NPntax: for a given taxon, the plasmid number normalized by the average plasmid number per cluster.
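Read literally, the NPntax definition is NPntax = (number of plasmids from the taxon) / (average number of plasmids per cluster). A minimal sketch, assuming each plasmid carries a single host-taxon label and that the average is taken over the cluster sizes passed in; the caption does not say exactly which clusters enter the average.

```python
from collections import Counter
from statistics import mean


def np_ntax(host_taxa, cluster_sizes):
    """host_taxa: one host-taxon label per plasmid in the set being summarized.
    cluster_sizes: plasmid counts per cluster used for the average (an assumption)."""
    avg = mean(cluster_sizes)
    counts = Counter(host_taxa)
    return {taxon: n / avg for taxon, n in counts.items()}
```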
Figure 4
Performance of the final model stack. The dataset was randomly split 100 times into paired training (A) and testing (B) datasets. The model stack was trained on each training dataset and then evaluated on the paired, unseen testing dataset. MSE, mean squared error. R2, coefficient of determination. MSE (Psum > 1), MSE for clusters with Psum > 1. Values are reported as mean ± SD (n = 100). The rightmost data points in the training dataset were synthesized by SMOTE.
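The evaluation protocol in this caption (repeated random train/test splits, MSE, R², MSE restricted to Psum > 1, reported as mean ± SD) can be sketched as follows. The 80/20 split fraction, the seed, and the one-feature least-squares model are hypothetical placeholders for the model stack, and SMOTE oversampling of the training data is not included.

```python
import random
from statistics import mean, stdev


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    m = mean(y_true)
    ss_tot = sum((a - m) ** 2 for a in y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return float("nan") if ss_tot == 0 else 1 - ss_res / ss_tot  # undefined for constant targets


def fit_line(X, y):
    """Hypothetical stand-in model: least-squares line on the first feature."""
    xs = [r[0] for r in X]
    mx, my = mean(xs), mean(y)
    slope = sum((x - mx) * (v - my) for x, v in zip(xs, y)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda r: intercept + slope * r[0]


def repeated_split_eval(X, y, fit, n_splits=100, test_frac=0.2, seed=0, high=1.0):
    """Return {metric: (mean, SD)} over n_splits random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    res = {"mse": [], "r2": [], "mse_high": []}
    for _ in range(n_splits):
        rng.shuffle(idx)
        n_test = max(1, int(len(idx) * test_frac))
        test, train = idx[:n_test], idx[n_test:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        yt = [y[i] for i in test]
        yp = [model(X[i]) for i in test]
        res["mse"].append(mse(yt, yp))
        res["r2"].append(r2(yt, yp))
        hi = [(a, b) for a, b in zip(yt, yp) if a > high]  # e.g. clusters with Psum > 1
        if hi:
            res["mse_high"].append(mse([a for a, _ in hi], [b for _, b in hi]))
    return {k: (mean(v), stdev(v)) for k, v in res.items() if len(v) > 1}
```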
Figure 5
Feature importance analysis. (A) Overview of feature types. The outer rim of the diagram represents features grouped into 12 main feature types. The “mobility” feature type comprises features involved in plasmid mobility and is distinct from the three plasmid mobility types. Features that occur in more than one type are linked by edges. (B) Importance index calculated by drop-column analysis. Positive values indicate that removing the feature worsens model stack performance, negative values indicate that removing it improves performance, and an importance index of 0 indicates that removal has no effect. (C) Model stack performance on the testing dataset after removal of “smallest”. Values are reported as mean ± SD (n = 100). MSE was calculated using 100 randomly split training and testing dataset pairs, as in Figure 4. (D) Feature type distribution in randomly sampled seven-feature combinations. All sampled combinations: the 11,000 sampled combinations. Selected combinations: the 33 selected combinations. (E) Detailed analysis of the backbone features in all vs. selected combinations. (F) Distribution of mobility types among plasmids of different sizes.
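Drop-column importance refits the model once per feature with that feature removed and compares the resulting test error with the full-feature baseline. A minimal sketch, assuming the importance index is the increase in test MSE (so positive values mean removal hurts, matching the sign convention above) computed on a single split; the least-squares model is a hypothetical stand-in for the model stack.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def fit_ols(X, y):
    """Ordinary least squares with intercept (hypothetical stand-in model)."""
    Z = [[1.0] + list(r) for r in X]
    p = len(Z[0])
    ZtZ = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    Zty = [sum(z[a] * v for z, v in zip(Z, y)) for a in range(p)]
    w = solve(ZtZ, Zty)
    return lambda r: w[0] + sum(wi * xi for wi, xi in zip(w[1:], r))


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def without(rows, j):
    """Copy of rows with column j removed."""
    return [list(r[:j]) + list(r[j + 1:]) for r in rows]


def drop_column_importance(fit, X_tr, y_tr, X_te, y_te, names):
    """Importance = test MSE without the feature minus baseline test MSE."""
    model = fit(X_tr, y_tr)
    baseline = mse(y_te, [model(r) for r in X_te])
    importance = {}
    for j, name in enumerate(names):
        reduced = fit(without(X_tr, j), y_tr)  # full refit without column j
        importance[name] = mse(y_te, [reduced(r) for r in without(X_te, j)]) - baseline
    return importance
```

In the paper's setting the difference would presumably be averaged over the 100 splits used in Figure 4; a single split keeps this example short.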


Cited by

  • High-Risk Lineages of Hybrid Plasmids Carrying Virulence and Carbapenemase Genes.
    Shapovalova VV, Chulkova PS, Ageevets VA, Nurmukanova V, Verentsova IV, Girina AA, Protasova IN, Bezbido VS, Sergevnin VI, Feldblum IV, Kudryavtseva LG, Sharafan SN, Semerikov VV, Babushkina ML, Valiullina IR, Chumarev NS, Isaeva GS, Belyanina NA, Shirokova IU, Mrugova TM, Belkova EI, Artemuk SD, Meltser AA, Smirnova MV, Akkonen TN, Golovshchikova NA, Goloshchapov OV, Chukhlovin AB, Popenko LN, Zenevich EY, Vlasov AA, Mitroshina GV, Bordacheva MS, Ageevets IV, Sulian OS, Avdeeva AA, Gostev VV, Tsvetkova IA, Yakunina MA, Vasileva EU, Matsvay AD, Danilov DI, Savochkina YA, Shipulin GA, Sidorenko SV. Shapovalova VV, et al. Antibiotics (Basel). 2024 Dec 17;13(12):1224. doi: 10.3390/antibiotics13121224. Antibiotics (Basel). 2024. PMID: 39766615 Free PMC article.
  • Comparative analysis of Legionella lytica genome identifies specific metabolic traits and virulence factors.
    Koper P, Wysokiński J, Żebracki K, Decewicz P, Dziewit Ł, Kalita M, Palusińska-Szysz M, Mazur A. Koper P, et al. Sci Rep. 2025 Feb 14;15(1):5554. doi: 10.1038/s41598-025-90154-5. Sci Rep. 2025. PMID: 39952999 Free PMC article.
