Microb Genom. 2020 Oct;6(10):mgen000435.
doi: 10.1099/mgen.0.000435.

Universal whole-sequence-based plasmid typing and its utility to prediction of host range and epidemiological surveillance

James Robertson et al. Microb Genom. 2020 Oct.

Abstract

Bacterial plasmids play a large role in allowing bacteria to adapt to changing environments and can pose a significant risk to human health if they confer virulence and antimicrobial resistance (AMR). Plasmids differ significantly in the taxonomic breadth of host bacteria in which they can successfully replicate; this is commonly referred to as 'host range' and is usually described in qualitative terms as 'narrow' or 'broad'. Understanding the host range potential of plasmids is of great interest due to their ability to disseminate traits such as AMR through bacterial populations and into human pathogens. We developed the MOB-suite to facilitate characterization of plasmids and introduced a whole-sequence-based classification system based on clustering complete plasmid sequences using Mash distances (https://github.com/phac-nml/mob-suite). We updated the MOB-suite database from 12 091 to 23 671 complete sequences, representing 17 779 unique plasmids. With advances in new algorithms for rapidly calculating average nucleotide identity (ANI), we compared clustering characteristics using two different distance measures - Mash and ANI - and three clustering algorithms on the unique set of plasmids. The plasmid nomenclature is designed to group together highly similar plasmids that are unlikely to have multiple representatives within a single cell. Based on our results, we determined that clusters generated using Mash and complete-linkage clustering at a Mash distance of 0.06 resulted in highly homogeneous clusters while maintaining cluster size. The taxonomic distribution of plasmid biomarker sequences for replication and relaxase typing, in combination with MOB-suite whole-sequence-based clusters, has been examined in detail for all high-quality publicly available plasmid sequences. We have incorporated prediction of plasmid replication host range into the MOB-suite based on observed distributions of these sequence features in combination with known plasmid hosts from the literature. Host range is reported as the highest taxonomic rank that covers all of the plasmids which share replicon or relaxase biomarkers or belong to the same MOB-suite cluster code. Reporting host range based on these criteria allows for comparisons of host range between studies and provides information for plasmid surveillance.
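
The host-range rule in the abstract can be illustrated with a minimal sketch. It is not the MOB-suite implementation: the function name `convergence_rank` and the hand-written lineages are assumptions made for illustration, and the rule is read here (following the Fig. 1 caption) as finding the most specific taxon shared by every host lineage.

```python
# Illustrative sketch only; not MOB-suite code.
# Each lineage is a root-to-leaf list of (rank, name) pairs, written by hand
# for this example rather than taken from the MOB-suite database.

def convergence_rank(lineages):
    """Return the deepest (rank, name) pair shared by every lineage, or None."""
    shared = None
    for level in zip(*lineages):
        if all(taxon == level[0] for taxon in level):
            shared = level[0]   # still identical at this depth
        else:
            break               # lineages diverge below this point
    return shared

ENTEROBACTERIACEAE = [
    ("superkingdom", "Bacteria"),
    ("phylum", "Proteobacteria"),
    ("class", "Gammaproteobacteria"),
    ("order", "Enterobacterales"),
    ("family", "Enterobacteriaceae"),
]
escherichia = ENTEROBACTERIACEAE + [("genus", "Escherichia")]
salmonella = ENTEROBACTERIACEAE + [("genus", "Salmonella")]
pseudomonas = [
    ("superkingdom", "Bacteria"),
    ("phylum", "Proteobacteria"),
    ("class", "Gammaproteobacteria"),
    ("order", "Pseudomonadales"),
    ("family", "Pseudomonadaceae"),
    ("genus", "Pseudomonas"),
]

narrow = convergence_rank([escherichia, salmonella])
broad = convergence_rank([escherichia, salmonella, pseudomonas])
print(narrow)  # hosts converge at the family level
print(broad)   # adding a Pseudomonas host moves convergence up to the class level
```

In this toy case, hosts confined to one family give a family-level convergence, while a single host from another order pushes the reported rank up to the class, which is how a broader host range would show up under this reading.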

Keywords: bacterial genomes; mobile genetic elements; plasmid host range; plasmids; relaxase typing; replicon typing.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1.
A simplified example of the host-range prediction feature implemented within MOB-typer. Host-range prediction uses replicon type, relaxase biomarker accession number and MOB-cluster to individually query a literature database of publications associated with plasmids and the MOB-suite plasmid database. The taxonomy associated with each of the records is aggregated and placed into a taxonomic hierarchy. The hierarchy is then processed to identify the point of taxonomic convergence, the lowest taxonomic rank that is parent to all of the taxa involved. Both the literature host range and the plasmid database convergence ranks are reported to the user.
Fig. 2.
Violin plot of the Mash distances between complete and draft versions of plasmids and chromosomes.
Fig. 3.
Replicon-typed plasmids were clustered using either ANI- or Mash-based distances and complete-, single- or average-linkage algorithms. The mean Shannon entropy and mean number of types are based on the number of replicon types present in each of the clusters. The lower bound derived from the closed genome analysis is highlighted by the vertical red dotted line.
Fig. 4.
Plasmids that were typed according to the existing relaxase accession numbers were clustered using either ANI- or Mash-based distances and complete-, single- or average-linkage algorithms. The mean Shannon entropy and mean number of types are based on the number of relaxase accession numbers present in each of the clusters. The lower bound derived from the closed genome analysis is highlighted by the vertical red dotted line.
Fig. 5.
Performance scores across different distance thresholds for either Mash or ANI distance measures using three different clustering approaches: complete-, single- and average-linkage. Performance scores are the result of combining mean cluster size with cluster entropy and the mean number of types (replicon or relaxase) within a cluster (Equation 3). The lower bound of the clustering threshold determined by the closed genomes experiments is signified by the vertical red dotted line.
Fig. 6.
Stacked bar chart of the highest point of taxonomic convergence for plasmids based on replicon types, relaxase accession numbers and MOB-clusters. An overall convergence was determined using all of the features applicable to a given plasmid and picking the highest convergence point achieved.
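
The clustering and homogeneity measures named in the captions for Figs. 3 to 5 can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' pipeline: the plasmid names, distances and replicon assignments are invented, the 0.06 cut-off is treated as inclusive, and Shannon entropy is computed in base 2 because the log base is not given in this text.

```python
import math
from collections import Counter
from itertools import combinations

def complete_linkage_clusters(names, dist, threshold=0.06):
    """Agglomerative complete-linkage clustering, stopped at `threshold`.

    `dist` maps frozenset({a, b}) to a pairwise distance (e.g. a Mash distance).
    """
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        # Complete linkage: cluster distance is the largest pairwise distance.
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:   # assumption: distances equal to the cut-off still merge
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]        # safe because i < j
    return [sorted(c) for c in clusters]

def shannon_entropy(labels):
    """Shannon entropy (base 2, an assumption) of the type labels in one cluster."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())

# Hypothetical plasmids, distances and replicon types, for illustration only.
names = ["pA", "pB", "pC", "pD"]
dist = {
    frozenset(("pA", "pB")): 0.01, frozenset(("pA", "pC")): 0.05,
    frozenset(("pB", "pC")): 0.08, frozenset(("pA", "pD")): 0.30,
    frozenset(("pB", "pD")): 0.31, frozenset(("pC", "pD")): 0.29,
}
replicon = {"pA": "IncFII", "pB": "IncFII", "pC": "IncI1", "pD": "ColE1"}

clusters = complete_linkage_clusters(names, dist)
entropies = [shannon_entropy([replicon[p] for p in c]) for c in clusters]
print(clusters)
print(entropies)
```

In this toy data, pC is within 0.06 of pA but not of pB, so complete linkage keeps it out of the pA/pB cluster, whereas single linkage would have merged it. An entropy of zero for a cluster means every member carries the same type label, which is the sense of "homogeneous" used here.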
