Microb Genom. 2020 Oct;6(10):mgen000398.
doi: 10.1099/mgen.0.000398.

Platon: identification and characterization of bacterial plasmid contigs in short-read draft assemblies exploiting protein sequence-based replicon distribution scores

Oliver Schwengers et al. Microb Genom. 2020 Oct.

Abstract

Plasmids are extrachromosomal genetic elements that replicate independently of the chromosome and play a vital role in the environmental adaptation of bacteria. Due to potential mobilization or conjugation capabilities, plasmids are important genetic vehicles for antimicrobial resistance genes and virulence factors with huge and increasing clinical implications. They are therefore subject to large genomic studies within the scientific community worldwide. As a result of rapidly improving next-generation sequencing methods, the quantity of sequenced bacterial genomes is constantly increasing, in turn raising the need for specialized tools to (i) extract plasmid sequences from draft assemblies, (ii) derive their origin and distribution, and (iii) further investigate their genetic repertoire. Recently, several bioinformatic methods and tools have emerged to tackle this issue; however, a combination of high sensitivity and specificity in plasmid sequence identification is rarely achieved in a taxon-independent manner. In addition, many software tools are not appropriate for large high-throughput analyses or cannot be included in existing software pipelines due to their technical design or software implementation. In this study, we investigated differences in the replicon distributions of protein-coding genes on a large scale as a new approach to distinguish plasmid-borne from chromosome-borne contigs. We defined and computed statistical discrimination thresholds for a new metric: the replicon distribution score (RDS), which achieved an accuracy of 96.6 %. The final performance was further improved by the combination of the RDS metric with heuristics exploiting several plasmid-specific higher-level contig characterizations. We implemented this workflow in a new high-throughput taxon-independent bioinformatics software tool called Platon for the recruitment and characterization of plasmid-borne contigs from short-read draft assemblies. 
Compared to PlasFlow, Platon achieved a higher accuracy (97.5 %) and more balanced predictions (F1=82.6 %) tested on a broad range of bacterial taxa and better or equal performance against the targeted tools PlasmidFinder and PlaScope on sequenced Escherichia coli isolates. Platon is available at: http://platon.computational.bio/.
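The abstract reports both accuracy and F1, which can diverge considerably when plasmid contigs are a small minority of all contigs. The following is a minimal sketch of how these figures are conventionally derived from a confusion matrix. It is not taken from the Platon code base; the class name `ConfusionCounts`, the function names and the example counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for contig-wise classification counts."""
    tp: int  # plasmid contigs predicted as plasmid
    fp: int  # chromosomal contigs predicted as plasmid
    tn: int  # chromosomal contigs predicted as chromosomal
    fn: int  # plasmid contigs predicted as chromosomal


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def sensitivity(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    return _ratio(c.tn, c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def f1_score(c: ConfusionCounts) -> float:
    # Harmonic mean of precision and sensitivity, written in count form.
    return _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)


if __name__ == "__main__":
    # Made-up counts for an imbalanced dataset (far more chromosomal contigs).
    counts = ConfusionCounts(tp=80, fp=10, tn=900, fn=10)
    print(f"sensitivity={sensitivity(counts):.3f}")
    print(f"specificity={specificity(counts):.3f}")
    print(f"accuracy={accuracy(counts):.3f}")
    print(f"F1={f1_score(counts):.3f}")
```

With these invented counts the accuracy is 0.98 while F1 is roughly 0.89, because true negatives dominate the accuracy but do not enter the F1 formula at all.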

Keywords: whole-genome sequencing; NGS; bacteria; plasmids.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1.
Flowchart describing the workflow implemented in Platon. ORF, open reading frames; MPS, marker protein sequence; RDS, replicon distribution score; SNT, sensitivity threshold; SPT, specificity threshold; incomp. groups, incompatibility groups; CT, conservative threshold.
Fig. 2.
Replicon distribution and alignment hit frequencies of marker protein sequences. Shown here are summed plasmid and chromosome alignment hit frequencies per marker protein sequence plotted against plasmid/chromosome hit count ratios scaled to [−1, 1]. Hue: normalized replicon distribution score values (min=−100, max=100). Hit count outliers below 10⁻⁴ and above 1 are discarded for the sake of readability.
Fig. 3.
Evaluation statistics for replicon distribution score thresholds. Sensitivity, specificity and accuracy values are plotted against replicon distribution score threshold ranges. (a) Overview threshold range [−50,10]. (b) Detailed threshold range [−1,1]. Sensitivity is in black, specificity is in brown and accuracy is in blue. Red vertical lines from left to right: sensitivity threshold (−7.7), conservative threshold (0.1) and specificity threshold (0.4).
Fig. 4.
Performance benchmark metrics on simulated short-read data. A performance benchmark was conducted on all complete bacterial genomes of the NCBI RefSeq database, assembling simulated short reads and subsequently realigning them onto original genomes. For scaling reasons and the sake of readability, true negatives were discarded. (a) Benchmark results calculated contig-wise. Horizontal red line, total number of true plasmid contigs. (b) Benchmark results calculated nucleotide-wise. Horizontal red line, total number of true plasmid DNA nucleotides.
Fig. 5.
Taxonomic distribution of recruited plasmid contigs. The taxonomic distribution of the recruited plasmid contigs for the simulated benchmark dataset is shown binned to the genus level. Taxa accounting for less than 2 % are grouped as ‘others’. (a) PlasFlow; (b) Platon.
Fig. 6.
Performance benchmark metrics on real short-read data. A performance benchmark was conducted on 21 E. coli genomes, for which both short-read draft assemblies and complete genomes via hybrid assemblies were available. For scaling reasons and the sake of readability, true negatives were discarded. (a) Benchmark results calculated contig-wise. Horizontal red line, total number of true plasmid contigs. (b) Benchmark results calculated nucleotide-wise. Horizontal red line, total number of true plasmid DNA nucleotides.
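
The Fig. 3 caption lists a sensitivity threshold (−7.7) and a specificity threshold (0.4) for the replicon distribution score. The sketch below shows one way such a pair of thresholds could be used to triage contigs. The decision rule is an assumption made for illustration, not something stated on this page: scores below the sensitivity threshold are treated as chromosomal, scores above the specificity threshold as plasmid, and everything in between is left for further characterization. The names `Call` and `triage_contig` are hypothetical, and the conservative threshold (0.1) is deliberately left out.

```python
from enum import Enum

# Threshold values as given in the Fig. 3 caption.
SENSITIVITY_THRESHOLD = -7.7
SPECIFICITY_THRESHOLD = 0.4


class Call(Enum):
    CHROMOSOME = "chromosome"
    PLASMID = "plasmid"
    UNDECIDED = "undecided"  # would need further contig characterization


def triage_contig(rds: float) -> Call:
    """Assumed three-way rule based on a contig's replicon distribution score."""
    if rds < SENSITIVITY_THRESHOLD:
        return Call.CHROMOSOME
    if rds > SPECIFICITY_THRESHOLD:
        return Call.PLASMID
    return Call.UNDECIDED


if __name__ == "__main__":
    # Invented scores, one per region of the threshold range.
    for score in (-20.0, 0.0, 3.5):
        print(score, triage_contig(score).value)
```

Keeping an explicit undecided band, rather than forcing a binary call at a single cut-off, matches the abstract's description of combining the score with additional heuristics on higher-level contig characteristics.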
