Nucleic Acids Res. 2022 Feb 22;50(3):e17.
doi: 10.1093/nar/gkab1115.

Deeplasmid: deep learning accurately separates plasmids from bacterial chromosomes

William B Andreopoulos et al. Nucleic Acids Res.

Abstract

Plasmids are mobile genetic elements that play a key role in microbial ecology and evolution by mediating horizontal transfer of important genes, such as antimicrobial resistance genes. Many microbial genomes have been sequenced with short-read sequencers, yielding a mix of contigs that derive from plasmids or chromosomes. New tools that accurately identify plasmids are needed to elucidate new plasmid-borne genes of high biological importance. We have developed Deeplasmid, a deep learning tool for distinguishing plasmids from bacterial chromosomes based on the DNA sequence and its encoded biological data. It requires as input only assembled sequences, generated by any sequencing platform and assembly algorithm, and its runtime scales linearly with the number of assembled sequences. Deeplasmid achieves an AUC-ROC of over 89%, and it was more accurate than five other plasmid classification methods. Finally, as a proof of concept, we used Deeplasmid to predict new plasmids in the fish pathogen Yersinia ruckeri ATCC 29473, which has no annotated plasmids. Deeplasmid predicted with high reliability that a long assembled contig is part of a plasmid. Using long-read sequencing, we validated the existence of a 102 kb plasmid, demonstrating Deeplasmid's ability to detect novel plasmids.

Figures

Figure 1.
Deeplasmid training, validation and testing. The plasmid and chromosome dataset was split into six segments, of which five were used to train a model. The sixth segment was used to validate the trained model. Training was repeated twice for each choice of validation segment, yielding 12 different models. Using 12 models reduces the effect of random variance in the predictions.
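The split described in this caption can be illustrated with a minimal sketch. It is not the authors' code: the function names, the shuffling seed, the reuse of the same six segments in both repeats, and the averaging of per-model scores are all assumptions made here for illustration. The caption states only that six segments and two repeats give 12 models and that using 12 models reduces random variance.

```python
import random
from statistics import mean


def kfold_assignments(n_items, n_segments=6, n_repeats=2, seed=0):
    """Yield (repeat, held_out_segment, train_idx, val_idx) for every model.

    Hypothetical helper: the same segments are reused in each repeat, so the
    repeats would differ only in the randomness of training itself.
    """
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    # Deal the shuffled indices into n_segments roughly equal segments.
    segments = [indices[i::n_segments] for i in range(n_segments)]
    for repeat in range(n_repeats):
        for held_out in range(n_segments):
            val_idx = segments[held_out]
            train_idx = [
                i
                for s, segment in enumerate(segments)
                if s != held_out
                for i in segment
            ]
            yield repeat, held_out, train_idx, val_idx


def ensemble_score(per_model_scores):
    """Combine per-model scores for one sequence (assumed here: plain mean)."""
    return mean(per_model_scores)


runs = list(kfold_assignments(n_items=60))
print(len(runs))  # 6 segments x 2 repeats -> 12
```

Each of the 12 runs trains on five segments and validates on the remaining one, so every segment serves as the validation set exactly twice.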
Figure 2.
(A) Training convergence through the epochs. Loss and accuracy are shown as a function of epochs. (B) Training was performed 12 times on the plasmid-chromosome (ACLAME + PLSDB + RefSeq) dataset to derive 12 k-fold models (two per validation segment). All k-fold models achieved an AUC of over 0.98 on the validation segment, with a small statistical variance in the prediction accuracy. (C) ROC curve (true positives versus false positives) on the IMG test dataset of 3280 scaffolds of length 1k–330k bases; the area under the curve (AUC) is 0.8985.
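For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen positive (here, a plasmid) receives a higher score than a randomly chosen negative (a chromosome), with ties counted as one half. The sketch below computes it directly from that definition. The function name and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for plasmid, 0 for chromosome (an encoding assumed here).
    scores: model outputs, higher meaning "more plasmid-like".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise form is quadratic in the number of examples; it is used here for clarity, and a rank-based implementation would be preferable for large test sets.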
Figure 3.
Evaluation on the IMG test dataset. The class separation is clear based on a threshold of 0.5. The percentage of classifications above the threshold that are plasmids (precision) is 94.45%. The percentage of all plasmids classified above the threshold (recall) is 75.56%. The percentage of all chromosomes classified below the threshold (specificity) is 94.46%.
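The three quantities in this caption follow from the confusion matrix at a fixed threshold. A minimal sketch is given below, assuming labels of 1 for plasmid and 0 for chromosome and treating a score strictly above the threshold as a plasmid call. The function name and the toy data are hypothetical and do not reproduce the paper's results.

```python
def threshold_metrics(labels, scores, threshold=0.5):
    """Return (precision, recall, specificity) for scores cut at `threshold`."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        called_plasmid = s > threshold
        if called_plasmid and y == 1:
            tp += 1          # plasmid called plasmid
        elif called_plasmid and y == 0:
            fp += 1          # chromosome called plasmid
        elif y == 1:
            fn += 1          # plasmid called chromosome
        else:
            tn += 1          # chromosome called chromosome
    nan = float("nan")
    precision = tp / (tp + fp) if tp + fp else nan    # calls above threshold that are plasmids
    recall = tp / (tp + fn) if tp + fn else nan       # plasmids scored above threshold
    specificity = tn / (tn + fp) if tn + fp else nan  # chromosomes scored below threshold
    return precision, recall, specificity


labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4]
precision, recall, specificity = threshold_metrics(labels, scores)
print(precision, recall, specificity)
```

In the toy data, two of three plasmids and one of four chromosomes score above 0.5, giving a precision and recall of 2/3 and a specificity of 3/4.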
Figure 4.
Deeplasmid validation. Yersinia ruckeri ATCC 29473 was sequenced with Oxford Nanopore MinION and assembled with Canu and with Shasta. (A) The assembled contigs included a circular piece of DNA that shares 92.78% identity with a known Yersinia plasmid (pYR3; GenBank: LN681230.1). Two linear scaffolds of this genome were predicted by Deeplasmid to be from plasmids (shades of red), and indeed they align with the newly found plasmid. (B) The assembled contigs also contained a chromosome. Most of the linear scaffolds for this genome did not undergo Deeplasmid prediction, due to their large size (>330 kb; dark navy blue). However, those within the size range were largely predicted to be chromosomal in origin (lighter shades of blue). Scaffold 103* is a subsequence of a larger IMG scaffold that matched twice to the assembled chromosome; this short region was predicted by Deeplasmid here (see Materials and Methods). (C) Validated plasmid gene functions. Annotations show genes that are classically associated with plasmids. Color scheme is indicated in the center; grey = ‘hypothetical protein’; tan = other functions.
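Panel (B) notes that scaffolds longer than 330 kb were not scored. A size filter of that kind might be sketched as follows. The upper limit comes from the caption; the 1 kb lower limit is taken from the test-set description in Figure 2 and its use as a prediction cutoff is an assumption, as are the inclusive bounds, the function name, and the scaffold names.

```python
MIN_LEN = 1_000      # assumed lower bound in bases (from the 1k-330k test-set range)
MAX_LEN = 330_000    # scaffolds above this size were not scored, per the caption


def partition_by_length(scaffold_lengths, min_len=MIN_LEN, max_len=MAX_LEN):
    """Split {scaffold_name: length} into (in_range, skipped) dictionaries."""
    in_range, skipped = {}, {}
    for name, length in scaffold_lengths.items():
        # Inclusive bounds are an assumption; the source does not specify them.
        target = in_range if min_len <= length <= max_len else skipped
        target[name] = length
    return in_range, skipped


# Hypothetical scaffold names and lengths, for illustration only.
lengths = {"scaffold_a": 500, "scaffold_b": 45_000, "scaffold_c": 2_100_000}
to_score, not_scored = partition_by_length(lengths)
print(sorted(to_score))    # ['scaffold_b']
print(sorted(not_scored))  # ['scaffold_a', 'scaffold_c']
```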
