Bioinformatics. 2017 Feb 15;33(4):475-482.
doi: 10.1093/bioinformatics/btw651.

Recycler: an algorithm for detecting plasmids from de novo assembly graphs

Roye Rozov et al. Bioinformatics. 2017.

Abstract

Motivation: Plasmids and other mobile elements are central contributors to microbial evolution and genome innovation. Recently, they have been found to have important roles in antibiotic resistance and in affecting production of metabolites used in industrial and agricultural applications. However, their characterization through deep sequencing remains challenging, in spite of rapid drops in cost and throughput increases for sequencing. Here, we attempt to ameliorate this situation by introducing a new circular element assembly algorithm, leveraging assembly graphs provided by a conventional de novo assembler and alignments of paired-end reads to assemble cyclic sequences likely to be plasmids, phages and other circular elements.

Results: We introduce Recycler, the first tool that can extract complete circular contigs from isolate microbial genome, plasmidome and metagenome sequence data. We show that Recycler greatly increases the number of true plasmids recovered relative to other approaches while remaining highly accurate. We demonstrate this trend via simulations of plasmidomes, comparisons of predictions with reference data for isolate samples, and assessments of annotation accuracy on metagenome data. In addition, we provide validation by DNA amplification of 77 plasmids predicted by Recycler from the different sequenced samples, in which Recycler showed mean accuracy of 89% across all data types: isolate, microbiome and plasmidome.

Availability and implementation: Recycler is available at http://github.com/Shamir-Lab/Recycler.

Contact: imizrahi@bgu.ac.il.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Recycler work-flow. An example is shown of generating candidate cycles and peeling off cycles iteratively. For simplicity, all lengths are assumed to be equal and not shown. Here, we consider only candidate cycles that pass through vertex x, but ordinarily such candidates would be generated for each vertex in the component, and the cycle with lowest CV will be chosen and peeled off. (A) The assembly graph. (B) A single component is selected from the assembly graph (framed in A) and represented with vertices for contigs and edges for connecting k-mers. (C) The reduced component after tip removal. The numbers next to vertices are their observed contig coverage. Since vertex x has two incoming edges from vertices b and c, two candidate cycles are generated that pass through edges (b, x) and (c, x), respectively. This is done by computing shortest paths from x to b (x, e, d, g, h, i, j, b; CV = 0.20, shown in D) and from x to c (x, e, d, g, h, c; CV = 0.41, not shown). Two successive steps of peeling cycles are shown with their respective latent coverage assignments. First, the cycle in D is peeled off because the CV calculated from initially observed coverage is lowest for this cycle. Uncolored vertices correspond to contigs with zero coverage that are removed.
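The candidate-generation and peeling steps described in this caption can be illustrated with a minimal Python sketch. It is not Recycler's implementation. The graph topology follows the two cycles named in the caption, but the coverage values are hypothetical (the figure's numbers are not reproduced in the text), so the CVs computed here differ from the 0.20 and 0.41 quoted above. The sketch also assumes that CV is the population standard deviation of coverage divided by its mean, that all contig lengths are equal (as in the figure), and that peeling subtracts the minimum coverage along the chosen cycle; all function names are illustrative.

```python
from collections import deque
from statistics import mean, pstdev


def shortest_path(graph, src, dst):
    """Breadth-first shortest path from src to dst (all lengths assumed equal)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in graph.get(u, []):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None


def candidate_cycles(graph, x):
    """One candidate per incoming edge (p, x): shortest path x -> p, closed by (p, x)."""
    cycles = []
    for p, successors in graph.items():
        if x in successors and p != x:
            path = shortest_path(graph, x, p)
            if path is not None:
                cycles.append(path)
    return cycles


def coverage_cv(cycle, cov):
    """Coefficient of variation of coverage along a cycle (assumed: pstdev / mean)."""
    values = [cov[v] for v in cycle]
    return pstdev(values) / mean(values)


def peel(cycle, cov):
    """Assumed peeling rule: subtract the cycle's minimum coverage, drop zero-coverage vertices."""
    level = min(cov[v] for v in cycle)
    updated = dict(cov)
    for v in cycle:
        updated[v] -= level
    return {v: c for v, c in updated.items() if c > 0}


# Topology taken from the caption's two cycles through x.
graph = {
    "x": ["e"], "e": ["d"], "d": ["g"], "g": ["h"],
    "h": ["i", "c"], "i": ["j"], "j": ["b"], "b": ["x"], "c": ["x"],
}
# Hypothetical coverage values, chosen only for illustration.
cov = {"x": 30, "e": 30, "d": 30, "g": 30, "h": 30,
       "i": 20, "j": 20, "b": 20, "c": 10}

cycles = candidate_cycles(graph, "x")
best = min(cycles, key=lambda cyc: coverage_cv(cyc, cov))
remaining = peel(best, cov)
print(best, round(coverage_cv(best, cov), 2))
print(remaining)
```

With these made-up numbers the longer cycle through b has the lower CV and is peeled first, leaving the shorter cycle through c with uniform coverage for the next iteration.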
Fig. 2.
Methods performance on simulated data. Results are shown for SPAdes without repeat resolution (RR), SPAdes with repeat resolution, the method of Jørgensen et al., and Recycler. The contigs of SPAdes before RR were used as input for the three other methods. Recycler also relied on the graph produced at this stage. F1 score calculation is described in the main text. The x axis shows the number of simulated reference sequences in each case.
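For readers unfamiliar with the metric, the standard F1 score is the harmonic mean of precision and recall. The sketch below shows only that generic definition. How a predicted sequence is matched to a reference in this study is described in the main text and is not reproduced here, so matching by identical identifier is an assumption, as are the function name and the example identifiers.

```python
def f1_score(predicted, reference):
    """Standard F1: harmonic mean of precision and recall over two sets of identifiers."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)  # assumed: a match means identical ID
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 3 predictions, 4 references, 2 in common.
score = f1_score({"p1", "p2", "p3"}, {"p1", "p2", "p4", "p5"})
print(score)
```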
Fig. 3.
PCR based validation of Recycler’s plasmid predictions. High coverage: 60–1000x, med–high: 15–60x, med–low: 5–15x, low: 1–5x.

References

    1. Antipov D. et al. (2016) plasmidSPAdes: Assembling Plasmids from Whole Genome Sequencing Data. Technical report.
    1. Bankevich A. et al. (2012) SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol., 19, 455–477.
    1. Bevan M.W., Flavell R.B., Chilton M.D. (1983) A chimaeric antibiotic resistance gene as a selectable marker for plant cell transformation. Nature, 304, 184–187.
    1. Brown Kav A. et al. (2012) Insights into the bovine rumen plasmidome. Proc. Natl. Acad. Sci. USA, 109, 5452–5457.
    1. Brown Kav A. et al. (2013) A method for purifying high quality and high yield plasmid DNA for metagenomic and deep sequencing approaches. J. Microbiol. Methods, 95, 272–279.