Nat Biotechnol. 2011 Sep 18;29(10):915-21.
doi: 10.1038/nbt.1966.

Efficient de novo assembly of single-cell bacterial genomes from short-read data sets

Hamidreza Chitsaz et al. Nat Biotechnol. 2011.

Abstract

Whole genome amplification by the multiple displacement amplification (MDA) method allows sequencing of DNA from single cells of bacteria that cannot be cultured. Assembling a genome is challenging, however, because MDA generates highly nonuniform coverage of the genome. Here we describe an algorithm tailored for short-read data from single cells that improves assembly through the use of a progressively increasing coverage cutoff. Assemblies of reads from single Escherichia coli and Staphylococcus aureus cells capture >91% of genes within contigs, approaching the 95% captured by an assembly based on many E. coli cells. We apply this method to assemble a genome from a single cell of the uncultivated SAR324 clade of Deltaproteobacteria, a cosmopolitan bacterial lineage in the global ocean. Metabolic reconstruction suggests that SAR324 is aerobic, motile and chemotactic. Our approach enables acquisition of genome assemblies for individual uncultivated bacteria using only short reads, providing cell-specific genetic information absent from metagenomic studies.
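The central idea, a coverage cutoff that rises step by step instead of one fixed cutoff, can be illustrated with a small sketch. It assumes a toy graph whose nodes are already-condensed contigs, each with a length and an average coverage. The node names (borrowed from the colours in the Figure 1 cartoon), lengths, coverages, the cutoff schedule [2, 3, 4], and the length-weighted coverage used when two nodes merge are all hypothetical choices for illustration; this is not Velvet-SC's actual code or parameter set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    length: int  # bases represented by the node (simplified: no k-1 overlap)
    cov: float   # average coverage of the node


def successors(edges, u):
    return {b for (a, b) in edges if a == u}


def predecessors(edges, v):
    return {a for (a, b) in edges if b == v}


def remove_low_coverage(nodes, edges, cutoff):
    """Drop nodes whose average coverage is below `cutoff`, plus their edges."""
    kept = {name: n for name, n in nodes.items() if n.cov >= cutoff}
    return kept, {(a, b) for (a, b) in edges if a in kept and b in kept}


def condense(nodes, edges):
    """Merge u->v whenever u has one successor v and v has one predecessor u."""
    nodes, edges = dict(nodes), set(edges)
    changed = True
    while changed:
        changed = False
        for u in list(nodes):
            succ = successors(edges, u)
            if len(succ) != 1:
                continue
            (v,) = succ
            if v == u or predecessors(edges, v) != {u}:
                continue
            a, b = nodes.pop(u), nodes.pop(v)
            total = a.length + b.length
            merged = f"{u}+{v}"
            # Assumption: merged coverage is the length-weighted average.
            nodes[merged] = Node(total, (a.length * a.cov + b.length * b.cov) / total)
            # Drop u->v and redirect every other edge touching u or v.
            edges = {
                (merged if x in (u, v) else x, merged if y in (u, v) else y)
                for (x, y) in edges
                if (x, y) != (u, v)
            }
            changed = True
            break  # restart the scan on the updated graph
    return nodes, edges


def fixed_cutoff(nodes, edges, cutoff):
    """Single pass: remove everything below one threshold, then condense."""
    return condense(*remove_low_coverage(nodes, edges, cutoff))


def progressive_cutoff(nodes, edges, cutoffs):
    """Remove and condense repeatedly while the threshold increases."""
    for c in cutoffs:
        nodes, edges = condense(*remove_low_coverage(nodes, edges, c))
    return nodes, edges


# Hypothetical toy graph: red -> {green | blue} -> purple.
NODES = {
    "red": Node(100, 10.0),
    "green": Node(5, 6.4),
    "blue": Node(5, 1.0),     # branch created by a sequencing error
    "purple": Node(50, 3.0),  # genuine but weakly amplified region
}
EDGES = {("red", "green"), ("red", "blue"), ("green", "purple"), ("blue", "purple")}

if __name__ == "__main__":
    print(fixed_cutoff(NODES, EDGES, 4.0)[0])
    print(progressive_cutoff(NODES, EDGES, [2.0, 3.0, 4.0])[0])
```

With the fixed cutoff of 4 the purple region is discarded and only `red+green` (105 bases) survives. With the rising cutoff the blue error branch is removed first, which leaves red, green and purple in a single unbranched chain; they are merged into one higher-coverage node before the cutoff rises past purple's coverage, so the resulting contig spans all 155 bases.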


Figures

Figure 1
Assembling single cell reads using Velvet-SC. (a) Coverage varies widely along the genome, between 1 and 12 in this cartoon example. Reads (short lines) and potential contigs (thick lines) are positioned along the genome, with a box around the reads supporting each contig. In the middle there are two potential contigs to choose from, differing by a single nucleotide (C vs. T): a green contig with coverage 6.4 and a blue contig with coverage 1. With a fixed coverage threshold of 4, Velvet would delete the low coverage blue and purple contigs and then merge the high coverage red and green contigs into a contig much shorter than the full genome. Velvet-SC instead starts by eliminating sequences of average coverage 1, which removes only the blue contig. The other contigs are combined into a single contig (b) of average coverage 9. The purple region is salvaged by Velvet-SC because it was absorbed into a higher coverage region before the coverage threshold increased. Velvet-SC repeats this process with a gradually increasing low coverage threshold. (c) A portion of the de Bruijn graph for the contigs described in (a). The black circles are the “vertices” and represent 5-mer strings derived from the reads, which are indicated by colored lines alongside the chains of vertices, including a blue read with an erroneous T. The lines between the vertices are termed “edges” and represent the overlaps between the 5-mers. The edges are directed from left to right in this example. The read with the C/T mismatch results in two alternative paths for assembly, each with 5 intermediate vertices. The lower of the two paths arises from the erroneous blue read and has coverage 1; it is the only part of the graph eliminated by Velvet-SC, leaving a single chain of vertices that gives a single contig for the entire genome. See Supplementary Figure S3 for an example of the condensing of contigs. An example of Velvet-SC handling of a chimeric read is presented in Supplementary Figure S4.
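Panel (c) can be reproduced in miniature with a sketch that builds 5-mer vertices from reads, counts their coverage, and links two vertices when the last 4 bases of one equal the first 4 of the next. The 14-base "genome", the reads, and the coverage threshold of 2 are hypothetical. As a simplification, edges are inferred from every (k-1)-base overlap among the k-mers present, and reverse complements are ignored; real assemblers such as Velvet build edges from consecutive k-mers in reads and treat both strands.

```python
from collections import Counter

K = 5  # k-mer size used in the Figure 1(c) cartoon


def kmer_counts(reads, k=K):
    """Coverage of each k-mer = number of read positions where it occurs."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def successor_map(kmers, k=K):
    """Edge u -> v whenever the last k-1 bases of u equal the first k-1 of v."""
    by_prefix = {}
    for km in kmers:
        by_prefix.setdefault(km[:k - 1], []).append(km)
    return {u: by_prefix.get(u[1:], []) for u in kmers}


def walk_from_sources(kmers, k=K):
    """From each vertex with no predecessor, follow unique successors."""
    kmers = list(kmers)
    succ = successor_map(kmers, k)
    has_pred = {v for vs in succ.values() for v in vs}
    contigs = []
    for start in (u for u in kmers if u not in has_pred):
        seq, node, seen = start, start, {start}
        while len(succ[node]) == 1 and succ[node][0] not in seen:
            node = succ[node][0]
            seen.add(node)
            seq += node[-1]  # each edge traversed adds one new base
        contigs.append(seq)
    return contigs


GENOME = "ATGGTACAGTTCGA"  # hypothetical genome with no repeated 4-mers
GOOD = [GENOME[0:8], GENOME[3:11], GENOME[6:14]] * 2  # error-free reads
BAD = GENOME[1:6] + "T" + GENOME[7:13]  # C -> T error at genome position 6

counts = kmer_counts(GOOD + [BAD])
solid = [km for km, c in counts.items() if c >= 2]  # drop coverage-1 vertices

if __name__ == "__main__":
    print(walk_from_sources(counts))  # stops at the bubble caused by BAD
    print(walk_from_sources(solid))   # one chain spelling the whole genome
```

In this toy the erroneous read opens a bubble after the vertex `TGGTA` that rejoins at `AGTTC`; both branches have 5 intermediate vertices, and every vertex on the erroneous branch has coverage 1, so removing coverage-1 vertices leaves a single chain that spells the full genome.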
Figure 2
Comparison of contigs generated by Velvet vs. EULER+Velvet-SC for single cell E. coli lane 1. (a,b,c) Contigs are those presented in Table 1 and are ordered from largest to smallest number of bases. The y-axis shows (a) the cumulative length, (b) the cumulative number of genes, and (c) the cumulative number of operons in the contigs. EULER+Velvet-SC improves upon Velvet in all three plots. (d) Average read coverage over a 1000 bp window (top, log scale), Velvet contigs (middle) and EULER+Velvet-SC contigs (bottom), mapped along the E. coli reference genome, with vertical staggering to help visualize small contigs. Contigs in blue or green match between the assemblies. Contigs in red or orange differ between the assemblies: they either have substantially different lengths, are broken into a different number of contigs, or are present in one assembly but missing in the other.
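The curves in panels (a) to (c) and the coverage track in panel (d) rest on simple bookkeeping, sketched below. The caption does not state how a gene is counted as captured or how the 1000 bp windows are laid out, so this sketch assumes a gene counts once it lies entirely inside a contig, that contigs and genes are half-open (start, end) intervals already mapped to the reference, and that windows are consecutive and non-overlapping. Operons could be counted the same way as genes.

```python
from itertools import accumulate


def cumulative_lengths(contig_lengths):
    """Cumulative assembled length with contigs ordered largest to smallest."""
    return list(accumulate(sorted(contig_lengths, reverse=True)))


def cumulative_genes(contigs, genes):
    """Cumulative count of genes lying entirely inside some contig.

    contigs and genes are half-open (start, end) intervals on the reference;
    contigs are visited from longest to shortest, as in panels (a) to (c).
    """
    ordered = sorted(contigs, key=lambda c: c[1] - c[0], reverse=True)
    captured, totals = set(), []
    for c_start, c_end in ordered:
        for i, (g_start, g_end) in enumerate(genes):
            if c_start <= g_start and g_end <= c_end:
                captured.add(i)
        totals.append(len(captured))
    return totals


def window_mean_coverage(per_base_cov, window=1000):
    """Mean coverage in consecutive non-overlapping windows (last may be shorter)."""
    return [
        sum(per_base_cov[s:s + window]) / len(per_base_cov[s:s + window])
        for s in range(0, len(per_base_cov), window)
    ]
```

For example, `cumulative_lengths([300, 1200, 500])` gives `[1200, 1700, 2000]`, and a gene that straddles a contig boundary is not counted by `cumulative_genes` under the full-containment assumption.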
Figure 3
A maximum likelihood tree of Deltaproteobacterial 16S sequences including SAR324_MDA (red). Sequences with species identification are from representative Deltaproteobacterial reference genomes in GenBank. The environmental 16S sequences (designated uncultured SAR324 or uncultured deltaproteobacteria) were retrieved from GenBank based on their accession numbers (see Fig. S3 of ). The sequences were aligned using MOTHUR. The tree was inferred using the nucleotide maximum likelihood feature of PAUP* 4.0b10. Thick branches mark clades with bootstrap support of 75% or greater. Sequences present on fosmids with extensive nucleotide similarity to the SAR324_MDA assembly are indicated (red star), as is a SAR324 fosmid (yellow star) encoding CoxL homologs also present in the SAR324_MDA assembly (see Supplementary Fig. S13).

Comment in

  • Picking up the pieces. Eisenstein M. Nat Methods. 2011 Nov;8(11):896-7. doi: 10.1038/nmeth.1753. PMID: 22148156.
