BMC Genomics. 2010 Apr 16;11:243.
doi: 10.1186/1471-2164-11-243.

Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data

Yukari Nishito et al. BMC Genomics. 2010.

Abstract

Background: Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168 and serves as the starter for producing "natto", a traditional Japanese food made from soybeans. Although several laboratory-domesticated B. subtilis 168 derivatives have already been re-sequenced from short-read data, assembling the whole genome sequence of a closely related strain, B. subtilis natto, from very short reads is more challenging, particularly because our aim was to assemble a single, fully connected scaffold from reads of about 35 bp.

Results: We applied a comparative genome assembly method, which combines de novo assembly with reference-guided assembly, to one of the B. subtilis natto strains. We assembled 28 scaffolds and avoided substantial fragmentation. The assembly was then completed with long PCR experiments, yielding a single connected scaffold for B. subtilis natto. Based on the assembled genome sequence, orthologous gene analysis between natto BEST195 and Marburg 168 revealed that, of the 4375 genes predicted in BEST195, 82.4% are one-to-one orthologs of genes in 168 (with two genes in-paralogous), 3.2% are deleted in 168, and 14.3% are inserted in BEST195; in addition, 5.9% of the genes present in 168 are deleted in BEST195. The natto genome carries the same alleles as the wild strain RO-FF-1 in the promoter region of degQ and in the coding region of swrAA. These alleles are specific to the ability to produce gamma-PGA (poly-gamma-glutamic acid), which is related to natto production. Furthermore, the B. subtilis natto strain completely lacks a polyketide synthesis operon, has a disrupted plipastatin production operon, and possesses previously unidentified transposases.

Conclusions: Determining the whole genome sequence of Bacillus subtilis natto enabled detailed analyses of a set of genes related to natto production and revealed the number and locations of the insertion sequences that B. subtilis natto harbors but B. subtilis 168 lacks. Multiple genome-level comparisons among five closely related Bacillus species were also carried out. The B. subtilis natto genome sequence and its gene annotations are available from the Natto genome browser http://natto-genome.org/.

Figures

Figure 1
Sorting scaffolds along the Marburg 168 reference genome. The de novo assembled scaffolds were ordered along the Marburg 168 reference genome using anchors, which are short, well-conserved subsequences shared by each scaffold and the reference genome. Anchors were calculated using Murasaki, a multiple genome comparison program [16]. (Left, before sorting) Link plot between the unsorted scaffolds and the Marburg 168 reference genome. (Right, after sorting) Link plot between the sorted scaffolds and the reference genome. Each line between the scaffolds (upper part of the link plot) and the reference genome (lower part) indicates an anchor between them.
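The sorting step in Figure 1 can be illustrated with a minimal sketch. It assumes each anchor is reduced to a hypothetical `(scaffold_position, reference_position)` pair and that a scaffold is placed at the median reference position of its anchors; this is an illustrative simplification, not Murasaki's output format or the authors' exact placement rule.

```python
from statistics import median

# Hypothetical anchor layout: (scaffold_position, reference_position).
# This is a simplification for illustration, not Murasaki's output format.
Anchor = tuple[int, int]


def sort_scaffolds(anchors: dict[str, list[Anchor]]) -> list[str]:
    """Order scaffold IDs by the median reference position of their anchors.

    Scaffolds without anchors cannot be placed; they are appended at the end
    in their input order.
    """
    placed = {sid: median(ref for _, ref in a) for sid, a in anchors.items() if a}
    unplaced = [sid for sid, a in anchors.items() if not a]
    return sorted(placed, key=lambda sid: placed[sid]) + unplaced


def orientation(a: list[Anchor]) -> str:
    """'+' if scaffold coordinates mostly increase along the reference, else '-'."""
    a = sorted(a, key=lambda anchor: anchor[1])  # walk anchors in reference order
    ups = sum(q[0] > p[0] for p, q in zip(a, a[1:]))
    downs = sum(q[0] < p[0] for p, q in zip(a, a[1:]))
    return "+" if ups >= downs else "-"
```

For example, a scaffold whose anchors map to reference positions near 100 to 400 is placed before one whose anchors map near 5000, and a scaffold whose coordinates decrease along the reference is flagged as reverse-oriented. The median is used here only because it is robust to a few stray anchors from repeats.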
Figure 2
Concatenating sorted scaffolds into larger scaffolds. The de novo assembled scaffolds were combined with the reference-guided draft to fill the gaps between them. (Left) Two adjacent scaffolds that overlapped were merged into one larger scaffold. (Right) If the reference-guided draft contained a subsequence between two adjacent scaffolds, the two scaffolds and that subsequence were concatenated into one scaffold.
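The two cases in Figure 2 can be sketched as one function. This is a minimal sketch assuming exact suffix/prefix overlaps and a hypothetical minimum overlap length `min_overlap`; the function name and parameters are illustrative, and real data would need to tolerate sequencing errors.

```python
def merge_adjacent(left: str, right: str, gap_fill: str = "", min_overlap: int = 20) -> str:
    """Join two adjacent, sorted scaffolds (hypothetical helper).

    Case 1 (left panel): if a suffix of `left` equals a prefix of `right`
    of at least `min_overlap` bases, merge them on the longest such overlap.
    Case 2 (right panel): otherwise concatenate `left`, the subsequence taken
    from the reference-guided draft (`gap_fill`), and `right`.
    """
    max_k = min(len(left), len(right))
    for k in range(max_k, min_overlap - 1, -1):  # try the longest overlap first
        if left.endswith(right[:k]):
            return left + right[k:]
    return left + gap_fill + right
```

Preferring the longest exact overlap avoids merging on a short, possibly spurious match; the reference-derived `gap_fill` is used only when no overlap of sufficient length exists.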
Figure 3
SfiI physical maps. Comparison of the SfiI physical (restriction site) maps of our draft and of the experimental construction [18]. Each block, shaded in a different tone of gray, indicates an SfiI fragment, that is, the segment between two adjacent SfiI restriction sites. Both maps contain the same number of SfiI restriction sites, and the SfiI fragment sizes agree within experimental error, supporting the accuracy of our draft.
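An in silico SfiI digest of a draft sequence can be sketched as follows. It uses the SfiI recognition site GGCCNNNN^NGGCC (top strand cut after base 8) and treats the bacterial chromosome as circular. Because the site is palindromic, scanning one strand finds every site. The function name is hypothetical, and this is not the authors' procedure.

```python
import re

# SfiI recognition site GGCCNNNN^NGGCC; the top strand is cut after base 8.
# The lookahead lets re.finditer report overlapping sites.
SFII_SITE = re.compile(r"(?=GGCC[ACGT]{5}GGCC)")
CUT_OFFSET = 8


def sfii_fragments(genome: str) -> list[int]:
    """Fragment lengths from a complete SfiI digest of a circular genome."""
    seq = genome.upper()
    # Append the first 12 bases so that sites spanning the origin are found.
    ext = seq + seq[:12]
    cuts = sorted({(m.start() + CUT_OFFSET) % len(seq) for m in SFII_SITE.finditer(ext)})
    if not cuts:
        return [len(seq)]  # no site: the circle stays intact
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    sizes.append(len(seq) - cuts[-1] + cuts[0])  # fragment spanning the origin
    return sizes
```

Comparing the number and sizes of these predicted fragments with an experimental SfiI map is the kind of check the figure shows; fragment sizes from a real digest would only agree within gel-sizing error.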
Figure 4
Analyses of γ-PGA production genes for soybean fermentation. We confirmed two single-nucleotide differences in regions related to γ-PGA production. In strain 168, which cannot produce γ-PGA, a cytosine is substituted by thymine in the promoter region of degQ, and a single adenine is inserted into the coding region of swrAA. The natto BEST195 genome differs from 168 at both of these positions. (Left) Alignment of the swrAA coding region of natto BEST195 and Marburg 168. The box indicates the position of the single adenine insertion. The pair "[->" and "<-]" marks the region of the first pseudogene annotation and the pair "[=>" and "<=]" marks the region of the second pseudogene annotation in strain 168; the pair "[+>" and "<+]" marks the swrAA open reading frame (ORF) in natto BEST195. (Right) Alignment of the degQ promoter regions of strains BEST195 and 168. The box indicates the thymine-to-cytosine substitution (168 to BEST195), "[->" marks the transcription start site, and "[=>" marks the ATG translation start codon.
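The two kinds of difference in Figure 4, a substitution in the degQ promoter and a single-base insertion in swrAA, can be read off a pairwise alignment column by column. The sketch below assumes two already-aligned, equal-length strings with "-" for gaps; the helper name and the toy sequences in the usage note are hypothetical, not data from the paper.

```python
def alignment_differences(query: str, ref: str) -> list[tuple[int, str, str, str]]:
    """List differences between two gapped, equal-length aligned sequences.

    Returns (column, kind, query_base, ref_base) tuples, where kind is
    'substitution', 'insertion' (base in query, gap '-' in ref) or
    'deletion' (gap in query, base in ref). Columns are 0-based.
    """
    if len(query) != len(ref):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    for i, (q, r) in enumerate(zip(query.upper(), ref.upper())):
        if q == r:
            continue  # identical column (including gap-gap)
        if r == "-":
            kind = "insertion"
        elif q == "-":
            kind = "deletion"
        else:
            kind = "substitution"
        diffs.append((i, kind, q, r))
    return diffs
```

With 168 as `query` and BEST195 as `ref`, an extra A in 168 appears as an "insertion" column, which in a coding region shifts the reading frame; this is consistent with the two pseudogene annotations in 168 where BEST195 has a single swrAA ORF.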
Figure 5
Analyses of quorum-sensing genes for soybean fermentation. Large sequence variations in the four quorum-sensing genes comQ, comX, comP, and comA were observed between natto BEST195 and 168. The partial broad arrows in strain 168 indicate the portion of each gene whose DNA sequence is identical to that of the corresponding natto gene. The percentages indicate the protein sequence identity between natto BEST195 and 168.
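The caption does not state how the protein identity percentages were computed. The sketch below shows one common convention, assumed here: identical residues divided by the number of aligned columns, ignoring gap-gap columns. Other definitions, such as dividing by the length of the shorter protein, can give noticeably different values for divergent genes like these.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two gapped, equal-length aligned protein sequences.

    Assumed convention: identical non-gap residues / aligned columns,
    where columns that are gaps in both sequences are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    same = sum(x == y and x != "-" for x, y in columns)
    return 100.0 * same / len(columns)
```

For example, the aligned pair "MKV-LA" and "MKVQLS" has 4 identical residues over 6 columns, giving about 66.7%.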
Figure 6
Polyketide synthesis operon. The operon of polyketide synthesis genes pksB to pksR, which begins with the transcriptional regulator gene pksA and ends with the polyketide hydroxylase gene pksS, is completely deleted in B. subtilis natto BEST195. The link plot shows the alignment between a region of Marburg 168 that includes the polyketide synthesis operon and the corresponding region of BEST195. The alignment was calculated using Murasaki, a multiple genome comparison program [16]; each line between the two sequences indicates an anchor, a short, well-conserved subsequence. Four PCR primers, A: 5'-AGAAAACAAATTGCAGAAGCAAC-3', B: 5'-GCATGTTGTTAAAGCACATAGCA-3', C: 5'-GATTGCATATGAAGTCACTCGC-3', and D: 5'-TACTCTACTCAGGTTGAGTGGGC-3', are indicated by short horizontal arrows. These primers were designed to amplify both ends of the polyketide synthesis operon in 168. From 168, the primer pairs A/B and C/D produced the predicted 3.14-kb and 3.10-kb fragments. In contrast, from BEST195 only the pair A/D produced a fragment, of the predicted 1.62 kb (data not shown).
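The primer logic in Figure 6 can be checked with a simple in silico PCR. This minimal sketch assumes that A and C are forward (top-strand) primers and B and D are reverse primers whose reverse complements occur on the top strand; the caption does not state primer orientation. It also assumes exact matches only. The function names are hypothetical, and the template in the usage note is synthetic.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pcr_product_size(template: str, fwd: str, rev: str) -> Optional[int]:
    """Predicted amplicon length for a forward and a reverse primer.

    The reverse primer is located on the top strand as its reverse
    complement, downstream of the forward primer. Returns None if either
    binding site is missing. Exact matches only (no mismatches allowed).
    """
    t = template.upper()
    start = t.find(fwd.upper())
    if start < 0:
        return None
    rc = revcomp(rev)
    end = t.find(rc, start + len(fwd))
    if end < 0:
        return None
    return end + len(rc) - start  # from 5' end of fwd to 5' end of rev
```

On a synthetic template built as 10 filler bases, primer A, 50 filler bases, and the reverse complement of D, the A/D product is 23 + 50 + 23 = 96 bp. The same reasoning explains the figure: deleting the operon brings the A and D sites close enough that only the A/D pair yields a short product.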
Figure 7
Analyses of the ends of scaffolds generated by the Velvet assembler. Annotation of both ends of every scaffold longer than 1 kbp revealed that de novo assembly by Velvet terminated at repeat sequences such as tRNAs, insertion sequences (ISs), and phages.
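The end-annotation analysis in Figure 7 can be sketched as an interval-overlap check. The sketch assumes repeat annotations are given as hypothetical `Feature` intervals in scaffold coordinates and that an "end" is a fixed window (100 bp here, an arbitrary choice) at each terminus. The caption specifies neither the annotation source nor the window size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    kind: str   # e.g. "tRNA", "IS", "phage"


def _end_label(features: list[Feature], lo: int, hi: int) -> str:
    # Kind of the first feature overlapping the half-open window [lo, hi).
    hits = [f.kind for f in features if f.start < hi and f.end > lo]
    return hits[0] if hits else "unannotated"


def classify_scaffold_ends(length: int, features: list[Feature],
                           window: int = 100,
                           min_length: int = 1000) -> Optional[tuple[str, str]]:
    """Label the left and right ends of one scaffold by overlapping repeat class.

    Scaffolds not longer than `min_length` bp are skipped (returns None),
    mirroring the "longer than 1 kbp" filter in the caption.
    """
    if length <= min_length:
        return None
    left = _end_label(features, 0, window)
    right = _end_label(features, length - window, length)
    return left, right
```

Tallying these labels over all scaffolds would show how often assembly stops at each repeat class. Repeats longer than the reads make the assembly graph ambiguous, which is why de novo assemblers tend to break there.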

References

    1. Dohm J, Lottaz C, Borodina T, Himmelbauer H. SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res. 2007;17:1697–1706. doi: 10.1101/gr.6435207.
    2. Hernandez D, Francois P, Farinelli L, Osteras M, Schrenzel J. De novo bacterial genome sequencing: Millions of very short reads assembled on a desktop computer. Genome Res. 2008;18:802–809. doi: 10.1101/gr.072033.107.
    3. Srivatsan A, Han Y, Peng J, Tehranchi A, Gibbs R, Wang J, Chen R. High-precision, whole-genome sequencing of laboratory strains facilitates genetic studies. PLoS Genet. 2008;4:e1000139. doi: 10.1371/journal.pgen.1000139.
    4. Barbe V, Cruveiller S, Kunst F, Lenoble P, Meurice G, Sekowska A, Vallenet D, Wang T, Moszer I, Medigue C, Danchin A. From a consortium sequence to a unified sequence: the Bacillus subtilis 168 reference genome a decade later. Microbiology. 2009;155:1758–1775. doi: 10.1099/mic.0.027839-0.
    5. Pop M, Salzberg S. Bioinformatics challenges of new sequencing technology. Trends Genet. 2008;24:142–149.