PLoS One. 2011 Feb 25;6(2):e16717.
doi: 10.1371/journal.pone.0016717.

Using genomic sequencing for classical genetics in E. coli K12

Eric Lyons et al. PLoS One.

Abstract

Here we develop computational methods that facilitate the use of 454 whole genome shotgun sequencing to identify mutations in Escherichia coli K12. We had Roche sequence eight related strains that were derived as spontaneous mutants in a background lacking a whole genome sequence. Roche provided difference tables based on assembling each genome against the reference strain E. coli MG1655 (NC_000913). Because of the evolutionary distance from MG1655, these tables contained many false negatives and false positives. By manual analysis of the dataset, we detected all the known mutations (24 at nine locations) and identified and genetically confirmed new mutations necessary and sufficient for the phenotypes we had selected in four strains. We then had Roche assemble contigs de novo, which we further assembled into full-length pseudomolecules based on synteny with MG1655. This hybrid method facilitated detection of insertion mutations and allowed annotation to be transferred from MG1655. After removing one genome with less than the optimal 20- to 30-fold sequence coverage, we identified 544 putative polymorphisms that included all of the known and selected mutations apart from insertions. Finally, by comparing single genomes to composite data for the remaining six and using a ranking system that penalizes homopolymer sequencing and misassembly errors, we detected seven new mutations among a total of only 41 candidates. An additional benefit of the analysis is a table of differences between MG1655 and a physiologically robust E. coli wild-type strain, NCM3722. Both projects were greatly facilitated by the comparative genomics tools in the CoGe software package (http://genomevolution.org/).
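The abstract mentions a ranking system that penalizes homopolymer sequencing and misassembly errors but does not spell out its rules. The Python sketch below shows one way such a score could be built; it is not the authors' scheme. The `Candidate` fields and the constants `HOMOPOLYMER_FREE_RUN`, `CONTIG_END_WINDOW` and `CONTIG_END_PENALTY` are hypothetical choices made for illustration. Lower scores mean a more credible call.

```python
from dataclasses import dataclass

# Hypothetical weights for illustration only; the paper does not publish its scoring rules.
HOMOPOLYMER_FREE_RUN = 2   # runs of this length or shorter add no penalty
CONTIG_END_WINDOW = 50     # bp; calls this close to a contig edge may reflect misassembly
CONTIG_END_PENALTY = 3     # penalty added for calls inside that window


@dataclass
class Candidate:
    pos: int                 # 0-based position of the call in the reference sequence
    is_indel: bool           # True for insertion/deletion calls, False for substitutions
    dist_to_contig_end: int  # bp from the nearest de novo contig boundary


def homopolymer_length(seq: str, pos: int) -> int:
    """Length of the run of identical bases that covers 0-based position pos."""
    base = seq[pos]
    left = pos
    while left > 0 and seq[left - 1] == base:
        left -= 1
    right = pos
    while right + 1 < len(seq) and seq[right + 1] == base:
        right += 1
    return right - left + 1


def false_positive_score(ref: str, cand: Candidate) -> int:
    """Higher scores mean the call is more likely to be a sequencing or assembly artifact."""
    score = 0
    if cand.is_indel:
        # 454 errors are mostly miscalled homopolymer lengths, so indels
        # inside longer runs are penalized more heavily.
        score += max(0, homopolymer_length(ref, cand.pos) - HOMOPOLYMER_FREE_RUN)
    if cand.dist_to_contig_end < CONTIG_END_WINDOW:
        score += CONTIG_END_PENALTY
    return score


def rank(ref: str, cands: list[Candidate]) -> list[Candidate]:
    """Order candidates from most to least credible (lowest score first)."""
    return sorted(cands, key=lambda c: false_positive_score(ref, c))


if __name__ == "__main__":
    ref = "ACGTTTTTGCA"
    calls = [Candidate(4, True, 10), Candidate(4, True, 1000), Candidate(9, False, 1000)]
    print([false_positive_score(ref, c) for c in rank(ref, calls)])  # [0, 3, 6]
```

Here an indel in the five-base T run scores 3, and the same indel near a contig edge scores 6, so a reviewer would examine the substitution first.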

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Computational analysis of sequencing data.
Hexagons represent initial data sets and final outputs; ovals represent algorithms and other operations; rounded boxes represent data transformations. Note that Mauve produces alignments of multiple genomes and that the logic for construction of a composite sequence is internal to PolyMFind during polymorphism detection. The net effect of these two programs is the comparison of one genome to a composite for the identification of unique polymorphisms.
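The caption notes that the logic for building the composite is internal to PolyMFind, so the following Python sketch only illustrates the general idea of comparing one genome with a composite of the others. It assumes the genomes have already been aligned (for example by Mauve) into equal-length strings with `-` for gaps. The majority-vote composite, the tie rule and the function names are hypothetical.

```python
from collections import Counter


def composite(others: list[str]) -> str:
    """Majority base in each alignment column of the other genomes; 'N' when the top two tie."""
    out = []
    for column in zip(*others):
        counts = Counter(column).most_common(2)
        if len(counts) == 2 and counts[0][1] == counts[1][1]:
            out.append("N")  # ambiguous column: no clear consensus
        else:
            out.append(counts[0][0])
    return "".join(out)


def unique_polymorphisms(alignment: dict[str, str], focal: str) -> list[tuple[int, str, str]]:
    """Columns where the focal genome differs from the composite of all other genomes.

    Returns (column, composite_base, focal_base); ambiguous ('N') composite columns are skipped.
    """
    others = [seq for name, seq in alignment.items() if name != focal]
    comp = composite(others)
    return [
        (i, c, f)
        for i, (c, f) in enumerate(zip(comp, alignment[focal]))
        if c != "N" and c != f
    ]


if __name__ == "__main__":
    aln = {"s1": "ACGTA", "s2": "ACGTA", "s3": "ACG-A", "s4": "ACTTA"}
    print(unique_polymorphisms(aln, "s4"))  # [(2, 'G', 'T')]
    print(unique_polymorphisms(aln, "s3"))  # [(3, 'T', '-')]
```

Because every strain descends from the same background, a difference shared by all strains against MG1655 drops out of this comparison, which is why it returns far fewer candidates than a comparison with the reference alone.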
Figure 2. Number of contigs and polymorphisms of various classes and subclasses (y-axis) vs. fold sequence coverage (x-axis).
See Table S1.
Figure 3. Syntenic dotplots of de novo assembled contigs of NCM4370 (x-axis) against the fully sequenced and assembled reference genome MG1655.
Vertical black lines separate contigs from NCM4370. Gray dots are putative homologous gene pairs. Green dots (which form lines) are collinear sets of homologous gene pairs used to infer synteny. (A) NCM4370 contigs are ordered by size with the largest on the left (http://genomevolution.org/r/bjz). The largest contigs are in the region of the terminus of replication, which is known to contain fewer repetitive elements than other regions of the E. coli genome. (B) NCM4370 contigs are ordered in the best syntenic path by comparison to reference genome MG1655. Individual contigs may be inverted to ensure that the syntenic path is conserved. Discontinuities in the syntenic line are the result of deletions and insertions. The red arrow marks the position of a lambda prophage in NCM4370 (see text). Results can be regenerated at: http://genomevolution.org/r/bjy.
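Panel B orders and orients contigs along MG1655 using CoGe. The Python sketch below is a simplified stand-in for that step, not CoGe's algorithm. It assumes each contig comes with hypothetical anchor pairs (contig coordinate, MG1655 coordinate) taken from collinear homologous genes, places each contig by the median reference coordinate of its anchors, and marks it for inversion when its anchors run backward. Circular wrap-around at the origin and internal rearrangements are ignored.

```python
from statistics import median

Anchor = tuple[int, int]  # (position on contig, position on MG1655)


def orientation(anchors: list[Anchor]) -> int:
    """+1 if reference coordinates mostly rise along the contig, -1 if they mostly fall."""
    pts = sorted(anchors)  # order anchors by contig position
    steps = list(zip(pts, pts[1:]))
    rising = sum(1 for (_, r1), (_, r2) in steps if r2 > r1)
    falling = sum(1 for (_, r1), (_, r2) in steps if r2 < r1)
    return 1 if rising >= falling else -1


def syntenic_order(contigs: dict[str, list[Anchor]]) -> list[tuple[str, int]]:
    """Order contigs along the reference and report the strand to display each on.

    Contigs with no anchors cannot be placed and are omitted.
    """
    placed = [
        (median([r for _, r in anchors]), name, orientation(anchors))
        for name, anchors in contigs.items()
        if anchors
    ]
    placed.sort()
    return [(name, strand) for _, name, strand in placed]


if __name__ == "__main__":
    contigs = {
        "c1": [(0, 5000), (1000, 6000), (2000, 7000)],
        "c2": [(0, 3000), (1000, 2000), (2000, 1000)],  # runs backward: display inverted
        "c3": [],                                        # no anchors: left unplaced
    }
    print(syntenic_order(contigs))  # [('c2', -1), ('c1', 1)]
```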
Figure 4. A series of syntenic dotplots between the NCM strains and the reference genome MG1655.
Scaffolds of the NCM strains are ordered by their syntenic path to MG1655. Vertical black lines are divisions between contigs and green diagonal lines are syntenic gene pairs. Red arrows show an additional contig break in NCM4139 and NCM4384 caused by a new insertion of IS186 in the promoter for the lon gene. The extra breaks in strain NCM4781, which were due to insufficient sequence coverage, are immediately apparent.
Figure 5. Percent homopolymer sequencing error versus homopolymer length with exponential regression.
Data are plotted for the seven strains with highest sequence coverage (see Table S2).
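Figure 5 fits an exponential to homopolymer error rate versus run length. The caption does not say how the fit was computed. One common approach, sketched below in Python, is ordinary least squares on ln(error) against length, which gives error ≈ a·e^(b·length). The data in the usage lines are synthetic, not the values in Table S2, and the function names are illustrative.

```python
import math


def fit_exponential(lengths: list[float], error_pct: list[float]) -> tuple[float, float]:
    """Fit error_pct ≈ a * exp(b * length) by least squares on ln(error_pct).

    Returns (a, b). All error values must be positive because the fit uses logarithms.
    """
    if any(y <= 0 for y in error_pct):
        raise ValueError("exponential fit needs strictly positive error rates")
    n = len(lengths)
    logs = [math.log(y) for y in error_pct]
    mean_x = sum(lengths) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (ly - mean_y) for x, ly in zip(lengths, logs))
    b = sxy / sxx                     # slope of ln(error) vs. length
    a = math.exp(mean_y - b * mean_x)  # back-transformed intercept
    return a, b


def predict(a: float, b: float, length: float) -> float:
    """Expected percent error for a homopolymer of the given length."""
    return a * math.exp(b * length)


if __name__ == "__main__":
    xs = list(range(1, 9))
    ys = [0.1 * math.exp(0.8 * x) for x in xs]  # synthetic, noise-free data
    a, b = fit_exponential(xs, ys)
    print(round(a, 3), round(b, 3))  # 0.1 0.8
```

Fitting in log space weights relative rather than absolute errors, which suits data spanning several orders of magnitude; a direct nonlinear fit would weight the longest homopolymers more heavily.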
Figure 6. Number of polymorphisms as a function of false positive score.
The number of putative polymorphisms was determined for all eight strains and for the seven strains with highest sequence coverage. The seven strains carried 30 known and confirmed new mutations, all of which had false positive scores ≤5.
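One way to read a plot like Figure 6 is as a cumulative count: how many candidates survive if everything above a given false positive score is discarded. The Python sketch below computes that curve and the smallest cutoff that still keeps every known mutation. It assumes integer scores, and the function names and example scores are hypothetical rather than taken from the paper.

```python
from collections import Counter


def cumulative_counts(scores: list[int], max_score: int) -> list[int]:
    """Number of candidates with score <= t, for each t in 0..max_score."""
    hist = Counter(scores)
    out, running = [], 0
    for t in range(max_score + 1):
        running += hist.get(t, 0)
        out.append(running)
    return out


def smallest_cutoff_keeping(known_scores: list[int]) -> int:
    """Lowest integer threshold that still retains every known mutation."""
    return max(known_scores)


if __name__ == "__main__":
    candidate_scores = [0, 0, 1, 3, 5, 7]
    print(cumulative_counts(candidate_scores, 5))  # [2, 3, 3, 4, 4, 5]
    print(smallest_cutoff_keeping([0, 2, 5]))      # 5
```

The trade-off is visible directly: each step up in the cutoff can recover more true mutations but also admits more candidates that must be checked by hand.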

