Genome Res. 1999 Nov;9(11):1116-27.
doi: 10.1101/gr.9.11.1116.

Detecting and analyzing DNA sequencing errors: toward a higher quality of the Bacillus subtilis genome sequence

C Médigue et al. Genome Res. 1999 Nov.

Abstract

During the determination of a DNA sequence, the introduction of artifactual frameshifts and/or in-frame stop codons in putative genes can lead to misprediction of gene products. Detection of such errors with a method based on protein similarity matching is only possible when related sequences are available in databases. Here, we present a method to detect frameshift errors in DNA sequences that is based on the intrinsic properties of the coding sequences. It combines the results of two analyses: the search for translational initiation/termination sites and the prediction of coding regions. This method was used to screen the complete Bacillus subtilis genome sequence, and the regions flanking putative errors were resequenced for verification. This procedure allowed us to correct the sequence and to analyze in detail the nature of the errors. Interestingly, in several cases in-frame termination codons or frameshifts were not sequencing errors but were confirmed to be present in the chromosome, indicating that the genes are either nonfunctional (pseudogenes) or subject to regulatory processes such as programmed translational frameshifts. The method can be used for checking the quality of the sequences produced by any prokaryotic genome sequencing project.
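The idea of using intrinsic sequence signals can be illustrated with a minimal Python sketch. It is not the method of the paper: it assumes only the standard stop codons and looks for a long stop-free stretch in one forward frame that ends where a long stop-free stretch in another frame carries on, which is the pattern a single-base insertion or deletion inside a gene tends to leave. The function names and the `min_codons` and `max_gap` thresholds are hypothetical, and no coding-potential model or ribosome-binding-site search is included.

```python
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons only


def open_stretches(seq, min_codons=30):
    """Return (frame, start, end) for stop-free stretches of at least
    min_codons codons in the three forward frames (end is exclusive)."""
    seq = seq.upper()
    stretches = []
    for frame in range(3):
        start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOPS:
                if (pos - start) // 3 >= min_codons:
                    stretches.append((frame, start, pos))
                start = pos + 3
        # Stretch that runs to the last complete codon of the sequence.
        end = frame + ((len(seq) - frame) // 3) * 3
        if (end - start) // 3 >= min_codons:
            stretches.append((frame, start, end))
    return stretches


def frameshift_candidates(seq, min_codons=30, max_gap=30):
    """Pair each long open stretch with a later-starting stretch in a
    different frame that begins before (or within max_gap bases after)
    its end and extends beyond it. Thresholds are illustrative."""
    stretches = sorted(open_stretches(seq, min_codons), key=lambda s: s[1])
    candidates = []
    for i, (f1, s1, e1) in enumerate(stretches):
        for f2, s2, e2 in stretches[i + 1:]:
            if f1 != f2 and s2 < e1 + max_gap and e2 > e1:
                candidates.append(((f1, s1, e1), (f2, s2, e2)))
    return candidates


# Toy "gene": open in frame 0, with stop codons in frames 1 and 2.
gene = "ATAACTGAC" * 40
# Simulated sequencing error: one base deleted in the middle.
mutated = gene[:180] + gene[181:]
print(frameshift_candidates(gene))
print(frameshift_candidates(mutated))
```

A real screen would have to separate such candidates from ordinary gene boundaries, which is where coding predictions and initiation-site searches come in; candidates are also only hypotheses until the region is resequenced.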

Figures

Figure 1
Summary of the results of resequencing 522 fragments of the B. subtilis chromosome (each fragment is ∼500 bp long). The fragments were pinpointed by the methods described in the text (FSBlastX and ProFED) and were suspected of containing sequencing errors. The bottom line of the graph indicates the total number of errors (substitutions or insertions–deletions) found in the fragments.
Figure 2
B. subtilis prophage 3 region containing three authentic frameshifts (a–c) corresponding to probable pseudogenes (see text). Boundaries of this prophage on the chromosome are indicated by the thick gray line at top. (a–c) Results of the analysis of these regions with the CoDing Sequences searching method (red boxes), the Blast2x method (blue rectangles), and the GeneMark coding predictions (black solid lines). (b) Start and stop codons are represented by pink and green lines, respectively. Atypical features are circled in green (at positions 652000, 656000, 658500, 659000, and 663500 bp).
Figure 3
Example of an authentic frameshift corresponding to a putative programmed frameshift. (a) Representation of the ydhT-trnE B. subtilis region with the Imagene Results Manager. (b) DNA sequence corresponding to the end of the cds? putative gene and the beginning of the ydhU gene. A −1 frameshift at the UUU UUU slippery sequence will lead to the expression of a gene whose product shows some similarity to catalase.
Figure 4
Schema of the FSBlastX method. The method makes use of protein similarity matching (see Methods).
Figure 5
Graphical maps (in the Imagene Result Manager) resulting from the analysis of two B. subtilis chromosomal regions. Results obtained with the CoDing Sequences searching method are shown in gray boxes (CDSs) and gray triangles (RBSs). Those obtained with the GeneMark coding prediction method are displayed as black continuous lines. Results are shown in the three positive frames. (a) Analysis of a fragment of the B. subtilis purine operon. (b) Analysis of a chromosome region containing sequencing errors. Atypical features in the second map are circled in black.
Figure 6
Schema of the ProFED method. The method makes use of intrinsic coding properties of the sequence (Methods section).
Figure 7
Overall strategy of sequencing error analysis. (a) Detection of a DNA region containing a putative sequencing error (BSERR54_ori), extraction of the two flanking regions (pm1, pm2 primers), and PCR resequencing of the fragment (BSERR54_corr). (b) Alignment of the BSERR54_ori and BSERR54_corr fragments. (c) Replacement of the erroneous fragment in the B. subtilis chromosome. Here, the correction shows that the ycsA gene is actually longer than previously thought. Another frameshift error, circled in black, was additionally found by FSBlastX (BlastX hits are indicated by black rectangles).
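
The comparison of an original fragment with its resequenced counterpart, as in panel (b) of Figure 7, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses `difflib.SequenceMatcher` from the standard library rather than a dedicated sequence aligner, the function name `classify_differences` is hypothetical, and the two short sequences are made-up toy data, not B. subtilis fragments. Each difference is labeled as a substitution or an insertion–deletion, the two error classes counted in Figure 1.

```python
from difflib import SequenceMatcher


def classify_differences(original, corrected):
    """List differences between two sequences as
    (kind, position_in_original, original_bases, corrected_bases)."""
    # autojunk=False matters for DNA: with only four letters, the default
    # heuristic would treat every base as "junk" in sequences of 200+ bases.
    matcher = SequenceMatcher(None, original, corrected, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        same_length = (i2 - i1) == (j2 - j1)
        kind = "substitution" if tag == "replace" and same_length else "indel"
        differences.append((kind, i1, original[i1:i2], corrected[j1:j2]))
    return differences


# Toy data (assumption): a 20-base "original" read and two "corrected" reads.
original = "ACGTTGCAAGCTTGGATCCA"
with_insertion = original[:10] + "T" + original[10:]
with_substitution = original[:4] + "A" + original[5:]

print(classify_differences(original, with_insertion))
print(classify_differences(original, with_substitution))
```

Because `difflib` matches longest common blocks rather than scoring gaps and mismatches, the reported position of an indel inside a run of identical bases is not unique; a proper pairwise aligner would be the better choice for real fragments.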
