Comparative Study
Genome Biol. 2007;8(2):R20.
doi: 10.1186/gb-2007-8-2-r20.

Interrupted coding sequences in Mycobacterium smegmatis: authentic mutations or sequencing errors?

Caroline Deshayes et al. Genome Biol. 2007.

Abstract

Background: In silico analysis has shown that all bacterial genomes contain a low percentage of ORFs with undetected frameshifts and in-frame stop codons. These interrupted coding sequences (ICDSs) may be genuinely present in the organism or may result from misannotation based on sequencing errors. Whether these sequences are real has major implications for all subsequent functional characterization steps, including module prediction, comparative genomics and high-throughput proteomic projects.

Results: We show here, using Mycobacterium smegmatis as a model species, that a significant proportion of these ICDSs result from sequencing errors. We used a resequencing procedure and mass spectrometry analysis to determine the nature of a number of ICDSs in this organism. We found that 28 of the 73 ICDSs investigated correspond to sequencing errors.

Conclusion: The correction of these errors results in modification of the predicted amino acid sequences of the corresponding proteins and changes in annotation. We suggest that each bacterial ICDS should be investigated individually, to determine its true status and to ensure that the genome sequence is appropriate for comparative genomics analyses.
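
As an illustration of one kind of interruption described in the Background, the following is a minimal sketch of how in-frame stop codons might be located in a coding sequence. It is not the authors' detection pipeline; the function name `in_frame_stops` and the example sequences are hypothetical, and the sketch assumes the standard stop codons TAA, TAG and TGA on an unambiguous DNA string.

```python
# Hypothetical helper, not taken from the paper: scan a coding sequence
# for stop codons that occur before the final codon of the reading frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed standard stop codons


def in_frame_stops(cds: str, frame: int = 0) -> list[int]:
    """Return 0-based nucleotide offsets of internal in-frame stop codons.

    The last complete codon is ignored, since a stop codon there is the
    normal end of the ORF rather than an interruption.
    """
    seq = cds.upper()
    # Start offsets of every complete codon in the chosen frame.
    starts = list(range(frame, len(seq) - 2, 3))
    internal = starts[:-1]  # drop the terminal codon
    return [i for i in internal if seq[i:i + 3] in STOP_CODONS]


if __name__ == "__main__":
    # ATG AAA TAG CCC TAA: the TAG at offset 6 interrupts the frame.
    print(in_frame_stops("ATGAAATAGCCCTAA"))  # [6]
    # ATG AAA CCC TAA: only the terminal stop is present.
    print(in_frame_stops("ATGAAACCCTAA"))     # []
```

A scan like this only flags candidate interruptions. As the abstract stresses, deciding whether a flagged site is an authentic mutation or a sequencing error requires experimental follow-up such as resequencing or mass spectrometry.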

Figures

Figure 1
Scheme for ICDS detection and resolution strategy. (a) ICDSs are detected within the genome by in silico analysis. The double daggers (‡) indicate the regions containing the identified frameshift. Upon resolution by sequencing and mass spectrometry analysis, the ICDSs can be classified as (b) true frameshifts or (c) sequencing errors. The hash symbol (#) indicates the region of the ORF containing the frameshift. The asterisks (*) indicate sites of corrected sequencing errors resulting in the reconstitution of a full-length ORF. The ORFs are depicted with arrows. The ORF may or may not be in the same frame. Proteins are represented by ellipses.
Figure 2
Comparison of genomic prediction with proteomic results (example of ICDS0040). (a) Representation of the DNA region and its predicted ORFs (in color). (b) Detailed view of the two-dimensional gel. Nano-LC-MS-MS data are obtained after extraction and digestion of the protein. The matching peptides are boxed in the translated genomic sequence (a,c). (c) Representation of the DNA region and its predicted ORF upon correction of the sequencing errors (depicted in the ellipse). Correction of the sequencing errors reassociates the two peptides to give a single protein, accounting for their appearance at a single spot.

