Microb Biotechnol. 2013 Mar;6(2):150-6.
doi: 10.1111/1751-7915.12006. Epub 2012 Dec 2.

Tips and tricks for the assembly of a Corynebacterium pseudotuberculosis genome using a semiconductor sequencer

Rommel Thiago Jucá Ramos et al. Microb Biotechnol. 2013 Mar.

Abstract

New sequencing platforms have enabled rapid decoding of complete prokaryotic genomes at relatively low cost. The Ion Torrent platform is one such technology; it is characterized by lower coverage, which creates challenges for genome assembly. A particular problem is the lack of reference genomes that would enable reference-based assembly, as is the case for the organism studied here, Corynebacterium pseudotuberculosis biovar equi, which causes high economic losses in the US equine industry. The quality treatment strategy incorporated into the assembly pipeline enabled 16-fold greater use of the sequencing data than traditional quality filter approaches. Preprocessing the data before de novo assembly made it possible to apply established methodologies for assembling next-generation sequencing data. Moreover, manual curation proved essential for ensuring a high-quality assembly, which was validated by comparative genomics with other species of the genus Corynebacterium. This study presents a modus operandi that enables greater and better use of data from semiconductor sequencing to obtain the complete genome of a prokaryotic microorganism, C. pseudotuberculosis, which, unlike Escherichia coli, is not a traditional biological model.

Figures

Figure 1
Workflow representing the assembly process and each step in the generation of a consensus sequence, along with the respective software/methods. The assembly process consists of: (A) data treatment, in which reads are trimmed, and removed when the mean quality value of the region does not reach the cut-off value; (B) fragmentation of the filtered reads into short reads of the same size for de novo assembly; (C) de novo assembly using diverse parameters and software; (D) removal of redundant sequences by the Simplifier software; (E) extension of sequences with similar ends using the G4ALL software and the genome of a related species as reference; and (F) recursive analyses based on short-read alignments against a preliminary scaffold using CLC, gap identification using an in-house script, and manual curation of gaps/frameshifts.
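Three of the simpler workflow steps, the fixed-size fragmentation in (B), the redundancy removal in (D) and the gap identification in (F), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, "redundant" is taken to mean exact containment on either strand (Simplifier's actual criterion may differ), leftover bases shorter than the fragment size are simply dropped, and gaps are assumed to be encoded as runs of N in the scaffold.

```python
"""Hypothetical stand-ins for workflow steps (B), (D) and (F).

These are NOT Simplifier, CLC or the authors' in-house script; they only
illustrate the kind of operation each step performs.
"""
import re

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fragment_reads(reads: list[str], size: int) -> list[str]:
    """Step (B): split each read into non-overlapping pieces of exactly `size` bp.

    Assumption: a trailing piece shorter than `size` is discarded so that
    every output read has the same length.
    """
    pieces = []
    for read in reads:
        for start in range(0, len(read) - size + 1, size):
            pieces.append(read[start:start + size])
    return pieces


def remove_redundant(contigs: list[str]) -> list[str]:
    """Step (D): drop contigs contained, on either strand, in a longer kept contig."""
    kept: list[str] = []
    # Process longest first, so a contig is only compared with longer (or equal) ones.
    for contig in sorted(contigs, key=len, reverse=True):
        rc = reverse_complement(contig)
        if not any(contig in other or rc in other for other in kept):
            kept.append(contig)
    return kept


def find_gaps(scaffold: str) -> list[tuple[int, int]]:
    """Step (F): 0-based, end-exclusive coordinates of runs of 'N' in a scaffold."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", scaffold)]


if __name__ == "__main__":
    print(fragment_reads(["ACGTACGTAC"], 4))  # ['ACGT', 'ACGT']
    print(remove_redundant(["ACGTTTGCA", "TTTG", "CAAA", "GGGCCC"]))  # ['ACGTTTGCA', 'GGGCCC']
    print(find_gaps("ACGTNNNNACGTNNA"))  # [(4, 8), (12, 14)]
```

In the example, "CAAA" is discarded because its reverse complement, "TTTG", occurs inside the longer contig; checking both strands matters because de novo assemblers may emit the same region in either orientation.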
Figure 2
Genome map of C. pseudotuberculosis 316 and synteny map between the genome sequences of Corynebacterium pseudotuberculosis strains 1002 and 316.
A. Genome map of Corynebacterium pseudotuberculosis 316 showing common features: CDS (coding sequence); tRNA (transfer RNA); rRNA (ribosomal RNA).
B. Analysis of genome synteny shows two large deletions in C. pseudotuberculosis strain 316 compared with C. pseudotuberculosis strain 1002. Both cases, regions A and B, can be explained by the presence of two pathogenicity islands, PICPs 4 and 5, respectively.
Figure 3
Genomic map comparing strains of Corynebacterium pseudotuberculosis, Corynebacterium ulcerans and Corynebacterium diphtheriae.
A. Comparative genomic analyses of: Corynebacterium pseudotuberculosis strains 1002, C231, CIP52.97 and 316; Corynebacterium ulcerans strains BR-AD22 and 809; Corynebacterium diphtheriae NCTC 13129; Corynebacterium glutamicum ATCC 13032; and the pathogenicity islands identified in C. pseudotuberculosis. The figure shows the presence or absence, in the other strains and species, of the pathogenicity islands of C. pseudotuberculosis 1002, which was also the reference strain used to create the figure.
B. Graphical representation of PAIs 4, 5, 8, 9, 10 and 11 in C. pseudotuberculosis 1002 and 316.
Figure 4
Procedure for identifying high-quality regions within Ion Torrent-generated sequences by searching for seeds with a mean Phred quality of 20. The long-read version of the Quality Assessment software starts reading the sequence from the first base, using a user-defined window size (31 bp, for example), and walks through the sequence base by base until it reaches a region with a mean Phred quality value of 20. The software then extends the seed until it reaches either the end of the sequence or a region with a mean Phred quality value lower than 20.
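The seed-and-extend search described for Figure 4 can be sketched as a sliding-window scan. This is a hypothetical re-implementation, not the Quality Assessment software itself: the function names are illustrative, quality strings are assumed to be Phred+33 encoded, a window mean of exactly 20 is treated as passing, and after a region ends the sketch resumes the seed search at the next base so that further non-overlapping high-quality regions in the same read are also reported (the caption only describes the first one).

```python
def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(c) - offset for c in qual]


def high_quality_regions(quals: list[int], window: int = 31,
                         threshold: float = 20.0) -> list[tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) high-quality regions.

    A seed is the first window whose mean quality is >= threshold; it is then
    extended one base at a time while the next window still meets the threshold.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    n = len(quals)
    # prefix[k] holds sum(quals[:k]), so any window mean is an O(1) lookup.
    prefix = [0]
    for q in quals:
        prefix.append(prefix[-1] + q)

    def window_passes(i: int) -> bool:
        return (prefix[i + window] - prefix[i]) / window >= threshold

    regions: list[tuple[int, int]] = []
    last_start = n - window  # negative when the read is shorter than the window
    i = 0
    while i <= last_start:
        if not window_passes(i):
            i += 1  # walk base by base until a seed is found
            continue
        start = i
        # Extend until the end of the read or a window with mean < threshold.
        while i + 1 <= last_start and window_passes(i + 1):
            i += 1
        end = i + window
        regions.append((start, end))
        i = end  # assumption: resume the seed search right after this region
    return regions


if __name__ == "__main__":
    quals = [10] * 5 + [30] * 10 + [5] * 5
    print(high_quality_regions(quals, window=4))  # [(3, 16)]
    print(phred_scores("I5+"))  # [40, 20, 10]
```

Prefix sums make each window mean a constant-time lookup, so a read of length n is scanned in O(n) time regardless of the window size. Note that a window-mean criterion can let a few low-quality bases at a region's edges through, as in the example, where the region starts at two bases of quality 10.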
