BMC Genomics. 2013 Apr 30:14:289.
doi: 10.1186/1471-2164-14-289.

ContigScape: a Cytoscape plugin facilitating microbial genome gap closing

Biao Tang et al. BMC Genomics.

Abstract

Background: With the emergence of next-generation sequencing, the availability of prokaryotic genome sequences is expanding rapidly. A total of 5,276 genomes have been released since 2008, yet only 1,692 of them are complete. The final phase of microbial genome sequencing, particularly gap closing, is frequently the rate-limiting step, because of either complex genomic structures that cause sequence bias even with high genomic coverage or the presence of repeat sequences that may cause gaps in assembly.

Results: We have developed a Cytoscape plugin to facilitate gap closing for high-throughput sequencing data from microbial genomes. This plugin is capable of interactively displaying the relationships among genomic contigs derived from various sequencing formats. The sequence contigs of plasmids and special repeats (IS elements, ribosomal RNAs, terminal repeats, etc.) can be displayed as well.

Conclusions: Displaying relationships between contigs using graphs in Cytoscape rather than tables provides a more straightforward visual representation. This will facilitate a faster and more precise determination of the linkages among contigs and greatly improve the efficiency of gap closing.

Figures

Figure 1
A sample genome displayed by ContigScape. A. The ContigScape interface. The left portion of the screen is the control panel. The window on the right shows a sample genome. Contigs are colored red (repeat contig), dark blue (unique contig) and orange (probable repeats). B. An enlarged view of the contigs inside the light blue frame in A. B1. A linear plasmid formed by three contigs. B2. Repeats (Contig28) at the end of the chromosome. B3. A high-copy-number circular plasmid formed by three contigs (Contig141, 142, and 143). B4. Two high-copy-number circular plasmids, each formed by a single contig. B5. A high-copy-number linear plasmid formed by one contig. B6. A single-copy circular plasmid composed of one contig.
Figure 2
A 913-bp repeat contig assembled with 454 reads. A. The 5 prime end of the contig, independently assembled by the reads from Contig64 and Contig63. B. The 3 prime end of the contig, assembled as described in panel A, but with reads extending into Contig52 and Contig76. C. The list of read names from the “ace” file. D. The list of reads whose names contained ‘fm’ or ‘to’, which linked to the unique and repeat contigs, respectively.
Figure 3
Schematic diagram of assembly from 454 reads and the relationship of repeat contigs. A. The genome has four unique sections (1–4) and two repeats (R1 and R2). B. One repeat contig and four unique contigs were assembled. The reads coming from R1 and R2 were assembled into the same contig, resulting in twice the coverage of the other contigs. Some reads at the end of the repeat contig consisted of only partial sequences, and the other parts of those reads are located in other contigs. C. Four linkage relationships of the repeat contig can be obtained, depending on which reads cover different contigs. Among them, 2 and 3 reflect the correct linkage whereas 1 and 4 are incorrect. D. The relationships shown in C as displayed in ContigScape. 1S-1E represent contig1; 2S-2E represent contig2; 3S-3E represent contig3; 4S-4E represent contig4; RS-RE represent contigR; red coloring represents repeat contigs and dark blue coloring represents unique contigs. “S” represents the starting position of the contig and “E” represents the termination location of the contig. Num 1, Num 2, Num 3, Num 4 represent the number of reads connecting contig R and “1E”, “2S”, “3S”, and “4S”, respectively. Length1, length2, length3 and length4 represent the lengths of contig1, contig2, contig3 and contig4, respectively. The width of each green edge is proportional to the number of reads supporting it.
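The end-to-end representation in panel D can be illustrated with a small sketch. This is not ContigScape's code; the function names, the input format (pairs of contig-end labels, one pair per linking read), the `max_width` scaling, and the read counts are all assumptions made for illustration.

```python
from collections import Counter


def build_end_graph(read_links):
    """Count the reads joining each pair of contig ends.

    read_links: iterable of (end_a, end_b) labels such as ("RE", "2S"),
    one tuple per read that spans two contig ends (hypothetical input format).
    Returns a Counter keyed by an order-independent pair of ends.
    """
    edges = Counter()
    for end_a, end_b in read_links:
        edges[tuple(sorted((end_a, end_b)))] += 1
    return edges


def edge_widths(edges, max_width=10.0):
    """Scale edge widths linearly with read count; the best-supported edge gets max_width."""
    if not edges:
        return {}
    top = max(edges.values())
    return {pair: max_width * count / top for pair, count in edges.items()}


# Invented example: four reads join RS to 1E, two reads join RE to 2S.
links = [("RS", "1E")] * 4 + [("RE", "2S")] * 2
edges = build_end_graph(links)
widths = edge_widths(edges)
```

Keying each edge by a sorted pair means a read reported as (RS, 1E) and one reported as (1E, RS) support the same edge.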
Figure 4
Schematic map of scaffolding. A. Red arrows represent repeat contigs, and black and blue arrows represent unique contigs; the orientation of each arrow indicates the direction from the 5’ to the 3’ end. The purple lines represent mate-pair reads within contigs whereas green and yellow lines represent mate-pair reads spanning contigs. The mate-pair reads represented by yellow lines map to unique contigs and thus can form a scaffold. Mate-pair reads represented by green lines fail to construct scaffolds because one end of each pair is located in a repeat contig. B. The relationships in A as displayed in ContigScape.
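The three classes of mate pairs in panel A can be sketched as a simple partition. This is a minimal illustration under assumed inputs (each mate pair given as the two contig names its ends map to, plus a set of known repeat contigs); the function and variable names are hypothetical and do not come from ContigScape.

```python
def classify_mate_pairs(pairs, repeat_contigs):
    """Split mate pairs into within-contig, scaffold-forming, and repeat-linked groups.

    pairs: iterable of (contig_a, contig_b), the contigs hit by the two mates.
    repeat_contigs: set of contig names considered repeats.
    """
    within, scaffold, repeat_linked = [], [], []
    for contig_a, contig_b in pairs:
        if contig_a == contig_b:
            # Both mates in the same contig (purple lines in the figure).
            within.append((contig_a, contig_b))
        elif contig_a in repeat_contigs or contig_b in repeat_contigs:
            # One end falls in a repeat contig (green lines): no scaffold.
            repeat_linked.append((contig_a, contig_b))
        else:
            # Both ends in distinct unique contigs (yellow lines): scaffold link.
            scaffold.append((contig_a, contig_b))
    return within, scaffold, repeat_linked


# Invented example data.
example_pairs = [("c1", "c1"), ("c1", "c2"), ("c2", "R")]
within, scaffold, repeat_linked = classify_mate_pairs(example_pairs, {"R"})
```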
Figure 5
Contig networks from eleven 454Contigs.ace files.
Figure 6
The contig network of 40 strains using the 454Allcontigs.ace file. This figure includes strains from 19 genera: 1–11 are Streptomyces, 12 is Penicillium, 13 is Actinoplanes, 14 is Amycolatopsis, 15 is Bacillus, 16 and 17 are Brucella, 18, 36 and 37 are Ralstonia, 19 is Burkholderia, 20–22 are Escherichia, 23 and 24 are Ketogulonicigenium, 25 is Klebsiella, 26 is Lactobacillus, 27 and 28 are Leptospira, 29 is Lysinibacillus, 30–34 are Mycobacterium, 35 is Mycoplasma, 38 is Rhizobiales, 39 and 40 are Vibrio.
Figure 7
The ContigScape interface and displaying connections between contigs. A. The scaffolding of the 454LargeContigs of the Streptomyces genome using mate-pair libraries is shown. B. Hiding the contigs smaller than 2 kb in length and determining the linkages between the remaining specific contigs. C. Completing all linkages by reference, PCR, and other databases, and then obtaining the linear chromosomal sequence with terminal inverted repeats formed by two repeat contigs.
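The filtering step in panel B (hiding contigs smaller than 2 kb) amounts to dropping short nodes and every edge that touches them. The sketch below assumes contig lengths in base pairs and edges as pairs of contig names; the names and data are hypothetical and do not reflect the plugin's internals.

```python
def hide_short_contigs(lengths, edges, min_length=2000):
    """Keep contigs of at least min_length bp and the edges between them.

    lengths: dict mapping contig name to length in bp.
    edges: iterable of (contig_a, contig_b) links.
    """
    visible = {name for name, length in lengths.items() if length >= min_length}
    kept_edges = [(a, b) for a, b in edges if a in visible and b in visible]
    return visible, kept_edges


# Invented example: contig "c2" is 800 bp and is hidden with its links.
lengths = {"c1": 52000, "c2": 800, "c3": 2000}
edges = [("c1", "c2"), ("c2", "c3"), ("c1", "c3")]
visible, kept_edges = hide_short_contigs(lengths, edges)
```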
Figure 8
Workflow of visual strategy.
