BMC Genomics. 2011 Dec 22;12:629.
doi: 10.1186/1471-2164-12-629.

A pilot study for channel catfish whole genome sequencing and de novo assembly

Yanliang Jiang et al. BMC Genomics. 2011.

Abstract

Background: Recent advances in next-generation sequencing technologies have drastically increased throughput and significantly reduced sequencing costs. However, the average read lengths of next-generation sequencing technologies are short compared with those of traditional Sanger sequencing. These short reads pose great challenges for de novo sequence assembly. As a pilot project for whole genome sequencing of the catfish genome, we attempt here to determine the appropriate sequence coverage, the appropriate assembly software, and suitable assembly parameters for a BAC physical map contig spanning approximately one million base pairs.
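Sequence coverage (depth) is the total number of sequenced bases divided by the length of the target region. The following is a minimal sketch of that arithmetic for a target of about one million base pairs. The function names and the read lengths (400 bp and 100 bp) are illustrative assumptions and are not taken from the study.

```python
import math


def sequencing_depth(num_reads: int, mean_read_length: float, region_length: int) -> float:
    """Average depth = total sequenced bases / length of the target region."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return num_reads * mean_read_length / region_length


def reads_needed(target_depth: float, mean_read_length: float, region_length: int) -> int:
    """Smallest number of reads whose total bases reach the target depth."""
    if mean_read_length <= 0:
        raise ValueError("mean_read_length must be positive")
    return math.ceil(target_depth * region_length / mean_read_length)


# Target region: roughly one million base pairs, as in the BAC physical map contig.
REGION_LENGTH = 1_000_000

# Read lengths below are hypothetical placeholders, not values from the paper.
long_reads_18x = reads_needed(18, 400, REGION_LENGTH)
short_reads_70x = reads_needed(70, 100, REGION_LENGTH)

print(long_reads_18x, short_reads_70x)
```

Under these assumed read lengths, reaching the same depth takes far more short reads than long reads, which is one reason the two platforms are compared at different depths.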

Results: A combination of low-coverage 454 sequencing and Illumina sequencing appeared to provide effective assembly, as reflected by a high N50 value. Using 454 sequencing alone, a sequencing depth of 18X was sufficient to obtain a good-quality assembly, whereas with Illumina sequencing alone a depth of 70X appeared to be sufficient. Additional coverage beyond 18X of 454 or beyond 70X of Illumina sequencing did not significantly improve the assembly. Considering the cost of sequencing, 2X 454 sequencing coupled with 70X Illumina sequencing provided an assembly of reasonably good quality. Among the software tested, Newbler with a seed length of 16 and ABySS with a K-value of 60 appeared to be appropriate for assembling 454 reads alone and Illumina paired-end reads alone, respectively. Using both 454 and Illumina paired-end reads, a hybrid strategy (Newbler for the initial 454 assembly, Velvet for the initial Illumina assembly, then a second-step assembly using MIRA) provided the best assembly of the physical map contig, resulting in 193 contigs with an N50 value of 13,123 bp.
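The N50 value used above as the quality measure is, under its standard definition, the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that calculation follows. The contig lengths in the example are made up for illustration and are not data from the study.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative sum first reaches half the total length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid floating-point division.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches the total")


# Hypothetical contig lengths (bp), for illustration only.
example_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_contigs))
```

A higher N50 means that more of the assembly sits in long contigs, which is why it is used here to compare sequencing depths and assemblers.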

Conclusions: A hybrid sequencing strategy combining low-depth 454 sequencing with high-depth Illumina sequencing provided a good-quality assembly with a high N50 value at relatively low cost. A combination of Newbler, Velvet, and MIRA can be used to assemble the 454 reads and the Illumina reads effectively. The assembled sequence can serve as a resource for comparative genome analysis. Additional long reads from third-generation sequencing platforms are needed to sequence through repetitive genome regions, which should further improve the assembly.

Figures

Figure 1
Hybrid assembly strategy pipeline. All 454 reads were assembled using the Newbler assembler with a seed length of 16, a minimum overlap length of 40, and a minimum overlap identity of 95%. All Illumina reads were assembled using the Velvet assembler with a K-value of 29. The MIRA assembler was then used to generate the hybrid contigs with a minimum overlap of 40.
Figure 2
Comparison of N50 values for assemblies generated with Newbler at various sequencing depths of 454 reads. The red circle indicates the sequencing depth beyond which additional sequencing started to lose power for effective sequence assembly.
Figure 3
Comparison of N50 values for assemblies generated with ABySS at various sequencing depths of Illumina reads. The red circle indicates the sequencing depth beyond which additional sequencing started to lose power for effective sequence assembly.
Figure 4
Comparison of N50 values for assemblies generated with various combinations of 454 and Illumina sequencing depths. The assemblies were produced using the two-step approach: Newbler for 454 data and Velvet for Illumina data in the initial assembly, followed by assembly with MIRA (for details, see Materials and Methods). The red circle indicates the sequencing depths beyond which additional sequencing started to lose power for effective sequence assembly.
Figure 5
Schematic presentation of conserved syntenic regions among catfish, zebrafish, medaka, stickleback, and Tetraodon. The 18 genes identified in the sequenced region are listed in the upper panel, each abbreviated with a letter so that the conserved syntenies are easier to present in the lower panel. For instance, the letter A represents creatine kinase (brain type A). The figure is not drawn to scale, as indicated by the double slashes. Identical colors indicate conserved syntenic blocks.

