Genome Res. 2018 Feb;28(2):266-274.
doi: 10.1101/gr.221184.117. Epub 2017 Dec 22.

MinION-based long-read sequencing and assembly extends the Caenorhabditis elegans reference genome

John R Tyson et al. Genome Res. 2018 Feb.

Abstract

Advances in long-read single molecule sequencing have opened new possibilities for 'benchtop' whole-genome sequencing. The Oxford Nanopore Technologies MinION is a portable device that uses nanopore technology that can directly sequence DNA molecules. MinION single molecule long sequence reads are well suited for de novo assembly of complex genomes as they facilitate the construction of highly contiguous physical genome maps obviating the need for labor-intensive physical genome mapping. Long sequence reads can also be used to delineate complex chromosomal rearrangements, such as those that occur in tumor cells, that can confound analysis using short reads. Here, we assessed MinION long-read-derived sequences for feasibility concerning: (1) the de novo assembly of a large complex genome, and (2) the elucidation of complex rearrangements. The genomes of two Caenorhabditis elegans strains, a wild-type strain and a strain containing two complex rearrangements, were sequenced with MinION. Up to 42-fold coverage was obtained from a single flow cell, and the best pooled data assembly produced a highly contiguous wild-type C. elegans genome containing 48 contigs (N50 contig length = 3.99 Mb) covering >99% of the 100,286,401-base reference genome. Further, the MinION-derived genome assembly expanded the C. elegans reference genome by >2 Mb due to a more accurate determination of repetitive sequence elements and assembled the complete genomes of two co-extracted bacteria. MinION long-read sequence data also facilitated the elucidation of complex rearrangements in a mutagenized strain. The sequence accuracy of the MinION long-read contigs (∼98%) was improved using Illumina-derived sequence data to polish the final genome assembly to 99.8% nucleotide accuracy when compared to the reference assembly.
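The abstract summarizes the assembly with two standard metrics: contig N50 and fold coverage. As a minimal sketch of how these are commonly computed (not the authors' pipeline), N50 is the contig length at which the length-sorted contigs first account for at least half of the total assembly length, and fold coverage is the total number of sequenced bases divided by the genome size. The function names and the toy contig lengths below are illustrative assumptions; only the 100,286,401-base reference size comes from the abstract.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half the assembly size.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


def fold_coverage(read_lengths, genome_size):
    """Return mean fold coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


# Reference genome size quoted in the abstract.
REFERENCE_SIZE = 100_286_401

# Toy contig lengths (hypothetical, not taken from the paper).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_contigs)

# Bases needed to reach the 42-fold coverage mentioned in the abstract.
bases_for_42x = 42 * REFERENCE_SIZE
```

Under this definition, 42-fold coverage of the reference corresponds to roughly 4.2 Gb of sequence from a single flow cell.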

Figures

Figure 1.
Summary of MinION sequence reads of the C. elegans VC2010 wild-type strain genome. (A) Histogram of read lengths from flow cells 112, 114, and 115. (B) Plot of % read identity aligned to the C. elegans reference genome vs. read length from flow cell 115. Mean % identities ranged from 85.90% to 86.35% for the three flow cells (Supplemental Fig. S1).
Figure 2.
De novo C. elegans genome assembly from MinION-generated long reads. Comparison of Canu-generated genome assemblies for individual and combined flow cell data. (A) Distribution of contig sizes for different assemblies. Number in parentheses refers to total contigs in the assembly. (B) Plot of contig coverage of the reference genome from assemblies produced from the individual and combined flow cells. Arrows denote the number of contigs that align to the C. elegans reference genome.
Figure 3.
MinION produces high contiguity C. elegans genome assembly. MUMmer alignment plot of contigs from assembly of combined data from flow cells 114 and 115 against the C. elegans reference genome. The C. elegans chromosomes are arranged by size along the x-axis and 114+115 contigs along the y-axis. Forward strand matches are in red and reverse strand matches are in blue. Note that the large bacterial contigs 14 and 20 do not align to the C. elegans reference genome.
Figure 4.
MinION sequencing and assembly reveals genome expansion of C. elegans repeat regions. (A) LASTZ alignment of contigs to the C. elegans reference Chromosome I. Below is a plot of coverage of the ‘All data’ assembly contig sequences against the reference. Note the high coverage areas (indicated as 1–4) that correlate with repeat regions. (B) (Left) Dot plot of the MinION-derived contig containing repeats 1 and 2 against the reference sequence. (Right) Plot of sequence read coverage (reads > 5 kb) in the repeat regions. Selected reads spanning the expanded repeats are shown below. Note the high coverage of repeat 1, which suggests that this repeat is even larger than predicted in the contig shown. Repeat 2 read coverage, which is similar to the average read coverage across the genome, suggests that this repeat expansion is correct. (C) Dot plot of the end termini of contig006 and contig027 against the reference sequence demonstrating the expansion of repeat 3. Two of the longest reads mapping to this repeat are shown below with the repeat highlighted in gray.
Figure 5.
Delineation of complex genome rearrangements. Schematic of the xpf-1(e1487) complex mutation in contig017. The left y-axis shows the wild-type xpf-1 gene structure and the right y-axis shows the mab-3 region. The mutation is a duplication and insertion of ∼20 kb of the mab-3 region in two segments into the second intron of xpf-1 (blue). The larger segment of the inserted region along with the flanking xpf-1 exon 2 has been duplicated creating an inverted repeat (green and red). Shown below is read coverage and selected MinION reads mapping to the region spanning the various breakpoints.
Figure 6.
Delineation of an exogenous plasmid DNA array integration and duplication event. (A) Dot plot of the ruIs32 insertion in contig1884 against the pAZ132 plasmid and unc-119 gene (red) that were integrated by biolistic transformation. The y-axis shows the pAZ132 plasmid structure and the unc-119 gene structure. The insertion contains three copies of GFP::H2B and two partial copies of unc-119. (B) MinION read coverage for Chromosome III. Note the ∼2-Mb duplication in the region where ruIs32 has been integrated. Red line is coverage and purple shading is coverage deviation. (C) MinION-generated sequence identified the two wild-type copies of unc-119 inserted with ruIs32 and the unc-119(ed3) mutation in contig0079. Superscript numbers refer to position within the contig.
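Figures 4 and 6 both read structural features off read coverage: regions whose depth sits well above the genome-wide average suggest extra copies. The following is a minimal sketch of that idea, assuming per-window mean depths have already been computed. It is not the authors' method; the 1.5x threshold, the function names, and the depth values are hypothetical choices made for illustration.

```python
from statistics import median


def flag_elevated_windows(window_depths, ratio_threshold=1.5):
    """Return indices of windows whose depth is >= ratio_threshold x median.

    The median serves as the baseline depth because it is robust to a
    minority of duplicated (high-depth) windows.
    """
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    return [i for i, depth in enumerate(window_depths)
            if depth / baseline >= ratio_threshold]


def merge_runs(indices):
    """Collapse sorted window indices into inclusive (start, end) runs."""
    runs = []
    for i in indices:
        if runs and i == runs[-1][1] + 1:
            runs[-1][1] = i          # extend the current run
        else:
            runs.append([i, i])      # start a new run
    return [(start, end) for start, end in runs]


# Hypothetical per-window depths along a chromosome: windows 4 to 6 carry
# about twice the baseline depth, as expected for a duplicated segment.
depths = [40, 41, 39, 40, 80, 82, 79, 40, 41, 40]
candidate_regions = merge_runs(flag_elevated_windows(depths))
```

With fixed-size windows, multiplying the run boundaries by the window size converts them back to approximate chromosome coordinates. A ratio near 2 is what a heterozygous-free duplication would be expected to produce, so a 1.5x cutoff is a lenient starting point rather than a calibrated value.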
