Nat Commun. 2018 Feb 7;9(1):541.
doi: 10.1038/s41467-018-03016-2.

High contiguity Arabidopsis thaliana genome assembly with a single nanopore flow cell

Todd P Michael et al. Nat Commun.

Abstract

The handheld Oxford Nanopore MinION sequencer generates ultra-long reads with minimal cost and time requirements, which makes sequencing genomes at the bench feasible. Here, we sequence the gold standard Arabidopsis thaliana genome (KBS-Mac-74 accession) on the bench with the MinION sequencer, and assemble the genome using typical consumer computing hardware (4 cores, 16 GB RAM) into chromosome arms (62 contigs with an N50 length of 12.3 Mb). We validate the contiguity and quality of the assembly with two independent single-molecule technologies, Bionano optical genome maps and Pacific Biosciences Sequel sequencing. The new A. thaliana KBS-Mac-74 genome enables resolution of a quantitative trait locus that had previously been recalcitrant to a Sanger-based BAC sequencing approach. In summary, we demonstrate that even when the purpose is to understand complex structural variation at a single region of the genome, complete genome assembly is becoming the simplest way to achieve this goal.
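The N50 figure quoted in the abstract is a standard contiguity statistic: the length of the contig at which the running sum of contig lengths, taken from longest to shortest, first reaches half of the total assembly size. The sketch below illustrates that definition only. The function name `n50` and the toy contig lengths are hypothetical and are not taken from the paper's data or pipeline.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig that pushes coverage to >= 50%.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in bp), not the KBS-Mac-74 contigs.
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)
```

For the toy input the total is 20, and the two longest contigs (6 and 5) already cover 11 bases, so the N50 is 5. A high N50 relative to chromosome-arm length is what the abstract means by "high contiguity".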

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Fig. 1
Arabidopsis thaliana Oxford Nanopore MinION read quality. a Read quality versus GC content. b Aligned read quality versus GC content of mapped reads. c Read length versus GC content. d Aligned read length versus GC content of mapped reads. Lower and upper hinges correspond to the 25th and 75th percentiles. Whiskers extend 1.5 × IQR (the interquartile range, i.e. the distance between the 25th and 75th percentiles) beyond the lower and upper hinges
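The box-plot convention described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions: the percentile interpolation method (`method="inclusive"`) and the common practice of clipping each whisker to the most extreme data point inside the 1.5 × IQR fence are assumptions, since the caption does not specify either. The function name `box_stats` and the sample values are hypothetical.

```python
from statistics import quantiles


def box_stats(values):
    """Compute hinges, IQR, and whisker ends for a box plot.

    Hinges are the 25th and 75th percentiles. Whiskers reach at most
    1.5 * IQR beyond each hinge and are clipped to the most extreme
    observed value inside that fence (an assumed plotting convention).
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "lower_hinge": q1,
        "upper_hinge": q3,
        "iqr": iqr,
        "lower_whisker": min(v for v in values if v >= low_fence),
        "upper_whisker": max(v for v in values if v <= high_fence),
    }


# Hypothetical sample with one outlier, not data from the paper.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this toy sample the value 100 lies beyond the upper fence, so it would be drawn as an outlier point rather than being covered by the whisker.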
Fig. 2
Quality comparisons of single molecule assemblies and TAIR10 reference assembly. a Final assembled genome size versus assembly rounds. b Assembly quality based on short sequence artifacts versus rounds. c Assembly quality based on false insertions versus rounds. d Assembly quality based on false deletions versus rounds. Assembly types: Oxford Nanopore MinION, ONT; PacBio Sequel, PB; miniasm, min; Falcon, fal; Canu, can; reference genome, TAIR10. Polishing rounds PBfal: 0 = raw assembly; 1, arrow 1x; 4, pilon 1x
Fig. 3
Bionano Genomics maps identify mis-assemblies and hard-to-assemble regions in the Oxford Nanopore MinION assembly. BNG cmap_30 (blue; marked as 30) identified a a chimeric ONTcan contig 1 (green) and b the correctly assembled contig 1 in the ONTmin assembly (green). The chimeric position is indicated with a red bar. c A collapsed region in ONTmin contig 5, in which approximately 15 kb of sequence is missing from one of the two potential repeat regions as identified by the GC pattern (gray bars). In contrast, d shows a falsely duplicated region of approximately 18 kb, with the duplicated repeat region highlighted (red bar, 18 kb). e ONTmin assembly resolves various telomere regions, for example after 12.332 Mb on contig 1, as outlined by a GC plot (blue line). f ONTmin also resolves short centromere arrays as shown toward the end of contig 3 (blue, GC plot)
Fig. 4
Resolution of the At4g30720 duplication using the KBS-Mac-74 ONTmin genome. Col-0 only has one copy of At4g30720 on the distal end (15 Mb) of chromosome 4 (Chr4). KBS-Mac-74 (ONTmin) assembly has two copies of At4g30720; one at the beginning of contig 31 that corresponds to the SG3 QTL location, and one at the distal end of contig 31 that overlaps with the SG3 interacting (SG3i) region in the middle of chromosome 4. At the SG3i locus, the Col-0 gene At4g08995 (889 bp), which is annotated as a transposable element (RT; putative reverse transcriptase), is replaced in the KBS-Mac-74 ONTmin assembly with a 39-kb expansion that includes a duplicated copy of At4g30720. Fragments of the transposable element (different gray arrows) are scattered across the KBS-Mac-74 region consistent with several rounds of transposition resulting in this complex rearranged region
