Genome Res. 2022 Nov-Dec;32(11-12):2092-2106.
doi: 10.1101/gr.277031.122. Epub 2022 Nov 9.

Sequencing Illumina libraries at high accuracy on the ONT MinION using R2C2

Alexander Zee et al. Genome Res. 2022 Nov-Dec.

Abstract

High-throughput short-read sequencing has taken on a central role in research and diagnostics. Hundreds of different assays take advantage of Illumina short-read sequencers, the predominant short-read sequencing technology available today. Although other short-read sequencing technologies exist, the ubiquity of Illumina sequencers in sequencing core facilities and the high capital costs of these technologies have limited their adoption. Among a new generation of sequencing technologies, Oxford Nanopore Technologies (ONT) holds a unique position because the ONT MinION, an error-prone long-read sequencer, is associated with little to no capital cost. Here we show that we can make short-read Illumina libraries compatible with the ONT MinION by using the rolling circle to concatemeric consensus (R2C2) method to circularize and amplify the short library molecules. This results in longer DNA molecules containing tandem repeats of the original short library molecules. This longer DNA is ideally suited for the ONT MinION, and after sequencing, the tandem repeats in the resulting raw reads can be converted into high-accuracy consensus reads with error rates similar to those of the Illumina MiSeq. We highlight this capability by producing and benchmarking RNA-seq, ChIP-seq, and regular and target-enriched Tn5 libraries. We also explore the use of this approach for rapid evaluation of sequencing library metrics by implementing a real-time analysis workflow.
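
The idea of turning tandem repeats into a consensus read can be illustrated with a deliberately simplified Python sketch. It is not the consensus caller used in the study: it assumes the repeat length is known, that errors are substitutions only (no insertions or deletions), and that copies sit back to back with no intervening sequence. The function names `split_repeats` and `majority_consensus` and the toy sequences are hypothetical.

```python
from collections import Counter


def split_repeats(raw_read: str, unit_len: int) -> list[str]:
    """Cut a concatemeric read into consecutive full-length copies.

    Assumption: copies are exactly unit_len bases long; a trailing
    partial copy is discarded.
    """
    n_full = len(raw_read) // unit_len
    return [raw_read[i * unit_len:(i + 1) * unit_len] for i in range(n_full)]


def majority_consensus(copies: list[str]) -> str:
    """Column-wise majority vote across equal-length copies."""
    if not copies:
        raise ValueError("need at least one copy")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


# Toy example: four copies of an 8-base insert, two of which carry one
# substitution error each, followed by a partial fifth copy.
insert = "ACGTTGCA"
raw = "ACGTTGCA" + "ACGATGCA" + "ACGTTGCA" + "ACCTTGCA" + "ACG"

copies = split_repeats(raw, len(insert))
consensus = majority_consensus(copies)
print(consensus)
```

Because each error occurs in only one copy, the vote at every position recovers the original base. Real nanopore reads contain insertions and deletions, so a practical consensus step has to align the copies before voting.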

Figures

Figure 1.
Experiment overview. Illumina RNA-seq, ChIP-seq, and Tn5-based genomic libraries (regular and enriched) were generated from different samples. The Illumina libraries were then circularized and amplified using rolling circle amplification (RCA). The resulting DNA, containing tandem repeats of Illumina library molecules, was then prepared for sequencing on the ONT MinION sequencer.
Figure 2.
Sequencing Illumina RNA-seq libraries on the ONT MinION after R2C2 conversion. Insert length distribution (A) and read position–dependent identity to the reference genome (B) of R2C2 and Illumina MiSeq reads generated from the same Illumina library. (C) Comparisons of R2C2 and Illumina MiSeq read-based gene expression and splice junction usage quantification by STAR and kallisto are shown as scatter plots with marginal distributions (log2 normalized) shown as histograms. (D) Genome browser-style visualization of read alignments to the Actb locus. Mismatches are marked by lines colored by the read base (A, orange; T, green; C, blue; G, purple). Insertions are shown as gaps in the alignments, and deletions are shown as black lines.
Figure 3.
Sequencing ChIP-seq libraries on the ONT MinION after R2C2 conversion. (A) Insert length distribution of R2C2 and Illumina NovaSeq 6000 reads generated from the same Illumina library. (B) Percentage of reads in the R2C2, subsampled Illumina, and full Illumina data sets overlapping with H3K4me3 peaks generated from the full Illumina H3K4me3 data set using MACS2. (C) The comparison of the number of R2C2 and subsampled Illumina reads overlapping with H3K4me3 peaks is shown as scatter plots with marginal distributions shown as histograms. Pearson's r is shown at the bottom right. (D) Genome annotation, H3K4me3 peak areas, and read coverage histograms are shown for a section of the Gmax genome.
Figure 4.
Comparing R2C2- and Illumina-based assemblies of a small genome. Illumina 2 × 150 reads were assembled into 134 contigs using Meraculous. R2C2 reads were assembled into 95 contigs using miniasm. The alignments of the contigs of both assemblies, (A) Illumina and (B) R2C2, are shown as dot plots generated by MUMmer (Kurtz et al. 2004). Both approaches failed to assemble a section of the Wolbachia genome that contains pseudogenes and a transposable element near coordinate 500,000.
Figure 5.
Evaluating target-enriched Tn5 libraries with R2C2. (A,D) Insert length of library molecules sequenced by Illumina or R2C2 approaches. (B,E) Comparison of per-base coverage in the Illumina and R2C2 data sets. Marginal distributions are log2 normalized. (C,F) Alignment-based read position–dependent accuracy shown for the indicated sequencing reads and methods. (G,H) Sequencing coverage plot of the target-enriched Tn5 libraries for R2C2 and Illumina results at Chromosome 7: 55,134,584–55,211,629, which covers a part of the EGFR gene. Top panel shows the annotation of one EGFR isoform. The x-axis of the coverage plot is the base pair position, and the y-axis is the total number of reads at each position. The dotted lines indicate zoomed-in views of exons that contain the 15-bp deletion in NCI-H1650 (left) and the C-to-T and T-to-G point mutations in NCI-H1975 (right). Both samples’ Illumina reads and the R2C2 read alignments of the selected regions are shown. The mismatches are colored based on the read base (A, orange; T, green; C, blue; G, purple).
Figure 6.
Real-time characterization of Illumina sequencing libraries. (A) Diagram of PLNK functionality; FAST5 files are processed in the order they are produced. PLNK controls guppy5 for base-calling, C3POa for consensus calling, and mappy for alignment, and calculates metrics based on those alignments. (B–D) Simulation of real-time analysis for enriched Tn5 (B), ChIP-seq (C), and RNA-seq (D) libraries. For each time point, panels from top to bottom show (1) the number of FAST5 files that are produced and processed, (2) the number of demultiplexed reads produced by guppy5/C3POa/demultiplexing, (3) the percentage of reads associated with each library in the sequenced pool, (4) the percentage of reads overlapping with target regions, and (5) the median read coverage of bases in the target regions.
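
The cumulative per-time-point metrics described for Figure 6 can be sketched in Python as follows. This is an illustration only, not PLNK's code: it assumes base-calling, consensus calling, demultiplexing, and alignment have already reduced each FAST5 file to a list of `(library, overlaps_target)` pairs, and it covers only metrics (1) to (4). The function name `running_metrics` and the toy batches are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def running_metrics(batches: Iterable[list[tuple[str, bool]]]) -> Iterator[dict]:
    """Yield cumulative metrics after each processed file.

    Each batch stands for the reads from one FAST5 file, given as
    (library_name, overlaps_target) pairs. Batches are consumed in the
    order they arrive.
    """
    per_library: Counter = Counter()
    total_reads = 0
    on_target = 0
    for files_processed, batch in enumerate(batches, start=1):
        for library, overlaps_target in batch:
            per_library[library] += 1
            total_reads += 1
            on_target += int(overlaps_target)
        yield {
            "files_processed": files_processed,
            "reads": total_reads,
            # Share of reads assigned to each library so far.
            "library_pct": {
                lib: 100 * n / total_reads for lib, n in per_library.items()
            },
            # Share of reads overlapping a target region so far.
            "on_target_pct": 100 * on_target / total_reads if total_reads else 0.0,
        }


# Toy input: two files, two libraries named "A" and "B".
batches = [
    [("A", True), ("B", False)],
    [("A", True), ("A", False)],
]
snapshots = list(running_metrics(batches))
for snapshot in snapshots:
    print(snapshot)
```

Writing this as a generator means a snapshot is available as soon as each file has been processed, which is the property a real-time view depends on.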
