Aging (Albany NY). 2020 Apr 28;12(8):7603-7613.
doi: 10.18632/aging.103171. Epub 2020 Apr 28.

LUCS: a high-resolution nucleic acid sequencing tool for accurate long-read analysis of individual DNA molecules

Sofia Annis et al. Aging (Albany NY). 2020.

Abstract

Nucleic acid sequence analyses are fundamental to all aspects of biological research, spanning aging, mitochondrial DNA (mtDNA) and cancer, as well as microbial and viral evolution. Over the past several years, significant improvements in DNA sequencing, including consensus sequence analysis, have proven invaluable for high-throughput studies. However, all current DNA sequencing platforms have limited utility for studies of complex mixtures or of individual long molecules, the latter of which is crucial to understanding the evolution and consequences of single nucleotide variants and their combinations. Here we report a new technology termed LUCS (Long-molecule UMI-driven Consensus Sequencing), in which reads from third-generation sequencing are aggregated by unique molecular identifiers (UMIs) specific for each individual DNA molecule. This enables in-silico reconstruction of highly accurate consensus reads of each DNA molecule independent of other molecules in the sample. Additionally, use of two UMIs enables detection of artificial recombinants (chimeras). As proof of concept, we show that application of LUCS to assessment of mitochondrial genomes in complex mixtures from single cells was associated with an error rate of 1 × 10⁻⁴ errors/nucleotide. Thus, LUCS represents a major step forward in DNA sequencing that offers high-throughput capacity and high-accuracy reads in studies of long DNA templates and nucleotide variants in heterogeneous samples.

Keywords: DNA; LUCS; aging; cancer; chimera; mutation; sequencing.

Conflict of interest statement

CONFLICTS OF INTEREST: Z.F., R.L., Z.M.-B., M.F. and J.S. declare no conflicts of interest. S.A. declares interest in intellectual property described in U.S. Patent Application 2018037154. J.L.T. declares interest in intellectual property described in U.S. Patent 7,195,775, U.S. Patent 7,850,984, U.S. Patent 7,955,846, U.S. Patent 8,642,329, U.S. Patent 8,647,869, U.S. Patent 8,652,840, European Patent Specification No. EP1765085, U.S. Patent 9,150,830, U.S. Patent 9,267,111, U.S. Patent 9,845,482, U.S. Patent 9,962,411, U.S. Patent 10,525,086, and U.S. Patent Application 2018037154. D.C.W. declares interest in intellectual property described in U.S. Patent 8,642,329, U.S. Patent 8,647,869, U.S. Patent 9,150,830, U.S. Patent 10,525,086, and U.S. Patent Application 2018037154. K.K. declares interest in intellectual property described in U.S. Patent Application 2018037154.

Figures

Figure 1
Overview of the LUCS technology. (A) Each individual DNA molecule in a complex mixture, bearing its own unique pattern of mutations (white), has a UMI applied to it via PCR (each UMI represented by a different end-color), which is specific for that molecule (Step 1). The pool of DNA molecules is then amplified and sequenced (Step 2), during which time artefacts (i.e., PCR errors and sequencing errors) are introduced in a random fashion across molecules (red). All reads are then clustered based on their UMI (Step 3), and a consensus read is built for each molecule (Step 4). This final step removes random errors introduced during the process (red) but retains true mutations (white) found in the original molecule and in all amplicons of that molecule. (B) Two-step PCR process for UMI application and dilution. In the first 4 cycles of PCR, the targeted DNA template is amplified by 125-bp oligonucleotide barcoding primers, each containing a random UMI sequence. The initial reaction is then diluted 25-fold within a larger PCR reaction containing only synthetic primers, which amplify the UMI-containing molecules over 45 additional cycles of PCR. The resultant elimination of barcoding primer 're-priming' allows for high-resolution paired-end clustering and, in particular, the detection and removal of chimeras (artificial recombinant molecules) caused by PCR jumping.
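Steps 3 and 4 of panel A, together with the two-UMI chimera check, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: it assumes reads have already been demultiplexed into hypothetical (left UMI, right UMI, sequence) tuples, that the sequences within a cluster are already aligned to equal length, and it substitutes a simple per-position majority vote for the consensus step. The function names and the "dominant partner" chimera heuristic are illustrative assumptions.

```python
from collections import Counter, defaultdict


def consensus_by_umi(reads):
    """Cluster reads by their UMI pair and build a majority-vote consensus.

    reads: list of (umi_left, umi_right, sequence) tuples (hypothetical format).
    Assumes all sequences in a cluster are pre-aligned and of equal length.
    """
    clusters = defaultdict(list)
    for umi_left, umi_right, seq in reads:
        clusters[(umi_left, umi_right)].append(seq)

    consensus = {}
    for umi_pair, seqs in clusters.items():
        # Take the most common base in each alignment column; random PCR or
        # sequencing errors are outvoted, shared (true) variants are kept.
        consensus[umi_pair] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


def find_chimeric_pairs(reads):
    """Flag UMI pairs in which either UMI has a more abundant partner.

    Assumed heuristic: a genuine molecule carries one left and one right UMI,
    so a left UMI seen mostly with partner X but occasionally with partner Y
    suggests that the (left, Y) reads are PCR-jumping chimeras.
    """
    pair_counts = Counter((left, right) for left, right, _ in reads)
    best_for_left, best_for_right = {}, {}
    for (left, right), n in pair_counts.items():
        if n > best_for_left.get(left, (None, 0))[1]:
            best_for_left[left] = (right, n)
        if n > best_for_right.get(right, (None, 0))[1]:
            best_for_right[right] = (left, n)
    return {
        (left, right)
        for (left, right) in pair_counts
        if best_for_left[left][0] != right or best_for_right[right][0] != left
    }


reads = [
    ("AAA", "TTT", "ACGT"), ("AAA", "TTT", "ACGT"), ("AAA", "TTT", "ACCT"),
    ("CCC", "GGG", "AGGT"), ("CCC", "GGG", "AGGT"), ("CCC", "GGG", "AGGA"),
    ("AAA", "GGG", "ACGT"),  # left UMI of one molecule, right UMI of another
]
chimeras = find_chimeric_pairs(reads)
clean = [r for r in reads if (r[0], r[1]) not in chimeras]
result = consensus_by_umi(clean)
```

In this toy input the single-read error in each cluster is outvoted, and the ("AAA", "GGG") pair is flagged because both of its UMIs occur more often with other partners. Real long reads would need alignment and a signal-aware polishing step before any per-position vote.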
Figure 2
Support fraction distributions for polymorphic and heteroplasmic variants. Average base-called support fractions for polymorphic (blue, n = 36) and heteroplasmic (orange, n = 96) variants were 90.1% ± 1.7% and 89.2% ± 0.4%, respectively (mean ± SEM). Likewise, signal support fractions were comparable across polymorphic (93.5% ± 0.7%) and heteroplasmic (92.1% ± 0.6%) variants (mean ± SEM). Distributions are kernel density estimates of base-called and signal support fractions, as determined by nanopolish for all variants. Neither the base-called support nor the signal support fraction distributions differed significantly between the two variant classes (P = 0.39 and P = 0.17, respectively).
Figure 3
Comparison of synonymity distributions between the LUCS and Sanger sequencing datasets. Support fractions above 80% for LUCS and Sanger sequencing methods were comparatively analyzed, and display similar proportional synonymity in coding regions, indicative of low error rate.
Figure 4
Proportional mutational spectra for the LUCS and Sanger sequencing datasets. (A, B) The mutation spectrum was determined for each reference nucleotide for the LUCS (A) and Sanger sequencing (B) datasets. Each bar represents the proportion of a variant for a given reference base. For example, the A>G bar is the number of A>G mutations divided by the number of mutated positions that are adenines in the reference sequence. For cytosine, guanine and thymine positions, both LUCS and Sanger mutations exhibited a strong bias towards transitions. Adenine positions were more likely to mutate as a thymine transversion than as a transition in the Sanger dataset, which was reflected to a slightly lesser degree in the LUCS dataset.
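The per-reference-base normalization described in this caption can be sketched in a few lines of Python. This assumes each mutation is supplied as a hypothetical (reference base, variant base) pair and that each pair corresponds to one mutated position; the function name is illustrative and not taken from the paper.

```python
from collections import Counter


def mutation_spectrum(mutations):
    """Return the proportion of each substitution among mutations sharing its reference base.

    mutations: list of (ref_base, alt_base) tuples, one per mutated position
    (assumed input format).
    """
    per_ref = Counter(ref for ref, _ in mutations)   # mutated positions per reference base
    per_change = Counter(mutations)                  # count of each ref>alt substitution
    return {
        f"{ref}>{alt}": n / per_ref[ref]
        for (ref, alt), n in per_change.items()
    }


spectrum = mutation_spectrum([("A", "G"), ("A", "T"), ("A", "T"), ("C", "T")])
```

With this toy input, one of the three mutated adenine positions is A>G, so the A>G bar would have height 1/3, and the proportions for each reference base sum to 1.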
Figure 5
Mutation rates estimated for single molecules from Sanger sequencing and LUCS datasets. Violin plots showing that mutation rates per molecule sequenced, determined by dividing the number of mutations by the coverage of a given molecule, were similar between the two technologies (P = 0.12; see text for details).
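The per-molecule rate described in this caption is a simple ratio, sketched below in Python. The sketch assumes that "coverage" means the number of reference bases covered by the molecule's consensus read; the function name and example values are hypothetical, and the statistical test behind the reported P value is not reproduced here.

```python
def mutation_rate(n_mutations, covered_bases):
    """Mutations per covered base for one sequenced molecule.

    covered_bases: number of reference bases covered by the molecule
    (assumed interpretation of "coverage").
    """
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_mutations / covered_bases


# Hypothetical molecules: (number of mutations, covered bases)
molecules = [(2, 10_000), (0, 8_000), (1, 5_000)]
rates = [mutation_rate(n, cov) for n, cov in molecules]
```

Each entry of `rates` would contribute one point to a per-technology distribution such as the violin plots described above.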
