Nat Methods. 2023 Jan;20(1):12-16.
doi: 10.1038/s41592-022-01716-8.

Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing

Sam Kovaka et al. Nat Methods. 2023 Jan.

Abstract

The year 2022 will be remembered as the turning point for accurate long-read sequencing, which now establishes the gold standard for speed and accuracy at competitive costs. We discuss the key bioinformatics techniques needed to power long reads across application areas and close with our vision for long-read sequencing over the coming years.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Fig. 1 | Long-read sequencing methods and applications.
a, Long strands of unamplified DNA can be sequenced by PacBio and ONT sequencers. ONT can also directly sequence RNA molecules while PacBio requires synthesis of cDNA. PacBio Single Molecule, Real-Time (SMRT) sequencing observes fluorescent nucleotides within zero mode waveguides as they are incorporated into a circular molecule, generating a series of forward and reverse complement fluorescence signals. ONT sequencing generates a time series of electric current for different nucleotides. These signals are input into a technology-specific base caller using neural nets or related techniques, which output reads represented as a series of adenines (green), cytidines (blue), guanines (yellow) and thymines (red). b, Signal-level analyses include base calling and DNA or RNA modification detection (purple diamond). c, Long reads improve de novo genome assembly by spanning more repetitive DNA and haplotype-specific variants. d, Long reads can span complex variants with unique alignments, allowing more accurate structural variant detection and phasing. SNP, single-nucleotide polymorphism; SV, structural variant. e, Long-read RNA sequencing can often span full transcripts in single reads, improving annotation and transcript-level quantification.
Fig. 2 | Improvement of sequencing technologies.
a, Mean read length reported by a selection of published genome sequencing studies. Each dot represents a report, colored by sequencing platform. b, Longest read length of ONT sequencing studies per year. c, Average contig N50 (meaning that 50% of the genome is assembled into contigs of at least the indicated size) of genomes submitted to the US National Center for Biotechnology Information for a selection of six model species. The six species were selected as having the most assembly submissions for genomes at least 2 Gb in size. Vertical dotted lines denote the release of new sequencing technologies. PB, PacBio; ONT, Oxford Nanopore Technologies. Data used in this figure may be accessed through https://github.com/schatzlab/long-read-commentary.
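
The contig N50 statistic defined in the Fig. 2 caption can be illustrated with a short calculation. The following is a minimal sketch, not the authors' code: the function name `contig_n50` and its `total` parameter are hypothetical and introduced only for illustration. By default it measures the 50% threshold against the summed contig lengths, which is the usual convention when the true genome size is unknown; passing an assumed genome size as `total` instead gives the variant often called NG50.

```python
from typing import Iterable, Optional


def contig_n50(lengths: Iterable[int], total: Optional[int] = None) -> int:
    """Return the largest length L such that contigs of length >= L
    together cover at least half of `total`.

    `total` defaults to the sum of the contig lengths (assembly size).
    Supplying an assumed genome size instead yields the NG50 variant.
    Returns 0 if the contigs cover less than half of `total`.
    """
    ordered = sorted(lengths, reverse=True)  # longest contig first
    if not ordered:
        raise ValueError("at least one contig length is required")
    if total is None:
        total = sum(ordered)

    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the 50% mark.
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical contig lengths (arbitrary units), for illustration only.
    example = [80, 70, 50, 40, 30, 20]
    print(contig_n50(example))
```

In this hypothetical example the contigs sum to 290, so the threshold is 145. The two longest contigs (80 + 70 = 150) already reach it, and the reported N50 is 70. A longer N50 therefore indicates that more of the assembly sits in large, contiguous pieces.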
