Review
Genome Biol. 2020 Feb 7;21(1):30.
doi: 10.1186/s13059-020-1935-5.

Opportunities and challenges in long-read sequencing data analysis

Shanika L Amarasinghe et al. Genome Biol.

Abstract

Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.

Keywords: Data analysis; Long-read sequencing; Oxford Nanopore; PacBio.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Overview of long-read analysis tools and pipelines. (a) Release of tools identified from various sources and milestones of long-read sequencing. (b) Functional categories. (c) Typical long-read analysis pipelines for SMRT and nanopore data. The workflow comprises six main stages: basecalling, quality control, read error correction, assembly/alignment, assembly refinement, and downstream analyses. Green boxes represent processes common to both short-read and long-read analyses. Orange boxes represent processes unique to long-read analyses. Unfilled boxes represent optional steps. Commonly used tools for each step are given in brackets. Italics signify tools developed by PacBio or ONT, and non-italics signify tools developed by external parties. Arrows represent the direction of the workflow.
Fig. 2
Paradigms of error correction (a) and polishing (b). Errors in long reads and assembly are denoted by red crosses. Non-hybrid methods only require long reads, while hybrid methods additionally require accurate short reads (purple).
Fig. 3
Methods to detect base modifications in long-read sequencing. Base modifications can be inferred from their effect on the current intensity (nanopore) and inter-pulse duration (IPD, SMRT). Strategies to call base modifications in nanopore sequencing and the corresponding tools are further depicted.
Fig. 4
Types of transcriptomic analyses and their steps. The choice of sequencing protocol amongst the six available workflows affects the type, characteristics, and quantity of data generated. Only direct RNA sequencing allows epitranscriptomic studies, but SMRT direct RNA sequencing is a custom technique that is not fully supported. The remaining non-exclusive applications are isoform detection, quantification, and differential analysis. Dashed arrows represent processes upstream of transcriptomics.
