Long reads: their purpose and place

Martin O Pollard^{1,2}, Deepti Gurdasani^{1,2}, Alexander J Mentzer^{1,3}, Tarryn Porter^{1,2}, Manjinder S Sandhu^{1,2}

Affiliations

¹ Human Genetics - Wellcome Sanger Institute, Hinxton, Cambridge, UK.
² University of Cambridge - Department of Medicine, Addenbrooke's Hospital, Box 157, Hills Road, Cambridge, UK.
³ Wellcome Centre for Human Genetics, Roosevelt Drive, Oxford, UK.

PMID: 29767702
PMCID: PMC6061690
DOI: 10.1093/hmg/ddy177

Review


Martin O Pollard et al. Hum Mol Genet. 2018 Aug 1;27(R2):R234-R241. doi: 10.1093/hmg/ddy177.


Abstract

In recent years long-read technologies have moved from a niche, specialist field to a point of relative maturity, and they are likely to feature frequently in the genomic landscape. As with next-generation sequencing, the cost of sequencing with long-read technologies has dropped materially while instrument throughput continues to increase. Together these changes raise the prospect of sequencing large numbers of individuals with the aim of fully characterizing genomes at high resolution. In this article, we present an introduction to long-read technologies, showing what long reads are, how they differ from short reads, why long reads are useful and how they are being used. We highlight recent developments in this field and the applications and potential of these technologies in medical research, clinical diagnostics and therapeutics.


Figures

**Figure 1.**
Behaviour of reads around genomic events. (A) Large insertion: short reads at the edge of the variant are soft-clipped. Reads within the insertion will be either unmapped or mapped incorrectly. Long reads will either span the insertion or have enough context to be marked as inserted sequence. (B) Large deletion: short read pairs spanning the deletion may be mismapped, or only one read of the pair may be marked as mapped, because the insert size measured against the reference deviates from the expected distribution. Long reads will span the gap, and most will have enough context to call the deletion. (C) Copy number variation: where the read length exceeds the length of the CNV region, reads will map correctly. Shorter reads may be collapsed and show up as increased depth in a pileup, or be marked as mapping poorly. (D) Inversion: reads will either be represented as a primary alignment with an inverted supplementary alignment, or manifest as soft clipping around the edge of the inversion, with a reduction in depth where reads span the edge of the inversion.
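Panels A and B describe two alignment signatures that short-read structural-variant detection relies on: soft-clipped read ends at a breakpoint, and read pairs whose insert size deviates from the library's expected distribution. The sketch below illustrates both checks in plain Python. The `ReadPair` record, the 20 bp clip threshold and the 3-standard-deviation cut-off are illustrative assumptions rather than values from the review; a real pipeline would read alignments from a BAM file and estimate the insert-size distribution per library.

```python
import re
from dataclasses import dataclass

# A CIGAR string is a run of <length><operation> pairs, e.g. "60M40S"
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class ReadPair:
    # Hypothetical minimal record; real fields would come from a BAM/SAM file
    name: str
    cigar: str        # CIGAR string of one read of the pair
    insert_size: int  # signed template length (TLEN) measured against the reference


def clipped_ends(cigar: str) -> tuple[int, int]:
    """Return the number of soft-clipped bases at the left and right ends of an alignment."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may sit outside soft clips, so ignore them when inspecting the ends
    core = [(n, op) for n, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


def is_breakpoint_candidate(cigar: str, min_clip: int = 20) -> bool:
    """Flag alignments with a long soft clip at either end (assumed 20 bp threshold)."""
    return max(clipped_ends(cigar)) >= min_clip


def discordant_pairs(pairs: list[ReadPair], expected_mean: float,
                     expected_sd: float, z: float = 3.0) -> list[str]:
    """Names of pairs whose absolute insert size lies more than z SDs from the library mean."""
    return [p.name for p in pairs
            if abs(abs(p.insert_size) - expected_mean) > z * expected_sd]


pairs = [
    ReadPair("r1", "100M", 350),
    ReadPair("r2", "60M40S", -340),  # clipped at a possible breakpoint
    ReadPair("r3", "100M", 6400),    # pair straddles a roughly 6 kb deletion
]
print([p.name for p in pairs if is_breakpoint_candidate(p.cigar)])  # ['r2']
print(discordant_pairs(pairs, expected_mean=350, expected_sd=50))   # ['r3']
```

A long read in the same region would typically contain the whole event within a single alignment (for example as a `D` or `I` operation in its CIGAR), which is why these indirect signatures matter mainly for short reads.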

**Figure 2.**
Long-read sequencing technologies. (A) PacBio SMRT sequencing. Double-stranded DNA is first sheared and size-selected to the desired length, and sequencing adaptors are then annealed. The adaptors are bound to a sequencing primer and a strand-displacing polymerase, which adheres to the bottom of a well containing a zero-mode waveguide. Following a pre-extension period in which the polymerase reaction runs in the dark, the fragment is illuminated with a laser; as each base in the sequencing solution is incorporated, its fluorophore is detected and then released by the polymerase reaction, giving a time and intensity signal that is converted into a base call. (B) Oxford Nanopore Technologies sequencing passes the DNA molecule through a nanopore embedded in the flow cell's surface membrane. As each base of the DNA molecule passes through the pore, changes in the current flowing through the pore are detected and converted into a signal. This signal is passed to a recurrent neural network (RNN), which converts it into base calls. (C) 10X Genomics Chromium technology uses emulsion droplets, in which gel beads are mixed with high-molecular-weight genomic DNA and an enzyme. Within each gel bead the DNA is sheared and barcoded, creating fragments that can then be sequenced with Illumina sequencing. The shared Chromium barcode then provides a mapper or assembler with linked reads, allowing the relative spatial positions of the fragments to be estimated. *Components of figure reproduced with permission from Pacific Biosciences, Oxford Nanopore Technologies and 10X Genomics.*
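Panel C notes that the shared barcode lets a mapper or assembler treat short reads as linked reads and estimate where fragments sit relative to one another. A minimal sketch of that idea, under stated assumptions: aligned reads are grouped by barcode and chromosome, then chained into putative source molecules whenever consecutive reads lie within a gap threshold. The `LinkedRead` record and the 50 kb `max_gap` are hypothetical choices for illustration, and this is not intended to reproduce any vendor's software.

```python
from collections import defaultdict
from typing import NamedTuple


class LinkedRead(NamedTuple):
    # Hypothetical minimal record for an aligned read carrying a droplet barcode
    barcode: str
    chrom: str
    start: int
    end: int


def infer_molecules(reads: list[LinkedRead], max_gap: int = 50_000):
    """Group reads sharing a barcode into putative source DNA molecules.

    Reads with the same barcode on the same chromosome are chained while the gap
    to the next read is at most max_gap; a larger gap starts a new molecule,
    because one barcode can tag several unrelated fragments.
    Returns sorted (barcode, chrom, start, end, n_reads) tuples.
    """
    by_key = defaultdict(list)
    for read in reads:
        by_key[(read.barcode, read.chrom)].append(read)

    molecules = []
    for (barcode, chrom), group in by_key.items():
        group.sort(key=lambda r: r.start)
        mol_start, mol_end, n_reads = group[0].start, group[0].end, 1
        for read in group[1:]:
            if read.start - mol_end > max_gap:
                # Gap too large: close the current molecule and open a new one
                molecules.append((barcode, chrom, mol_start, mol_end, n_reads))
                mol_start, mol_end, n_reads = read.start, read.end, 1
            else:
                mol_end = max(mol_end, read.end)
                n_reads += 1
        molecules.append((barcode, chrom, mol_start, mol_end, n_reads))
    return sorted(molecules)


reads = [
    LinkedRead("AAAC", "chr1", 1_000, 1_150),
    LinkedRead("AAAC", "chr1", 9_000, 9_150),
    LinkedRead("AAAC", "chr1", 30_000, 30_150),
    LinkedRead("AAAC", "chr1", 500_000, 500_150),  # same barcode, distant fragment
    LinkedRead("GGTA", "chr2", 100, 250),
]
for molecule in infer_molecules(reads):
    print(molecule)
```

Here barcode `AAAC` yields two molecules: three reads chained across about 29 kb, and one isolated read far downstream that is treated as a separate fragment.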

**Figure 3.**
Long reads span and call variants that short reads cannot. IGV (http://software.broadinstitute.org/software/igv/home) image of PacBio reads (top) from a sample sequenced as part of the GDAP project. The reads span a 6 kb heterozygous LINE-1 element deletion and show clear depth variation. Illumina reads (bottom) from the same sample cannot be clearly mapped around the deletion; reads shown in white could not be uniquely mapped.
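The depth signal seen here for a heterozygous deletion, and the increased depth from collapsed reads described for copy number variation in Figure 1C, can be illustrated with a simple depth-ratio sketch. It assumes per-base depth is already available as a list, compares the median depth inside a candidate region with the median in 1 kb flanking windows, and maps the ratio to a copy state using hypothetical cut-offs for a diploid genome. The depth values below are simulated, not taken from the GDAP sample, and a real caller would also need to model sampling noise and mappability.

```python
from statistics import median


def depth_ratio(depth: list[int], start: int, end: int, flank: int = 1000) -> float:
    """Median depth inside [start, end) divided by the median depth of the flanking windows."""
    inside = depth[start:end]
    flanks = depth[max(0, start - flank):start] + depth[end:end + flank]
    if not inside or not flanks:
        raise ValueError("region and flanks must contain positions")
    baseline = median(flanks)
    if baseline == 0:
        raise ValueError("flanking depth is zero; ratio undefined")
    return median(inside) / baseline


def classify(ratio: float) -> str:
    # Illustrative cut-offs for a diploid genome: 1.0 = two copies, 0.5 = one, 0 = none
    if ratio < 0.25:
        return "homozygous deletion"
    if ratio < 0.75:
        return "heterozygous deletion"
    if ratio > 1.25:
        return "copy gain"
    return "copy neutral"


# Simulated per-base depth: 30x coverage with a 6 kb region at half depth
depth = [30] * 2_000 + [15] * 6_000 + [30] * 2_000
ratio = depth_ratio(depth, 2_000, 8_000)
print(ratio, classify(ratio))  # 0.5 heterozygous deletion
```

Medians are used rather than means so that a few mismapped or collapsed reads inside a window shift the estimate less.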


