Nat Biotechnol. 2009 May;27(5):455-7.
doi: 10.1038/nbt0509-455.

How to map billions of short reads onto genomes

Cole Trapnell et al. Nat Biotechnol. 2009 May.

Abstract

Mapping the vast quantities of short sequence fragments produced by next-generation sequencing platforms is a challenge. What programs are available and how do they work?

Figures

Figure 1
Two recent algorithmic approaches for aligning short (20–200-bp) sequencing reads. (a) Algorithms based on spaced-seed indexing, such as Maq, index the reads as follows: each position in the reference is cut into equal-sized pieces, called ‘seeds’, and these seeds are paired and stored in a lookup table. Each read is also cut up according to this scheme, and pairs of seeds are used as keys to look up matching positions in the reference. Because seed indices can be very large, some algorithms (including Maq) index the reads in batches and treat substrings of the reference as queries. (b) Algorithms based on the Burrows-Wheeler transform, such as Bowtie, store a memory-efficient representation of the reference genome. Reads are aligned character by character from right to left against the transformed string. With each new character, the algorithm updates an interval (indicated by blue ‘beams’) in the transformed string. When all characters in the read have been processed, alignments are represented by any positions within the interval. Burrows-Wheeler–based algorithms can run substantially faster than spaced-seed approaches, primarily owing to the memory efficiency of the Burrows-Wheeler search. Chr., chromosome.
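
The two strategies in this caption can be illustrated with a toy Python sketch. It is not the implementation used by Maq or Bowtie. All function names are hypothetical, the seed scheme (four seeds per read, every pair used as a key, the reference indexed rather than the reads) is an assumption made for brevity, and the Burrows-Wheeler part builds its index by naive suffix sorting and supports exact matching only.

```python
from collections import Counter, defaultdict
from itertools import combinations

# --- (a) Seed-pair lookup (toy; assumes read length divisible by n_seeds) ---
def split_seeds(seq, n_seeds):
    """Cut seq into n_seeds equal-sized pieces."""
    seed_len = len(seq) // n_seeds
    return [seq[k * seed_len:(k + 1) * seed_len] for k in range(n_seeds)]

def seed_pair_index(reference, read_len, n_seeds=4):
    """Map (i, j, seed_i, seed_j) -> reference positions whose window has those seeds."""
    table = defaultdict(list)
    for pos in range(len(reference) - read_len + 1):
        seeds = split_seeds(reference[pos:pos + read_len], n_seeds)
        for i, j in combinations(range(n_seeds), 2):
            table[(i, j, seeds[i], seeds[j])].append(pos)
    return table

def seed_pair_lookup(read, reference, table, n_seeds=4, max_mismatches=2):
    """Collect candidates sharing any seed pair, then verify by counting mismatches."""
    seeds = split_seeds(read, n_seeds)
    candidates = set()
    for i, j in combinations(range(n_seeds), 2):
        candidates.update(table.get((i, j, seeds[i], seeds[j]), []))
    return [pos for pos in sorted(candidates)
            if sum(a != b for a, b in zip(read, reference[pos:pos + len(read)]))
            <= max_mismatches]

# --- (b) Burrows-Wheeler backward search (toy FM-index, exact matches only) ---
def bwt_index(reference):
    """Build suffix array, C table, and Occ table for reference + '$'."""
    text = reference + "$"  # '$' sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each sorted suffix
    counts = Counter(text)
    C, total = {}, 0
    for c in sorted(counts):  # C[c]: number of characters in text smaller than c
        C[c] = total
        total += counts[c]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}  # occ[c][k]: count of c in bwt[:k]
    for k, ch in enumerate(bwt):
        for c in occ:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return sa, C, occ

def backward_search(read, sa, C, occ):
    """Process the read right to left, narrowing a half-open suffix-array interval."""
    lo, hi = 0, len(sa)
    for ch in reversed(read):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:  # interval is empty: the read does not occur
            return []
    return sorted(sa[lo:hi])
```

In the seed-pair part, a read with at most two mismatches spread over four seeds still has at least two seeds that match exactly, so at least one pair key finds the true position. In the backward search, the interval `[lo, hi)` plays the role of the ‘beams’ described in the caption: every suffix-array row left in it at the end is one alignment position.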
Figure 2
RNA-Seq assays produce short reads sequenced from processed mRNAs. Aligning these reads to the genome with Bowtie or Maq will produce the alignments shown in black but will fail to align the blue reads. A spliced-read mapper such as TopHat or ERANGE will also report the (blue) alignments spanning intron boundaries.
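
To illustrate why an unspliced aligner misses the intron-spanning reads, here is a minimal Python sketch of a brute-force split-read search. It is not the method used by TopHat or ERANGE. The function names, the `min_anchor` and `max_intron` parameters, and the exact-match requirement are all assumptions chosen to keep the example small.

```python
def find_all(text, pattern, start=0, end=None):
    """Return every start offset of pattern fully contained in text[start:end]."""
    end = len(text) if end is None else end
    hits, i = [], text.find(pattern, start, end)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1, end)
    return hits

def spliced_align(read, genome, min_anchor=3, max_intron=50):
    """Try an unspliced exact match first, then every split of the read in two."""
    if read in genome:
        return [("unspliced", genome.find(read))]
    results = []
    # Each half must be at least min_anchor bases long (hypothetical threshold).
    for k in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:k], read[k:]
        for p in find_all(genome, left):
            donor = p + k  # first genome position after the left half
            window_end = min(len(genome), donor + max_intron + len(right))
            # The right half must start downstream, leaving a gap of 1..max_intron bases.
            for q in find_all(genome, right, donor + 1, window_end):
                results.append(("spliced", p, donor, q))
    return results

# Toy genome: two exons separated by a 10-bp intron.
exon1, intron, exon2 = "ACGTGCA", "GTAAAAAAAG", "TTCGGAC"
genome = exon1 + intron + exon2
print(spliced_align("TGCATTCG", genome))
```

The read `TGCATTCG` joins the last four bases of the first exon to the first four bases of the second, so it has no contiguous match in the genome and is recovered only by the split search. On real data several split points can be equally valid, which is one reason practical spliced mappers use additional evidence to choose junctions.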
