BMC Bioinformatics. 2008 May 28;9:250.
doi: 10.1186/1471-2105-9-250.

TileQC: a system for tile-based quality control of Solexa data


Peter C Dolan et al. BMC Bioinformatics.

Abstract

Background: Next-generation DNA sequencing technologies such as Illumina's Solexa platform and Roche's 454 approach provide new avenues for investigating genome-scale questions. However, they also present novel analytical challenges that must be met for their effective application to biological questions.

Results: Here we report the availability of tileQC, a tile-based quality control system for Solexa data written in the R language. TileQC provides a means of recognizing bias and error in Solexa output by graphically representing data generated by flow cell tiles. The data represented in the images is then made available in the R environment for further analysis and automation of error detection.

Conclusion: TileQC offers a highly adaptable and powerful tool for the quality control of Solexa-based DNA sequence data.

Figures

Figure 1
Aberrant Solexa tiles. This figure displays three distinct types of errors that we have seen occur on Solexa tiles. The image was generated using plotQTile with the color-by category option. The tile data was drawn from tiles 129, 85, and 6 respectively – all produced from the same lane and run. The error depicted in 1b appears to be caused by a bubble in one of the reagents. This bubble increased the error rate, converting U0 reads into U1 reads. From further investigation we know this occurred during cycle 28. The error depicted in 1c is an area in which no reads at all occurred – possibly a problem with the DNA binding to the flow cell, or a smudge on the surface of the flow cell. The error depicted in 1a remains enigmatic.
Figure 2
Analysis of a single tile using plotQTile and cycleplot. Here we see three distinct views of tile 47 from Figure 1b. Similar to Figure 1, Figures 2a–2c were generated using the function plotQTile. Figure 2a uses the color-by category option to display the position of the reads color-coded according to their Eland category. In Figure 2b, the gray intensity values are generated by taking the mean across the first 32 cycles and then normalizing. Figure 2c is similar to 2b, but uses the minimum value across those 32 cycles instead of the mean. Figure 2c shows that some bubbles are visible from a QAG-score perspective, but the contrast between 2b and 2c shows that one must be careful to choose the proper aggregating function. Figure 2d was generated using the cycleplot function. It displays the mean QAG-score for cycles 1–32 on tile 47, and showcases the ability to detect the source of an error by decomposing the data according to cycle. Here we see the drop in average intensity that occurred during cycle 28.
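The contrast between the mean and the minimum as aggregating functions, and the per-cycle decomposition, can be illustrated with a minimal sketch. This is not tileQC code (tileQC is written in R); the function names `aggregate_reads`, `per_cycle_mean`, and `lowest_cycle` and the toy scores are hypothetical, chosen only to show the idea.

```python
from statistics import mean

def aggregate_reads(scores, func):
    """Collapse each read's per-cycle scores with `func`, then rescale to [0, 1]."""
    values = [func(read) for read in scores]
    lo, hi = min(values), max(values)
    if hi == lo:
        # All reads aggregate to the same value; nothing to contrast.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def per_cycle_mean(scores):
    """Mean score of every cycle across all reads (column-wise mean)."""
    return [mean(column) for column in zip(*scores)]

def lowest_cycle(scores):
    """1-based number of the cycle with the lowest mean score."""
    means = per_cycle_mean(scores)
    return means.index(min(means)) + 1

# Toy data (hypothetical): 3 reads x 5 cycles; the second read drops at cycle 4.
toy = [
    [30, 30, 30, 30, 30],
    [30, 30, 30, 5, 30],
    [28, 28, 28, 28, 28],
]

by_mean = aggregate_reads(toy, mean)  # a single bad cycle is diluted by the mean
by_min = aggregate_reads(toy, min)    # the minimum keeps the single bad cycle visible
bad_cycle = lowest_cycle(toy)         # decomposing by cycle locates the drop
```

In the toy data, the affected read has a raw mean of 25 but a raw minimum of 5, which is why the choice of aggregating function changes how visible a single-cycle defect is.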
Figure 3
Filtering the data by cycle. This figure is a closer exploration of the bubble on tile 47 analyzed in Figure 2. The tileQC system allows the output of the plotQTile function to be filtered according to cycle. This graph was generated using plotQTile and the cycles option. Fig. 3a shows the minimum QAG-score across cycles 1–27 by using cycles = 1:27; Fig. 3b restricts to the 28th cycle by using the cycle = 28 option; and Fig. 3c restricts to cycles 28–32. From these three panels it is clear that the problem behind the bubble occurred during cycle 28.
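The idea of restricting an aggregate to a subset of cycles can be sketched as follows. This is an illustration only, not the plotQTile implementation; `min_over_cycles` and the toy read are hypothetical, and cycle numbers are taken to be 1-based as in the caption.

```python
def min_over_cycles(read_scores, cycles):
    """Minimum score of one read, restricted to the given 1-based cycle numbers."""
    return min(read_scores[c - 1] for c in cycles)

# Toy read (hypothetical): 32 cycles at score 30, with a drop to 5 at cycle 28.
read = [30] * 32
read[28 - 1] = 5

before = min_over_cycles(read, range(1, 28))   # cycles 1-27
at_28 = min_over_cycles(read, [28])            # cycle 28 only
after = min_over_cycles(read, range(28, 33))   # cycles 28-32
```

Only the windows that include cycle 28 show the low value, mirroring the way the three panels isolate the affected cycle.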
Figure 4
An erratic tile with believable results. This image was generated using both plotTile and cycleplot. It shows two tiles. One of the tiles (tile 1) has the usual number of reads for a tile from that run, and a typical breakdown of those reads into the Eland categories. Note the number of reads in the U0 and U1 categories in the histogram on the top right. Nevertheless, as can be seen in Figure 4a, the mean intensities of this tile vary widely from cycle to cycle. The other tile is the familiar tile 47. Its intensity levels are much better behaved, except for the problem in cycle 28. Despite this, the U1 levels on tile 47 are elevated. This is particularly notable because the lowest-intensity cycle on tile 47 is at roughly the same level as the lowest found on tile 1.
Figure 5
Consistent errors across multiple tiles. Here multiple tiles are shown. In each of the tiles, the upper left corner looks faded. That is due to an increase in error rate that causes reads to be categorized as NM. This is a global issue – spanning multiple tiles.
Figure 6
Problems at the boundaries. Here we see a typical tile (tile 44) superimposed upon a summarization plot. The tile graph was generated using plotTile, and the summarization using plotSummary with summary = 2. The overlay of the two graphs, and the arrow, were produced using R but are not generated automatically by tileQC. The red dots on the top of the tile indicate reads for which Bustard was unable to make a base-call. The dots in the summarization graph denote the number of reads (per tile) in each of the Eland categories. Note the droop in the blue U0 dots.
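The per-tile counts that such a summarization plot displays can be sketched as a simple tally. This is not the plotSummary implementation; `reads_per_category` and the toy input of (tile, category) pairs are assumptions made for illustration, using only category labels that appear in the captions above.

```python
from collections import Counter, defaultdict

def reads_per_category(reads):
    """Count reads per tile and Eland category from (tile, category) pairs."""
    counts = defaultdict(Counter)
    for tile, category in reads:
        counts[tile][category] += 1
    return counts

# Toy input (hypothetical): a handful of reads on tiles 44 and 47.
toy_reads = [
    (44, "U0"), (44, "U0"), (44, "U1"), (44, "NM"),
    (47, "U0"), (47, "U1"), (47, "U1"),
]

summary = reads_per_category(toy_reads)
```

Plotting one such count per tile and category, in tile order, would make a drop in U0 reads toward the boundary tiles stand out.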

