TileQC: a system for tile-based quality control of Solexa data

BMC Bioinformatics. 2008 May 28;9:250. doi: 10.1186/1471-2105-9-250.

Authors

Peter C Dolan¹, Dee R Denver

Affiliation

¹ Department of Zoology and Center for Genome Research and Biocomputing, Oregon State University, Corvallis, Oregon 97331, USA. dolanp@science.oregonstate.edu

PMID: 18507856
PMCID: PMC2443380
DOI: 10.1186/1471-2105-9-250

Abstract

Background: Next-generation DNA sequencing technologies such as Illumina's Solexa platform and Roche's 454 approach provide new avenues for investigating genome-scale questions. However, they also present novel analytical challenges that must be met for their effective application to biological questions.

Results: Here we report the availability of tileQC, a tile-based quality control system for Solexa data written in the R language. TileQC provides a means of recognizing bias and error in Solexa output by graphically representing data generated by flow cell tiles. The data represented in the images is then made available in the R environment for further analysis and automation of error detection.

Conclusion: TileQC offers a highly adaptable and powerful tool for the quality control of Solexa-based DNA sequence data.

Figures

**Figure 1**
**Aberrant Solexa tiles**. This figure displays three distinct types of errors that we have seen occur on Solexa tiles. The image was generated using *plotQTile* with the color-by category option. The tile data was drawn from tiles 129, 85, and 6 respectively – all produced from the same lane and run. The error depicted in 1b appears to be caused by a bubble in one of the reagents. This bubble increased the error rate, converting U0 reads into U1 reads. From further investigation we know this occurred during cycle 28. The error depicted in 1c is an area in which no reads at all occurred – possibly a problem with the DNA binding to the flow cell, or a smudge on the surface of the flow cell. The error depicted in 1a remains enigmatic.

**Figure 2**
**Analysis of a single tile using *plotQTile* and *cycleplot***. Here we see three distinct views of tile 47 from Figure 1b. As in Figure 1, Figures 2a–2c were generated using the function *plotQTile*. Figure 2a uses the color-by category option to display the position of the reads color-coded according to their Eland category. In Figure 2b, the gray intensity values are generated by taking the mean across the first 32 cycles and then normalizing. Figure 2c is similar to 2b, but uses the minimum value across those 32 cycles instead of the mean. Figure 2c shows that some bubbles are visible from a Q_AG-score perspective, but the contrast between 2b and 2c shows that the aggregating function must be chosen with care: the mean dilutes a single bad cycle, whereas the minimum exposes it. Figure 2d was generated using the *cycleplot* function. It displays the mean Q_AG-score for cycles 1–32 on tile 47, and showcases the ability to detect the source of an error by decomposing the data according to cycle. Here we see the drop in average intensity that occurred during cycle 28.
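
The effect of the aggregating function described in Figure 2 can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not use tileQC's API: the data layout (one list of per-cycle quality scores per read) and the names `reads`, `aggregate_reads`, and `mean_per_cycle` are assumptions made for illustration, and the scores are invented.

```python
from statistics import mean

# Hypothetical per-read quality scores, one value per cycle (4 cycles here).
# The third cycle is deliberately degraded for the first two reads.
reads = [
    [30, 31, 5, 30],
    [29, 30, 6, 31],
    [30, 30, 30, 30],
]

def aggregate_reads(reads, func):
    """Collapse each read's per-cycle scores to one value with `func`."""
    return [func(scores) for scores in reads]

def mean_per_cycle(reads):
    """Average the scores of all reads at each cycle (a cycleplot-like view)."""
    return [mean(cycle_scores) for cycle_scores in zip(*reads)]

by_mean = aggregate_reads(reads, mean)  # a single bad cycle is diluted
by_min = aggregate_reads(reads, min)    # a single bad cycle dominates
per_cycle = mean_per_cycle(reads)

# The cycle with the lowest mean score is the likeliest source of the error.
worst_cycle = per_cycle.index(min(per_cycle)) + 1  # 1-based cycle number
```

In this toy data the affected reads stand out far more under `min` than under `mean`, and the per-cycle means single out the degraded cycle, which mirrors the contrast between Figures 2b, 2c, and 2d.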

**Figure 3**
**Filtering the data by cycle**. This figure is a closer exploration of the bubble on tile 47 analyzed in Figure 2. The *tileQC* system allows the output of the *plotQTile* function to be filtered according to cycle. This graph was generated using *plotQTile* and the *cycles* option. Fig. 3a shows the minimum Q_AG-score across cycles 1–27 by using *cycles = 1:27*, Fig. 3b restricts to the 28th cycle by using the *cycle = 28* option, and Fig. 3c restricts to cycles 28–32. From these three panels it is clear that the problem behind the bubble occurred during cycle 28.
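
Restricting the aggregation to a range of cycles, as the *cycles* option does in Figure 3, can be sketched as follows. This is plain Python, not tileQC code; the function `min_over_cycles`, its 1-based inclusive cycle range, and the toy scores are assumptions for illustration only.

```python
# Hypothetical quality scores for two reads over six cycles.
# Cycle 5 is degraded for the first read only.
reads = [
    [30, 30, 31, 30, 4, 29],
    [31, 30, 30, 29, 30, 30],
]

def min_over_cycles(reads, first, last):
    """Minimum score per read over cycles first..last (1-based, inclusive)."""
    return [min(scores[first - 1:last]) for scores in reads]

before = min_over_cycles(reads, 1, 4)    # cycles before the fault
at_fault = min_over_cycles(reads, 5, 5)  # the faulty cycle alone
onward = min_over_cycles(reads, 5, 6)    # the faulty cycle and later ones
```

Comparing the three results shows the low score appearing only once the faulty cycle is included, which is the reasoning used to attribute the bubble to a single cycle.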

**Figure 4**
**An erratic tile with believable results**. This image was generated using both *plotTile* and *cycleplot*. It shows two tiles. One of the tiles (tile 1) has the usual number of reads for a tile from that run, and a typical breakdown of those reads into the Eland categories. Note the number of reads in the U0 and U1 categories in the histogram on the top right. Nevertheless, as Figure 4a shows, the mean intensities on this tile vary widely from cycle to cycle. The other tile is the familiar tile 47. Its intensity levels are much better behaved, except for the problem in cycle 28. Despite this, the U1 levels on tile 47 are elevated. This is particularly notable because the lowest-intensity cycle on tile 47 is at roughly the same level as the lowest found on tile 1.

**Figure 5**
**Consistent errors across multiple tiles**. Here multiple tiles are shown. In each of the tiles, the upper left corner looks faded. The fading is due to an increase in error rate that causes reads to be categorized as NM. This is a global issue, spanning multiple tiles.

**Figure 6**
**Problems at the boundaries**. Here we see a typical tile (tile 44) superimposed upon a summarization plot. The tile graph was generated using *plotTile*, and the summarization using *plotSummary* with *summary = 2*. The overlay of the two graphs and the arrow were added in R; *tileQC* does not produce them automatically. The red dots on the top of the tile indicate reads for which Bustard was unable to make a base-call. The dots in the summarization graph denote the number of reads (per tile) in each of the Eland categories. Note the droop in the blue U0 dots.
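
The per-tile summarization in Figure 6 amounts to counting the reads in each Eland category for every tile. A minimal sketch of such a tally follows, in plain Python rather than tileQC's R functions; the `(tile, category)` record layout and the name `counts_by_tile` are assumptions, and the records are made up.

```python
from collections import Counter, defaultdict

# Hypothetical (tile, Eland category) records, one per read.
records = [
    (1, "U0"), (1, "U0"), (1, "U1"), (1, "NM"),
    (2, "U0"), (2, "U1"), (2, "U1"), (2, "NM"),
]

def counts_by_tile(records):
    """Return {tile: Counter of Eland categories} for the given reads."""
    counts = defaultdict(Counter)
    for tile, category in records:
        counts[tile][category] += 1
    return dict(counts)

summary = counts_by_tile(records)

# U0 counts in tile order; a falling trend here would correspond to the
# droop in the U0 dots noted in the caption.
u0_per_tile = [summary[tile]["U0"] for tile in sorted(summary)]
```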
