TagDust--a program to eliminate artifacts from next generation sequencing data

Timo Lassmann¹, Yoshihide Hayashizaki, Carsten O Daub

PMID: 19737799
PMCID: PMC2781754
DOI: 10.1093/bioinformatics/btp527

Bioinformatics. 2009 Nov 1;25(21):2839-40. doi: 10.1093/bioinformatics/btp527. Epub 2009 Sep 7.

Affiliation

¹ Omics Science Center, Riken Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan. timolassmann@gmail.com

Abstract

Motivation: Next-generation parallel sequencing technologies produce large quantities of short sequence reads. Due to experimental procedures, various types of artifacts are commonly sequenced alongside the targeted RNA or DNA sequences. Identifying such artifacts is important both during the development of novel sequencing assays and for the downstream analysis of the sequenced libraries.

Results: Here we present TagDust, a program identifying artifactual sequences in large sequencing runs. Given a user-defined cutoff for the false discovery rate, TagDust identifies all reads explainable by combinations and partial matches to known sequences used during library preparation. We demonstrate the quality of our method on sequencing runs performed on Illumina's Genome Analyzer platform.
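
The abstract does not say how the false discovery rate cutoff is applied to individual reads. As a rough illustration only, the sketch below uses the Benjamini-Hochberg step-up procedure, one standard way to control the FDR, and is not taken from the TagDust source. The function name, the input p-values, and the null hypothesis (a read's similarity to library sequences arose by chance) are all assumptions made for this example.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], fdr: float) -> List[bool]:
    """Return one flag per p-value: True if it is rejected at the given FDR.

    Illustrative helper, not part of TagDust.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff.
    flags = [False] * m
    for idx in order[:cutoff_rank]:
        flags[idx] = True
    return flags


# Hypothetical per-read p-values (assumed input, one value per read).
read_p_values = [0.001, 0.20, 0.012, 0.60, 0.04]
artifact_flags = benjamini_hochberg(read_p_values, fdr=0.05)
print(artifact_flags)
```

Under these assumptions, a flagged read is one whose match to the library preparation sequences is unlikely to be coincidental, so it would be treated as an artifact. Lowering `fdr` makes the filter more conservative.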

Availability: Executables and documentation are available from http://genome.gsc.riken.jp/osc/english/software/.

Contact: timolassmann@gmail.com.
