Characteristics of 454 pyrosequencing data--enabling realistic simulation with flowsim
- PMID: 20823302
- PMCID: PMC2935434
- DOI: 10.1093/bioinformatics/btq365
Erratum in
- Bioinformatics. 2011 Aug 1;27(15):2171
Abstract
Motivation: The commercial launch of 454 pyrosequencing in 2005 was a milestone in genome sequencing in terms of performance and cost. Over the three releases available so far, average read lengths have increased to approximately 500 base pairs, approaching the read lengths obtained from traditional Sanger sequencing. The design of sequencing projects would benefit from the ability to simulate experiments.
Results: We explore raw 454 data to investigate their characteristics and derive empirical distributions for the flow values generated by pyrosequencing. Based on our findings, we implement Flowsim, a simulator that generates realistic pyrosequencing data files of arbitrary size from a given set of input DNA sequences. Finally, we use our simulator to examine the impact of sequence lengths on the results of concrete whole-genome assemblies, and we suggest its use in planning sequencing projects, benchmarking assembly methods and other applications.
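To make the notion of "flow values" concrete, the sketch below shows how a DNA sequence maps to a pyrosequencing flowgram. Nucleotides are flowed in a fixed cyclic order, and each flow's ideal value is the length of the homopolymer run incorporated (0 if none). Several details are assumptions for illustration only and are not taken from Flowsim. The flow order `TACG` is assumed. The function names `ideal_flowgram`, `noisy_flowgram` and `call_bases` are hypothetical. Most importantly, the Gaussian noise with hypothetical parameters `base_sd` and `sd_per_base` is a simple stand-in for the empirical flow-value distributions the authors derive from real data.

```python
import random
from typing import List

# Assumed cyclic flow order; 454 instruments flow the four nucleotides repeatedly.
FLOW_ORDER = "TACG"


def ideal_flowgram(seq: str, flow_order: str = FLOW_ORDER) -> List[int]:
    """Noise-free flowgram: homopolymer length incorporated at each successive flow."""
    if any(b not in flow_order for b in seq):
        raise ValueError("sequence contains bases not in the flow order")
    flows = []
    i = 0  # position in the sequence
    f = 0  # flow index
    while i < len(seq):
        base = flow_order[f % len(flow_order)]
        run = 0
        # Incorporate the whole homopolymer of the flowed nucleotide in one flow.
        while i < len(seq) and seq[i] == base:
            run += 1
            i += 1
        flows.append(run)
        f += 1
    return flows


def noisy_flowgram(ideal: List[int], rng: random.Random,
                   base_sd: float = 0.03, sd_per_base: float = 0.04) -> List[float]:
    """Hypothetical noise model: Gaussian spread that grows with homopolymer length.

    Flowsim instead samples from empirical distributions; this is only a stand-in.
    Values are clipped at 0 and rounded to two decimals.
    """
    return [max(0.0, round(rng.gauss(n, base_sd + sd_per_base * n), 2)) for n in ideal]


def call_bases(flows: List[float], flow_order: str = FLOW_ORDER) -> str:
    """Naive base calling: round each flow value to the nearest homopolymer length."""
    return "".join(flow_order[k % len(flow_order)] * int(round(v))
                   for k, v in enumerate(flows))


if __name__ == "__main__":
    ideal = ideal_flowgram("TTACCGGGA")
    print(ideal)  # [2, 1, 2, 3, 0, 1]
    noisy = noisy_flowgram(ideal, random.Random(42))
    print(noisy, call_bases(noisy))
```

Because the noise spread in this toy model grows with run length, rounding errors concentrate in long homopolymers. This mirrors the well-known tendency of pyrosequencing to miscall homopolymer lengths, which is why modelling the flow-value distributions, rather than per-base substitution errors, is central to realistic simulation.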
Availability: Flowsim is freely available under the General Public License from http://blog.malde.org/index.php/flowsim/.
