Incorporating sequence quality data into alignment improves DNA read mapping
- PMID: 20110255
- PMCID: PMC2853142
- DOI: 10.1093/nar/gkq010
Abstract
New DNA sequencing technologies have achieved breakthroughs in throughput, at the expense of higher error rates. The primary way of interpreting biological sequences is via alignment, but standard alignment methods assume the sequences are accurate. Here, we describe how to incorporate the per-base error probabilities reported by sequencers into alignment. Unlike existing tools for DNA read mapping, our method models both sequencer errors and real sequence differences. This approach consistently improves mapping accuracy, even when the rate of real sequence difference is only 0.2%. Furthermore, when mapping Drosophila melanogaster reads to the Drosophila simulans genome, it increased the proportion of correctly mapped reads from 49% to 66%. This approach enables more effective use of DNA reads from organisms that lack reference genomes, are extinct or are highly polymorphic.
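To make the idea of folding per-base error probabilities into alignment scores concrete, here is a minimal sketch. It is not the formula from the paper. It assumes a simple match/mismatch scoring scheme, the standard Phred relation p = 10^(-Q/10), and a uniform-error model in which a miscalled base is equally likely to be any of the other three bases. The function names, the default scores and the `scale` parameter are all hypothetical choices made for illustration.

```python
import math

BASES = "ACGT"


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability, p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def quality_aware_score(ref_base: str, read_base: str, q: float,
                        match: float = 1.0, mismatch: float = -1.0,
                        scale: float = 1.0) -> float:
    """Score one aligned column, mixing over what the true read base might be.

    Assumption (illustrative only): scores are scaled log likelihood ratios,
    and a sequencing error turns the true base into each of the other three
    bases with equal probability.
    """
    e = phred_to_error_prob(q)

    def ratio(x: str, y: str) -> float:
        # Likelihood ratio implied by the plain match/mismatch score.
        s = match if x == y else mismatch
        return math.exp(s / scale)

    # Ratio if the base call is correct.
    r_called = ratio(ref_base, read_base)
    # Summed ratios for the three alternative true bases.
    r_other = sum(ratio(ref_base, z) for z in BASES if z != read_base)

    mixed = (1.0 - e) * r_called + (e / 3.0) * r_other
    return scale * math.log(mixed)


if __name__ == "__main__":
    for q in (40, 20, 5):
        print(q,
              round(quality_aware_score("A", "A", q), 4),
              round(quality_aware_score("A", "C", q), 4))
```

Under these assumptions, a high-quality base scores almost exactly like the plain match/mismatch scheme, while a low-quality mismatch is penalized less, since it is more plausibly a sequencer error than a real sequence difference.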