Improving SNP discovery by base alignment quality

Heng Li¹

PMID: 21320865
PMCID: PMC3072548
DOI: 10.1093/bioinformatics/btr076

Heng Li. Bioinformatics. 2011 Apr 15;27(8):1157-8. doi: 10.1093/bioinformatics/btr076. Epub 2011 Feb 13.

Affiliation

¹ Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA. hengli@broadinstitute.org

Abstract

I propose a new application of profile hidden Markov models (HMMs) to SNP discovery from resequencing data, to greatly reduce false SNP calls caused by misalignments around insertions and deletions (indels). The central concept is per-Base Alignment Quality (BAQ), which accurately measures the probability of a read base being wrongly aligned. The effectiveness of BAQ has been confirmed on large datasets by the 1000 Genomes Project analysis subgroup.

Availability: http://samtools.sourceforge.net

Contact: hengli@broadinstitute.org.
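To make the central quantity concrete: alignment and base qualities are conventionally reported on the Phred scale, so a misalignment probability can be handled the same way. The sketch below assumes that convention and, as a further assumption not stated in the abstract, that a caller takes the smaller of the raw base quality and the BAQ as the effective quality. The function names `phred` and `effective_quality` are illustrative and are not part of SAMtools.

```python
import math

def phred(p_error, cap=99):
    """Convert an error probability to a Phred-scaled integer quality."""
    if p_error <= 0.0:
        return cap  # avoid log10(0); the cap value is an arbitrary choice
    return min(cap, int(round(-10.0 * math.log10(p_error))))

def effective_quality(base_qual, p_misaligned):
    """Cap the raw base quality by the Phred-scaled misalignment
    probability (assumed rule, see the lead-in)."""
    return min(base_qual, phred(p_misaligned))

# A Q30 base next to an indel, with a 10% chance of being misaligned,
# and the same base in a region where misalignment is very unlikely.
near_indel = effective_quality(30, 0.1)
far_from_indel = effective_quality(30, 1e-5)
print(near_indel, far_from_indel)
```

Under these assumptions, a base that is well sequenced but poorly aligned contributes little evidence to a SNP call, which is how misalignments around indels would stop producing false calls.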

Figures

**Fig. 1.**
The topology of the profile HMM for BAQ computation. It consists of five types of states: alignment matches (M), insertions to the reference (I), deletions (D), alignment start (S) and alignment end (E). The S state points to every M and I state while every M and I points to E. States S and E are plotted together to avoid excessive dotted lines in the figure.
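The topology described in this caption can be sketched as a set of directed edges. Only the S and E wiring comes from the caption (S points to every M and I; every M and I points to E). The remaining transitions follow a conventional profile-HMM layout and are an assumption, as are the function name `build_topology`, the choice to place insertion `I_k` after match `M_k`, and the restriction of deletion states to interior reference positions.

```python
def build_topology(ref_len):
    """Return directed edges (src, dst) for a reference of length ref_len.

    States are "S", "E", or tuples like ("M", k) with k in 1..ref_len.
    """
    edges = set()
    for k in range(1, ref_len + 1):
        m, i, d = ("M", k), ("I", k), ("D", k)
        # From the caption: S -> every M and I; every M and I -> E.
        edges |= {("S", m), ("S", i), (m, "E"), (i, "E")}
        # Assumed: an insertion may follow a match and may be extended.
        edges |= {(m, i), (i, i)}
        if k < ref_len:
            # Assumed: matches and insertions advance to the next match.
            edges |= {(m, ("M", k + 1)), (i, ("M", k + 1))}
        if 1 < k < ref_len:
            # Assumed: interior deletions are entered from the previous
            # match and left through the next match.
            edges |= {(("M", k - 1), d), (d, ("M", k + 1))}
        if 1 < k < ref_len - 1:
            edges.add((d, ("D", k + 1)))  # assumed deletion extension
    return edges

edges = build_topology(4)
starts = {dst for src, dst in edges if src == "S"}
ends = {src for src, dst in edges if dst == "E"}
print(sorted(starts))
print(sorted(ends))
```

Because S connects to every M and I state and each of those connects to E, a read may begin and end anywhere along the reference window, which is what the caption's start and end states provide.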

**Fig. 2.**
Transition–transversion ratio (ts/tv) as a function of the number of SNP calls. SNPs are sorted by the posterior probability of the site being a SNP (SNP probability). Given a threshold on the SNP probability, the number of SNPs of higher probability and their ts/tv are plotted. For the solid line, the filters in use are: (i) total depth below 500; (ii) root mean square mapping quality above 10; and (iii) P-value of reference and non-reference bases being evenly distributed on both strands above 10⁻⁴ (by exact test).
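The curve described in this caption can be reproduced in outline: rank calls by SNP probability, then report the running ts/tv after each additional call. The sketch below is a minimal illustration. The three filter thresholds are taken from the caption, while the record layout, the function names, and the toy calls are invented for the example and do not come from the paper.

```python
PURINES = {"A", "G"}

def is_transition(ref, alt):
    """A<->G and C<->T are transitions; other substitutions are
    transversions. Assumes ref != alt."""
    return (ref in PURINES) == (alt in PURINES)

def passes_filters(depth, rms_mapq, strand_p):
    """Filters (i)-(iii) from the caption; argument names are hypothetical."""
    return depth < 500 and rms_mapq > 10 and strand_p > 1e-4

def tstv_curve(calls):
    """calls: iterable of (snp_probability, ref_base, alt_base).

    Returns [(n_calls, ts/tv)] with calls ranked by probability,
    highest first. The ratio is infinite until a transversion is seen.
    """
    ranked = sorted(calls, key=lambda c: c[0], reverse=True)
    ts = tv = 0
    curve = []
    for n, (_, ref, alt) in enumerate(ranked, start=1):
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
        curve.append((n, ts / tv if tv else float("inf")))
    return curve

# Toy calls, invented for illustration only.
calls = [(0.99, "A", "G"), (0.95, "C", "T"), (0.90, "A", "C"), (0.60, "G", "T")]
curve = tstv_curve(calls)
for n, ratio in curve:
    print(n, ratio)
```

Reading such a curve from left to right shows how ts/tv changes as lower-confidence calls are admitted, which is the comparison the figure draws between call sets.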
