PLoS Comput Biol. 2005 Oct;1(5):e53.
doi: 10.1371/journal.pcbi.0010053. Epub 2005 Oct 28.

SNPdetector: a software tool for sensitive and accurate SNP detection

Jinghui Zhang et al. PLoS Comput Biol. 2005 Oct.

Abstract

Identification of single nucleotide polymorphisms (SNPs) and mutations is important for the discovery of genetic predisposition to complex diseases. PCR resequencing is the method of choice for de novo SNP discovery. However, manual curation of putative SNPs has been a major bottleneck in the application of this method to high-throughput screening. Therefore, it is critical to develop a more sensitive and accurate computational method for automated SNP detection. We developed a software tool, SNPdetector, for automated identification of SNPs and mutations in fluorescence-based resequencing reads. SNPdetector was designed to model the process of human visual inspection and has very low false-positive and false-negative rates. We demonstrate the superior performance of SNPdetector in SNP and mutation analysis by comparing its results with those derived by human inspection, PolyPhred (a popular SNP detection tool), and independent genotype assays in three large-scale investigations. The first study identified and validated inter- and intra-subspecies variations in 4,650 traces of 25 inbred mouse strains that belong to either the Mus musculus species or the M. spretus species. Unexpected heterozygosity in the CAST/Ei strain was observed in two out of 1,167 mouse SNPs. The second study identified 11,241 candidate SNPs in five ENCODE regions of the human genome covering 2.5 Mb of genomic sequence. Approximately 50% of the candidate SNPs were selected for experimental genotyping; the validation rate exceeded 95%. The third study detected ENU-induced mutations (at 0.04% allele frequency) in 64,896 traces of 1,236 zebrafish. Our analysis of three large and diverse test datasets demonstrated that SNPdetector is an effective tool for genome-scale research and for large-sample clinical studies. SNPdetector runs on Unix/Linux platforms and is publicly available (http://lpg.nci.nih.gov).
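The 0.04% allele frequency quoted for the third study is consistent with a single mutant allele copy among 1,236 diploid fish. The short sketch below works through that arithmetic. The interpretation (one heterozygous carrier, diploid genomes) is an assumption made here for illustration and is not stated in the abstract, and the function name is hypothetical.

```python
def single_copy_allele_frequency(n_individuals: int, ploidy: int = 2) -> float:
    """Frequency of one mutant allele copy in a pool of n individuals.

    Assumption (not from the abstract): exactly one individual is
    heterozygous for the mutation and every individual is diploid.
    """
    total_alleles = n_individuals * ploidy
    return 1 / total_alleles


# 1,236 zebrafish are reported in the third study.
freq = single_copy_allele_frequency(1236)
print(f"{freq:.4%}")  # about 0.04%
```

Under these assumptions the pool holds 2 x 1,236 = 2,472 allele copies, so one copy corresponds to roughly 0.04%.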

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Schematic Diagram of the Principal Steps in the Analysis of Sequencing Variants Found by SNPdetector
Parallelograms are analytical modules (usually C programs), and rectangles are input and output data. Programs obtained from the public domain are displayed in italics, while those developed in this work are shown in bold. SNPdetector requires the following three sets of input data: (1) a template sequence file, (2) the forward and the reverse sequencing primers, and (3) the trace files. The output includes a list of high-quality SNPs and their genotype calls in each subject.
Figure 2. Rejected and Accepted Bases in a Sequence Trace
The Phred quality scores are indicated at the top. The quality scores for rejected bases are labeled in red. Accepted bases are marked by rectangular boxes. (A) A subregion of a polyA bubble showing that low-quality bases with no secondary peaks are accepted by SNPdetector. (B) A subregion showing that a Q20 base is rejected because of its high secondary peak even though the majority of neighboring bases have high quality scores.
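For readers unfamiliar with Phred scores, the sketch below converts a score Q to its estimated base-calling error probability using the standard definition P = 10^(-Q/10), so Q20 corresponds to a 1% error estimate. The accept/reject function that follows is only a toy illustration of the idea in the caption, that secondary-peak height can override the quality score. It is not SNPdetector's actual rule, and both the `secondary_peak_ratio` input and the 0.3 threshold are hypothetical.

```python
def phred_to_error_prob(q: float) -> float:
    """Standard Phred definition: error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def toy_accept_base(secondary_peak_ratio: float, max_ratio: float = 0.3) -> bool:
    """Toy rule, NOT SNPdetector's algorithm.

    secondary_peak_ratio: height of the secondary peak divided by the
    height of the primary peak (hypothetical input).
    max_ratio: hypothetical cutoff chosen only for this illustration.
    The Phred score is deliberately ignored here, which is a simplification.
    """
    return secondary_peak_ratio <= max_ratio


print(phred_to_error_prob(20))   # Q20: 1 error expected in 100 calls
print(toy_accept_base(0.0))      # no secondary peak: accepted
print(toy_accept_base(0.6))      # high secondary peak: rejected
```

The two calls to `toy_accept_base` mirror panels (A) and (B): a base with no secondary peak passes regardless of its score, while a base with a large secondary peak fails even at Q20.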
Figure 3. A PolyA Bubble That Occurs in Multiple Samples
The bubble was recognized as a sequencing artifact by SNPdetector, and no SNP was called even though the alternative adenine residue (in the highlighted column) appeared in two samples with an average Phred quality score of 20. In addition, all three traces in this region have a polyG spill at the right, with a secondary guanine peak spanning four residues, and a polyT spill at the left, with a secondary thymine peak spanning three residues.
Figure 4. Sequence Traces of a SNP Cluster with Three Consecutive SNPs
The top is a homozygous sample and the bottom a heterozygous one. The Phred quality score is labeled on top of each base. In the heterozygous sample, the three HQDPs around the three heterozygotes are labeled with red lines at the bottom. The flanking bases used for calculating genotype quality class of the highlighted heterozygote in the middle are marked by rectangular boxes, which do not include any HQDPs. The flanking bases used to assess background noise in the flanking region are labeled with brackets at the bottom.

Cited by

  • SNP-PHAGE--High throughput SNP discovery pipeline.
    Matukumalli LK, Grefenstette JJ, Hyten DL, Choi IY, Cregan PB, Van Tassell CP. BMC Bioinformatics. 2006 Oct 23;7:468. doi: 10.1186/1471-2105-7-468. PMID: 17059604. Free PMC article.
  • High-throughput genetic mapping of mutants via quantitative single nucleotide polymorphism typing.
    Liu S, Chen HD, Makarevitch I, Shirmer R, Emrich SJ, Dietrich CR, Barbazuk WB, Springer NM, Schnable PS. Genetics. 2010 Jan;184(1):19-26. doi: 10.1534/genetics.109.107557. Epub 2009 Nov 2. PMID: 19884313. Free PMC article.
  • Genomic subtyping and therapeutic targeting of acute erythroleukemia.
    Iacobucci I, Wen J, Meggendorfer M, Choi JK, Shi L, Pounds SB, Carmichael CL, Masih KE, Morris SM, Lindsley RC, Janke LJ, Alexander TB, Song G, Qu C, Li Y, Payne-Turner D, Tomizawa D, Kiyokawa N, Valentine M, Valentine V, Basso G, Locatelli F, Enemark EJ, Kham SKY, Yeoh AEJ, Ma X, Zhou X, Sioson E, Rusch M, Ries RE, Stieglitz E, Hunger SP, Wei AH, To LB, Lewis ID, D'Andrea RJ, Kile BT, Brown AL, Scott HS, Hahn CN, Marlton P, Pei D, Cheng C, Loh ML, Ebert BL, Meshinchi S, Haferlach T, Mullighan CG. Nat Genet. 2019 Apr;51(4):694-704. doi: 10.1038/s41588-019-0375-1. Epub 2019 Mar 29. PMID: 30926971. Free PMC article.
  • Current Progresses of Single Cell DNA Sequencing in Breast Cancer Research.
    Liu J, Adhav R, Xu X. Int J Biol Sci. 2017 Jul 18;13(8):949-960. doi: 10.7150/ijbs.19627. eCollection 2017. PMID: 28924377. Free PMC article. Review.
  • The genomic landscape of hypodiploid acute lymphoblastic leukemia.
    Holmfeldt L, Wei L, Diaz-Flores E, Walsh M, Zhang J, Ding L, Payne-Turner D, Churchman M, Andersson A, Chen SC, McCastlain K, Becksfort J, Ma J, Wu G, Patel SN, Heatley SL, Phillips LA, Song G, Easton J, Parker M, Chen X, Rusch M, Boggs K, Vadodaria B, Hedlund E, Drenberg C, Baker S, Pei D, Cheng C, Huether R, Lu C, Fulton RS, Fulton LL, Tabib Y, Dooling DJ, Ochoa K, Minden M, Lewis ID, To LB, Marlton P, Roberts AW, Raca G, Stock W, Neale G, Drexler HG, Dickins RA, Ellison DW, Shurtleff SA, Pui CH, Ribeiro RC, Devidas M, Carroll AJ, Heerema NA, Wood B, Borowitz MJ, Gastier-Foster JM, Raimondi SC, Mardis ER, Wilson RK, Downing JR, Hunger SP, Loh ML, Mullighan CG. Nat Genet. 2013 Mar;45(3):242-52. doi: 10.1038/ng.2532. Epub 2013 Jan 20. PMID: 23334668. Free PMC article.
