BMC Bioinformatics. 2008 Dec 12;9 Suppl 12(Suppl 12):S9.
doi: 10.1186/1471-2105-9-S12-S9.

VarDetect: a nucleotide sequence variation exploratory tool

Chumpol Ngamphiw et al. BMC Bioinformatics. 2008.

Abstract

Background: Single nucleotide polymorphisms (SNPs) are the most commonly studied units of genetic variation. Discovering such variation may help identify causative mutations in monogenic diseases and SNPs in predisposing genes for complex diseases. Accurate SNP detection requires software that can correctly translate chromatogram signals into nucleotide calls.

Results: We present VarDetect, a stand-alone nucleotide variation exploratory tool that automatically detects nucleotide variation in fluorescence-based chromatogram traces. Accurate SNP base-calling is achieved using pre-calculated peak content ratios and is enhanced by rules that account for common sequence-reading artifacts. VarDetect is benchmarked against four other well-known SNP discovery tools (PolyPhred, novoSNP, Genalys and Mutation Surveyor) using fluorescence-based chromatograms from 15 human genes. These chromatograms were obtained by sequencing 16 pooled DNA samples, each a pool of two individuals (32 individual DNA samples in total). In this comparison of automatic SNP detection tools, VarDetect achieved the highest detection efficiency.

Availability: VarDetect is compatible with most major operating systems, including Microsoft Windows, Linux and Mac OS X. The current version of VarDetect is freely available at http://www.biotec.or.th/GI/tools/vardetect.


Figures

Figure 1
Illustration of VarDetect's graphical user interface. The interface comprises four panes and a quick-access toolbar: a) the toolbar, with a wizard button at its far left; b) a graphical view of the input chromatogram traces; c) a list of predicted SNPs; d) an SNP information window; and e) a whole-map view.
Figure 2
Chromatogram trace showing peak intensities; the dashed line marks the base-call position. Three peaks are detected at this position. The green, red and blue peaks have intensities of 500, 400 and 70 units, respectively; these values are used in the peak intensity ratio calculation.
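The ranking step described in the Figure 2 caption can be sketched in Python. This is a minimal sketch under one stated assumption: the ratio computed here (secondary-peak intensity divided by primary-peak intensity) is a stand-in chosen for illustration, because the exact peak content ratio VarDetect uses is not defined in this section. The function name `peak_intensity_ratio` is hypothetical.

```python
def peak_intensity_ratio(intensities: dict[str, float]) -> tuple[str, str, float]:
    """Return (primary channel, secondary channel, secondary/primary ratio).

    Hypothetical sketch: the secondary/primary ratio is an assumed stand-in
    for VarDetect's peak content ratio, not its documented formula.
    """
    if len(intensities) < 2:
        raise ValueError("need at least two channels")
    # Sort channels from strongest to weakest signal.
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (primary, p_val), (secondary, s_val) = ranked[0], ranked[1]
    if p_val <= 0:
        raise ValueError("primary peak must have positive intensity")
    return primary, secondary, s_val / p_val


# Intensities taken from the Figure 2 caption.
primary, secondary, ratio = peak_intensity_ratio({"green": 500, "red": 400, "blue": 70})
print(primary, secondary, ratio)  # green red 0.8
```

The weakest (blue, 70-unit) peak plays no part in this ratio; whether it is discarded as noise depends on the noise-level setting shown in Figure 6.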
Figure 3
Calculation of the vicinity peak intensity ratio at the base-call position (arrowed). The 2-vicinity ratio (k = 2) is calculated by normalizing the signal intensities of the two bases to the left and the two bases to the right of the observed position, as described in Equation 2: Qv3 = 1/(2k) × (1 + 0.94 + 1 + 0.97) = 1/4 × 3.91 = 0.9775
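A minimal Python sketch of the k-vicinity averaging in the Figure 3 caption. It assumes the neighbouring values (1, 0.94, 1, 0.97) have already been normalized and that the observed position itself is excluded, as the 1/4 = 1/(2k) factor for k = 2 suggests. The function name `vicinity_ratio` and the placeholder value at the observed position are hypothetical.

```python
def vicinity_ratio(values: list[float], i: int, k: int) -> float:
    """Mean of the k values on each side of position i (position i excluded).

    Hypothetical sketch of the k-vicinity ratio: Q_v = 1/(2k) * sum of the
    2k neighbouring normalized values.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if i - k < 0 or i + k >= len(values):
        raise IndexError("vicinity window extends past the trace")
    neighbours = values[i - k:i] + values[i + 1:i + k + 1]
    return sum(neighbours) / (2 * k)


# Neighbouring values from Figure 3; index 2 is the observed base, and its
# placeholder value (0.0) is excluded from the average.
values = [1.0, 0.94, 0.0, 1.0, 0.97]
print(vicinity_ratio(values, i=2, k=2))  # approximately 0.9775
```

Because only the neighbours enter the sum, the order of the four values does not affect the result.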
Figure 4
The peak intensity ratio approach may fail to base-call some peak patterns correctly. The Qo values in boxes 1 and 2 are identical (0.551); however, the black peak in box 2 is misinterpreted as a primary peak, although it is clearly an overshoot from the adjacent base position.
Figure 5
Computer representation (array) of chromatogram traces.
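One possible in-memory layout for the array representation in Figure 5, sketched in Python: four channel arrays of sampled fluorescence intensities plus the sample index of each base call. The class name `Chromatogram`, its fields and the toy data are hypothetical; VarDetect's actual data structures are not described here.

```python
from dataclasses import dataclass


@dataclass
class Chromatogram:
    """Hypothetical layout: one intensity array per channel plus base-call positions."""

    a: list[int]
    c: list[int]
    g: list[int]
    t: list[int]
    base_positions: list[int]  # sample index of each called base

    def __post_init__(self) -> None:
        # All four channels must be sampled at the same time points.
        if not len(self.a) == len(self.c) == len(self.g) == len(self.t):
            raise ValueError("channel arrays must have equal length")

    def intensities_at(self, base_index: int) -> dict[str, int]:
        """Return the four channel intensities at one base-call position."""
        x = self.base_positions[base_index]
        return {"A": self.a[x], "C": self.c[x], "G": self.g[x], "T": self.t[x]}


# Toy trace with three called bases at sample indices 0, 2 and 4.
trace = Chromatogram(
    a=[0, 5, 80, 5, 0, 0],
    c=[0, 0, 0, 10, 90, 10],
    g=[0, 0, 0, 0, 0, 0],
    t=[60, 10, 0, 0, 0, 0],
    base_positions=[0, 2, 4],
)
print(trace.intensities_at(1))  # {'A': 80, 'C': 0, 'G': 0, 'T': 0}
```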
Figure 6
Base-call parameter settings in VarDetect. The highest signal at a position is taken as the primary peak and the next-highest signal as the secondary peak. Signals below the noise level are ignored. In this setting, the heterozygosity level gives a rough estimate of the nucleotide mixture ratio when working with pooled DNA.
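The parameters in Figure 6 (noise level, primary and secondary peaks, heterozygosity level) can be combined into a simple per-position rule. This Python sketch assumes that a position is called heterozygous when the secondary-to-primary intensity ratio reaches the heterozygosity level; `classify_position` and the threshold values used below are hypothetical, not VarDetect defaults.

```python
from typing import Optional


def classify_position(
    intensities: dict[str, float],
    noise_level: float,
    heterozygosity_level: float,
) -> tuple[str, Optional[str], Optional[str]]:
    """Return (call, primary, secondary) for one base-call position.

    Hypothetical rule: call is "no-call", "homozygous" or "heterozygous";
    secondary is None unless the position is called heterozygous.
    """
    if noise_level <= 0:
        raise ValueError("noise_level must be positive")
    # Ignore signals below the noise level.
    kept = {ch: v for ch, v in intensities.items() if v >= noise_level}
    if not kept:
        return "no-call", None, None
    # Highest remaining signal = primary peak, next highest = secondary peak.
    ranked = sorted(kept, key=kept.get, reverse=True)
    primary = ranked[0]
    if len(ranked) > 1:
        secondary = ranked[1]
        # Assumed criterion: secondary peak large enough relative to primary.
        if kept[secondary] / kept[primary] >= heterozygosity_level:
            return "heterozygous", primary, secondary
    return "homozygous", primary, None


# Figure 2 intensities with hypothetical thresholds.
print(classify_position({"green": 500, "red": 400, "blue": 70}, 100, 0.3))
# ('heterozygous', 'green', 'red')
```

As an idealized illustration (assuming peak height scales linearly with template fraction, which real traces only approximate), in a two-sample pool where one individual is heterozygous the minor allele makes up 1/4 of the template, giving a secondary-to-primary ratio of about 1/3; a heterozygosity level above that would miss such a variant.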
Figure 7
Effect of signal intensity decay on base-calling: correct (a) and incorrect (b) base-calling interpretations of a trace affected by signal intensity decay.
Figure 8
Improvement of base-calling using the Partitioning and Re-sampling (PnR) technique. For an observed base (shaded boxes), VarDetect divides a chromatogram peak into four equal parts (partitions) and focuses on the two middle parts (a). Two vectors, u and v, are created by connecting the points at which the curve segment of the secondary peak crosses the two partitions (b). Let u⊥ be the vector perpendicular to u, obtained by rotating u 90° counter-clockwise (c). The secondary peak curve has a turning point if the dot product u⊥·v is negative. In other words, if the angle θ between v and u⊥ is obtuse, this secondary peak can be interpreted as a heterozygous peak (d).
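The turning-point test in the Figure 8 caption reduces to the sign of one dot product. This Python sketch assumes the curve is sampled at three points, ordered left to right, where it crosses the boundaries of the two middle partitions, so that u runs from the first point to the second and v from the second to the third. `has_turning_point` and the sample coordinates are hypothetical.

```python
Point = tuple[float, float]


def has_turning_point(p1: Point, p2: Point, p3: Point) -> bool:
    """PnR-style test on three (x, y) points of the secondary-peak curve.

    Hypothetical sketch: u = p2 - p1, v = p3 - p2, and u_perp is u rotated
    90 degrees counter-clockwise. A negative u_perp . v (obtuse angle)
    indicates a turning point, i.e. a candidate heterozygous peak.
    """
    ux, uy = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = p3[0] - p2[0], p3[1] - p2[1]
    # Rotate u by 90 degrees counter-clockwise: (ux, uy) -> (-uy, ux).
    perp_x, perp_y = -uy, ux
    # Negative dot product <=> obtuse angle between u_perp and v.
    return perp_x * vx + perp_y * vy < 0


print(has_turning_point((0, 0), (1, 10), (2, 0)))  # True: rises then falls
print(has_turning_point((0, 0), (1, 1), (2, 5)))   # False: still rising, as in Figure 9
```

Since u⊥·v equals the 2D cross product of u and v, the test asks whether v turns clockwise relative to u; with x increasing left to right, that means the slope of the curve decreases across the two partitions.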
Figure 9
Illustration of PnR analysis: partitioning (a) and re-sampling (b) of a chromatogram with a rising (red) peak. u⊥ is the vector perpendicular to u, obtained by rotating u 90° counter-clockwise (c). The secondary peak curve has no turning point because the dot product u⊥·v is positive (d). Therefore, this peak is interpreted as a homozygous peak.
Figure 10
Quick alignment using a sliding-window algorithm.
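Figure 10 names a sliding-window algorithm for quick alignment without giving its details. The Python sketch below shows the generic ungapped version of that technique: slide the query along the reference one position at a time and keep the offset with the most identical bases. It illustrates the general idea only, not VarDetect's implementation; `best_offset` is a hypothetical name.

```python
def best_offset(reference: str, query: str) -> tuple[int, int]:
    """Ungapped sliding-window alignment.

    Returns (offset, matches) for the offset of query within reference that
    has the most identical bases; ties keep the leftmost offset.
    """
    if not query or len(query) > len(reference):
        raise ValueError("query must be non-empty and no longer than reference")
    best = (0, -1)
    for offset in range(len(reference) - len(query) + 1):
        window = reference[offset:offset + len(query)]
        # Count identical bases between the window and the query.
        matches = sum(1 for r, q in zip(window, query) if r == q)
        if matches > best[1]:
            best = (offset, matches)
    return best


print(best_offset("ACGTTGCATGCA", "TGCAT"))  # (4, 5)
```

Because this scan has no gap handling, an indel shifts every downstream base out of register, which is why indels need separate treatment.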
Figure 11
Illustration of traces with indels and their CodeMap analysis. Chromatogram traces: noise-eliminated homozygous trace (a), homozygous trace with a C/T SNP at the fifth position (b), and trace with a T insertion at the first position (c).
Figure 12
Illustration of a VNTR with an ATG deletion and its CodeMap analysis. CodeMap converts the chromatogram of trinucleotide repeats (r1, r2, r3) into the corresponding numeric array (2 0 0 2 0 0), shown on the right (a). When a set of trinucleotide repeats is deleted, CodeMap reveals specific numeric patterns (underlined) on the right (b), which match the (1/2)?[2]2 pattern shown in Table 4.

