Nucleic Acids Res. 2007;35(21):e148.
doi: 10.1093/nar/gkm918. Epub 2007 Nov 15.

A bioinformatic filter for improved base-call accuracy and polymorphism detection using the Affymetrix GeneChip whole-genome resequencing platform

Gagan A Pandya et al. Nucleic Acids Res. 2007.

Abstract

DNA resequencing arrays enable rapid acquisition of high-quality sequence data. This technology represents a promising platform for rapid high-resolution genotyping of microorganisms. Traditional array-based resequencing methods have relied on the use of specific PCR-amplified fragments from the query samples as hybridization targets. While this specificity in the target DNA population reduces the potential for artifacts caused by cross-hybridization, the subsampling of the query genome limits the sequence coverage that can be obtained and therefore reduces the technique's resolution as a genotyping method. We have developed and validated an Affymetrix Inc. GeneChip® array-based, whole-genome resequencing platform for Francisella tularensis, the causative agent of tularemia. We developed a set of bioinformatic filters targeting systematic base-calling errors that arise from two sources: cross-hybridization between the whole-genome sample and the array probes, and deletions in the sample DNA relative to the chip reference sequence. Our approach eliminated 91% of the false-positive single-nucleotide polymorphism calls identified in the SCHU S4 query sample, at the cost of 10.7% of the true positives, yielding a total base-calling accuracy of 99.992%.
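The trade-off reported above (false positives eliminated versus true positives lost) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `filter_tradeoff`, the representation of SNP calls as sets of genome positions, and the toy positions are all assumptions made for illustration.

```python
def filter_tradeoff(true_snps, calls_before, calls_after):
    """Return (fraction of false positives removed, fraction of true positives lost).

    All three arguments are iterables of genome positions (hypothetical
    representation): the known SNPs, the SNP calls before filtering, and
    the SNP calls that survive filtering.
    """
    true_snps = set(true_snps)
    before, after = set(calls_before), set(calls_after)

    tp_before = before & true_snps   # correct calls before filtering
    fp_before = before - true_snps   # spurious calls before filtering
    tp_after = after & true_snps     # correct calls that survive
    fp_after = after - true_snps     # spurious calls that survive

    fp_removed = 1 - len(fp_after) / len(fp_before) if fp_before else 0.0
    tp_lost = 1 - len(tp_after) / len(tp_before) if tp_before else 0.0
    return fp_removed, tp_lost


# Toy data (invented): 4 real SNPs, 5 spurious calls before filtering.
truth = {10, 20, 30, 40}
unfiltered = {10, 20, 30, 40, 5, 15, 25, 35, 45}
filtered = {10, 20, 30, 5}

fp_removed, tp_lost = filter_tradeoff(truth, unfiltered, filtered)
print(f"false positives removed: {fp_removed:.1%}, true positives lost: {tp_lost:.1%}")
```

In the abstract's terms, the two returned fractions correspond to the 91% and 10.7% figures; the toy numbers here are unrelated to those results.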

Figures

Figure 1.
Representation of the ‘alternate homology effect’. The query location is shown in bold and mismatches are shown in red. The alignment of chip oligonucleotides and sample DNA at the SNP location is shown. The top pair represents a sample DNA sequence perfectly matching a reference probe. The next pair illustrates a sample DNA sequence partially matching a SNP probe and therefore capable of hybridizing with high efficiency to the SNP probe pair.
Figure 2.
ROC curve showing the effect of different delta binding energy threshold values on the true positive and false positive rates. The values on the line graph are the delta energy values.
Figure 3.
ROC curve illustrating the effect of different quality threshold values on the true positive and false positive rates. The GSEQ quality score threshold was set to 3.0, and our quality filter was applied using different threshold values shown on the line graph.
Figure 4.
Representation of the ‘footprint effect’. Query locations are in bold and mismatches are shown in red. Chip oligonucleotides and sample DNA alignments at SNP location (central 13th position) and SNP location plus two bases are shown.
Figure 5.
Schematic representation of whole genome resequencing array set design. Blue vertical lines indicate repeats in the genomes. Unique sequences for LVS and SCHU S4 are shown as red and green vertical lines, respectively. Similarly, yellow and purple vertical lines represent unique sequences from plasmids pOM1 and pFNL10, respectively.
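
The threshold sweeps behind the ROC curves in Figures 2 and 3 can be sketched as follows. This is a generic illustration, not the authors' code: the function name `roc_points`, the per-call scores, and the rule that a call is kept when its score is at or above the threshold are assumptions, since the captions do not state the direction of the comparison.

```python
def roc_points(scores, is_true, thresholds):
    """Return a list of (threshold, false positive rate, true positive rate).

    scores:     one score per SNP call (hypothetical, e.g. a delta energy
                or a quality value)
    is_true:    one bool per call, True if the call is a real SNP
    thresholds: candidate cutoffs; a call is kept when score >= cutoff
                (an assumed convention)
    """
    n_true = sum(is_true)
    n_false = len(is_true) - n_true
    points = []
    for cutoff in thresholds:
        # Truth labels of the calls that survive this cutoff.
        kept = [truth for score, truth in zip(scores, is_true) if score >= cutoff]
        tp = sum(kept)
        fp = len(kept) - tp
        fpr = fp / n_false if n_false else 0.0
        tpr = tp / n_true if n_true else 0.0
        points.append((cutoff, fpr, tpr))
    return points


# Toy data (invented): two spurious calls with low scores, two real SNPs.
toy_scores = [0.5, 1.5, 2.5, 3.5]
toy_truth = [False, False, True, True]

for cutoff, fpr, tpr in roc_points(toy_scores, toy_truth, [0, 2, 4]):
    print(f"cutoff={cutoff}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

Plotting the true positive rate against the false positive rate for each cutoff, and labelling each point with its cutoff, gives a curve of the kind the captions describe.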
