Proc Natl Acad Sci U S A. 2009 Apr 21;106(16):6712-7.
doi: 10.1073/pnas.0901902106. Epub 2009 Apr 2.

High-throughput, high-accuracy array-based resequencing

Jianbiao Zheng et al. Proc Natl Acad Sci U S A.

Abstract

Although genomewide association studies have successfully identified associations of many common single-nucleotide polymorphisms (SNPs) with common diseases, the SNPs implicated so far account for only a small proportion of the genetic variability of the diseases tested. It has been suggested that common diseases may often be caused by rare alleles that genomewide association studies miss. Identifying these rare alleles requires high-throughput, high-accuracy resequencing technologies. Although array-based genotyping has allowed genomewide association studies of common SNPs in tens of thousands of samples, array-based resequencing has been limited for 2 main reasons: the lack of a fully multiplexed pipeline for high-throughput sample processing, and failure to achieve sufficient performance. We have recently solved both of these problems and created a fully multiplexed high-throughput pipeline that produces high-quality data. The pipeline consists of target amplification from genomic DNA, allele enrichment to generate pools of purified variant (or nonvariant) DNA, and interrogation of the purified DNA on resequencing arrays. We have used this pipeline to resequence approximately 5 Mb of DNA (on 3 arrays) corresponding to the exons of 1,500 genes in >473 samples; in total, >2,350 Mb were sequenced. In the context of this large-scale study, we obtained a false positive rate of approximately 1 in 500,000 bp and a false negative rate of approximately 10%.
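As a quick consistency check, the arithmetic below multiplies out the study-scale figures quoted in the abstract. It is a sketch that treats the rounded values (about 5 Mb per sample, 473 samples, 1 false positive per 500,000 bp, about 10% false negatives) as exact; the constant and function names are our own.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Assumption: the rounded values below are treated as exact.

BP_PER_SAMPLE = 5_000_000      # ~5 Mb of exonic sequence across the 3 arrays
N_SAMPLES = 473                # lower bound given as ">473 samples"
FP_RATE_PER_BP = 1 / 500_000   # ~1 false positive call per 500,000 bp
FN_RATE = 0.10                 # ~10% of true variants missed


def total_sequenced_bp(bp_per_sample: int, n_samples: int) -> int:
    """Total bases interrogated across all samples."""
    return bp_per_sample * n_samples


def expected_false_positives(n_bp: int, fp_rate: float) -> float:
    """Expected number of false positive calls in n_bp interrogated bases."""
    return n_bp * fp_rate


if __name__ == "__main__":
    total_bp = total_sequenced_bp(BP_PER_SAMPLE, N_SAMPLES)
    print(f"total sequenced: {total_bp / 1e6:,.0f} Mb")
    print(f"expected FPs per sample: "
          f"{expected_false_positives(BP_PER_SAMPLE, FP_RATE_PER_BP):.0f}")
    print(f"expected FPs, whole study: "
          f"{expected_false_positives(total_bp, FP_RATE_PER_BP):,.0f}")
    print(f"sensitivity: {1 - FN_RATE:.0%}")
```

With these inputs the script reports 2,365 Mb in total (consistent with ">2,350 Mb"), about 10 expected false positive calls per sample, and about 4,730 across the whole study. Even at 1 error per 500,000 bp, a 5-Mb panel therefore still yields on the order of ten false calls per sample.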

Conflict of interest statement

This work was part of the research done at Affymetrix Laboratory, the research arm of Affymetrix. However, there are no current products based on, or specific plans to develop products from, the work described in this manuscript.

Figures

Fig. 1.
Schematic of dU probe manufacturing process. Individual PCRs are performed for each locus. Genomic DNA is amplified with primers that contain sequences specific to the region of interest (gray) and 13-bp sequences common to all primers (orange and red) appended to the 5′ end. After amplification of target DNA, all of the PCR products are pooled. A small aliquot is then used for a secondary amplification with common primers (orange and red). The common primers are 30 nt long and can amplify all of the amplicons because they share 13 nt with every amplicon. In this PCR, dUTP is used instead of dTTP. This results in the generation of dU probes: double-stranded sequences in which all of the Ts are replaced by Us. Each dU probe has a unique sequence flanked by 30 nt common to all of the dU probes.
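The two-step PCR in Fig. 1 can be modeled in silico as string operations. This is a minimal sketch, not the authors' protocol: all sequences are invented, and we assume the 13-nt common tail forms the 3′ end of each 30-nt common primer, so the secondary PCR extends each amplicon end to the full 30-nt common sequence before T is replaced by U.

```python
# Hypothetical in-silico model of dU probe generation (Fig. 1).
# All sequences are invented; the tail/primer layout is an assumption.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


TAIL_F = "ACGTTGCAGGTCA"            # hypothetical 13-nt common tail (forward)
TAIL_R = "TTGACCGATCAGG"            # hypothetical 13-nt common tail (reverse)
COMMON_F = "GCTAGCATCGATCCAGT" + TAIL_F   # 17 extra nt + tail = 30 nt
COMMON_R = "CATGCTAGGTCAATGCA" + TAIL_R   # 17 extra nt + tail = 30 nt
assert len(TAIL_F) == len(TAIL_R) == 13 and len(COMMON_F) == len(COMMON_R) == 30


def locus_pcr(target: str) -> str:
    """Locus-specific PCR with tailed primers; returns the top strand."""
    return TAIL_F + target + reverse_complement(TAIL_R)


def common_pcr(amplicon: str) -> str:
    """Secondary PCR: common primers anneal via their 3' 13 nt to the tails,
    so the product carries the full 30-nt common sequence on each side."""
    assert amplicon.startswith(TAIL_F)
    assert amplicon.endswith(reverse_complement(TAIL_R))
    core = amplicon[len(TAIL_F):len(amplicon) - len(TAIL_R)]
    return COMMON_F + core + reverse_complement(COMMON_R)


def to_du(strand: str) -> str:
    """dUTP replaces dTTP, so every T in the product becomes U."""
    return strand.replace("T", "U")


def make_du_probe(target: str) -> tuple[str, str]:
    """Return the (top, bottom) strands of a double-stranded dU probe."""
    top = common_pcr(locus_pcr(target))
    return to_du(top), to_du(reverse_complement(top))


targets = ["ATGGCGTACCTTAGGCTAACGT", "GGATCCTTAGCATGCAAGTC"]  # hypothetical loci
probes = [make_du_probe(t) for t in targets]  # pooled dU probes
```

Each probe is its locus sequence plus 60 nt of shared flanks, which is what lets a single pair of common primers amplify the whole pool.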
Fig. 2.
TACL. Genomic DNA (black) is digested with a restriction enzyme (RE) and hybridized with the pool of dU probes and common primers. If the dU probe was designed with RE sites at each end (Situation 1), the digested DNA will hybridize with the dU probe and common primers to create a double-stranded structure with 2 nicks. To increase genomic coverage, some dU probes were designed so that the 3′ end of the digested DNA has a PM but the 5′ end does not, generating a structure with a flap up to 1,000 bp long (Situation 2). Treatment with a 5′ flap endonuclease makes the 5′ end a substrate for a thermostable ligase that is able to close the nicks. Uracil-DNA glycosylase (UDG) and heat treatment destroy the dU probes, leaving only genomic DNA ligated to common primers, which can later be amplified with the common primers.
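The two capture geometries in Fig. 2 can be expressed as a simple coordinate test. This is an illustrative sketch under stated assumptions: coordinates are 0-based, half-open, on the captured strand read 5′ to 3′; we read "PM" as a perfect match at the probe boundary; only the 1,000-bp flap limit comes from the caption. The UDG step is modeled as discarding any strand that contains uracil.

```python
# Hypothetical model of TACL capture geometry (Fig. 2).
# Assumptions: half-open coordinates on the captured strand (5'->3'),
# so frag_end is the fragment's 3' end; "PM" read as an exact end match.
from dataclasses import dataclass
from typing import Optional

MAX_FLAP_BP = 1_000  # flap length limit quoted in the caption


@dataclass(frozen=True)
class Capture:
    situation: Optional[int]  # 1, 2, or None when the fragment is not captured
    flap_bp: int              # 5' overhang removed by the flap endonuclease


def classify_capture(frag_start: int, frag_end: int,
                     probe_start: int, probe_end: int) -> Capture:
    if frag_end != probe_end:
        # The 3' end must match the probe boundary for ligation.
        return Capture(None, 0)
    if frag_start == probe_start:
        # RE sites at both ends: two nicks, ligated directly (Situation 1).
        return Capture(1, 0)
    flap = probe_start - frag_start
    if 0 < flap <= MAX_FLAP_BP:
        # 5' end overhangs the probe: flap trimmed, then ligated (Situation 2).
        return Capture(2, flap)
    return Capture(None, 0)


def after_udg_and_heat(strands: list[str]) -> list[str]:
    """UDG plus heat destroys uracil-containing strands (the dU probes)."""
    return [s for s in strands if "U" not in s]
```

Keeping Situation 2 bounded by a maximum flap length mirrors the caption's trade-off: allowing one unmatched end raises genomic coverage without requiring RE sites at both probe boundaries.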
Fig. 3.
Ratio analysis. The x axis shows the contrast between the Vs and the NVs. The contrast is computed as (Vs − NVs)/(Vs + NVs). Therefore, if the fragment is nonvariant, variant, or heterozygous, the contrast is expected to be −1, +1, or 0, respectively. The y axis is the signal sum (Vs + NVs).
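The contrast in Fig. 3 is a simple normalized difference. The sketch below computes it exactly as the caption defines it and assigns each fragment to the nearest expected value (−1, 0, or +1). The `min_signal` cutoff on the signal sum (the y axis) and the nearest-value rule are hypothetical simplifications, not the paper's calling procedure.

```python
# Contrast as defined in Fig. 3: (Vs - NVs) / (Vs + NVs).
# The nearest-class rule and min_signal threshold are illustrative assumptions.

EXPECTED = {"nonvariant": -1.0, "heterozygous": 0.0, "variant": 1.0}


def contrast(vs: float, nvs: float) -> float:
    total = vs + nvs
    if total <= 0:
        raise ValueError("signal sum must be positive")
    return (vs - nvs) / total


def call_fragment(vs: float, nvs: float, min_signal: float = 1000.0) -> str:
    """Hypothetical call: skip low-signal fragments, else nearest expected contrast."""
    if vs + nvs < min_signal:
        return "no call"
    c = contrast(vs, nvs)
    return min(EXPECTED, key=lambda label: abs(EXPECTED[label] - c))
```

Because the contrast divides by the signal sum, it is insensitive to overall hybridization strength, which is why the signal sum is plotted separately on the y axis.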
Fig. 4.
Dip analysis. The x axes show position within a 200-bp amplicon. (A) The y axis shows the NVP signal obtained at each position, with data shown for 19 different samples. Although the signal differs drastically among positions, the signal at each position is relatively consistent across samples. Hence data from some NVP samples can be used to build a model of the expected signal at each position, and data from the VP can be compared with this model to find regions of poor hybridization that may contain a variant. (B) The y axis shows the comparison of the VP pool with the model generated from the NVP samples; data from both strands are shown (solid red and blue circles). Open circles show processed data after dip fitting.
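The modeling idea in Fig. 4 (per-position expectations from NVP samples, then comparison of the VP signal) can be sketched with per-position z-scores. This is a simplified stand-in, not the paper's dip-fitting procedure: the z cutoff and minimum run length are hypothetical, only one strand is handled, and at least two NVP samples are needed for a standard deviation.

```python
# Simplified dip detection: z-scores against a per-position NVP model,
# then runs of consecutive low-signal positions. Parameters are hypothetical.
from statistics import mean, stdev


def build_model(nvp_samples: list[list[float]]) -> list[tuple[float, float]]:
    """Per-position (mean, standard deviation) across NVP samples."""
    per_position = zip(*nvp_samples)  # samples x positions -> positions x samples
    return [(mean(p), stdev(p)) for p in per_position]


def find_dips(vp_signal: list[float], model: list[tuple[float, float]],
              z_cut: float = -3.0, min_run: int = 3) -> list[tuple[int, int]]:
    """Return [start, end) intervals where the VP signal stays below the model."""
    z = [(s - mu) / sd if sd > 0 else 0.0
         for s, (mu, sd) in zip(vp_signal, model)]
    dips, start = [], None
    for i, zi in enumerate(z + [0.0]):  # trailing 0.0 closes a run at the end
        if zi < z_cut:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                dips.append((start, i))
            start = None
    return dips
```

Modeling each position separately is what makes this work despite the large position-to-position differences in panel A: a dip is judged against that position's own expected signal, not against a global average.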
Fig. 5.
ROC analysis. (A) The ROC curve showing the tradeoff between false positive and false negative rates for the 3 different enzyme panels/arrays. The 3 panels/arrays have somewhat different performance, with the best performance (blue) seen for the Dde panel that carries the MutS-overexpressing strain. The average performance among the 3 panels shows a false positive rate of 1/500,000 bp at a sensitivity of ≈90%. (B) More detailed performance data for the panel with intermediate performance in A (Hpy). The combination score (dark blue) defines the performance of the technology (and is the same as the green line in A). Much of the power comes from the robust ratio analysis (light blue), which uses data from at least 54 probes. Because they use data from fewer features, the dip (green) and base (red) analyses have lower power. However, they add to the power of the ratio analysis, particularly at low false positive rates (compare the light blue and dark blue lines), and help localize and identify the specific change. The dark blue curve shows the combo calls that have a base call (i.e., it excludes the class where a combo call is present but the base could not be determined).
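An ROC curve like those in Fig. 5 comes from sweeping a score threshold and recording the false positive rate and sensitivity at each step. The sketch below does this for hypothetical per-position scores and truth labels, so the false positive rate is per nonvariant base; how the ratio, dip, and base scores are combined into the combination score is not reproduced here.

```python
# Illustrative ROC sweep over hypothetical per-position variant scores.
# labels[i] is True for a true variant position, False for a nonvariant one.


def roc_points(scores: list[float], labels: list[bool]) -> list[tuple[float, float]]:
    """Return (false_positive_rate, sensitivity) pairs, one per distinct threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both variant and nonvariant positions")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for k, i in enumerate(order):
        if labels[i]:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last position sharing this score (ties).
        last = k == len(order) - 1
        if last or scores[order[k + 1]] != scores[i]:
            points.append((fp / n_neg, tp / n_pos))
    return points


def sensitivity_at(points: list[tuple[float, float]], max_fpr: float) -> float:
    """Best sensitivity achievable without exceeding max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)
```

Under this framing, the operating point quoted in the caption corresponds to evaluating `sensitivity_at(points, 1 / 500_000)` over all interrogated positions and finding a value near 0.9.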
