Proc Natl Acad Sci U S A. 2011 Apr 19;108(16):6549-54.
doi: 10.1073/pnas.1018981108. Epub 2011 Apr 5.

High-quality DNA sequence capture of 524 disease candidate genes

Peidong Shen et al. Proc Natl Acad Sci U S A.

Abstract

The accurate and complete selection of candidate genomic regions from a DNA sample before sequencing is critical in molecular diagnostics. Several recently developed technologies await substantial improvements in performance, cost, and multiplex sample processing. Here we present the utility of long padlock probes (LPPs) for targeted exon capture followed by array-based sequencing. We found that on average 92% of 5,471 exons from 524 nuclear-encoded mitochondrial genes were successfully amplified from genomic DNA from 63 individuals. Only 144 exons did not amplify in any sample due to high GC content. One LPP was sufficient to capture sequences from <100 to 500 bp in length, and only a single-tube capture reaction and one microarray were required per sample. Our approach was highly reproducible and quick (<8 h) and detected DNA variants with high accuracy (false discovery rate 1%, false negative rate 3%) on the basis of known sample SNPs and Sanger sequence verification. In a patient with clinical and biochemical presentation of ornithine transcarbamylase (OTC) deficiency, we identified copy-number differences in the OTC gene at exon-level resolution. This shows the ability of LPPs to accurately preserve a sample's genome information and provides a cost-effective strategy to identify both single nucleotide changes and structural variants in targeted resequencing.
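
The two accuracy figures quoted in the abstract follow the usual definitions of false discovery rate and false negative rate. The sketch below only illustrates that arithmetic; the function names and the variant counts are hypothetical and are not taken from the paper.

```python
def false_discovery_rate(true_pos: int, false_pos: int) -> float:
    """FDR = FP / (FP + TP): share of reported variants that are wrong."""
    called = true_pos + false_pos
    return false_pos / called if called else 0.0


def false_negative_rate(true_pos: int, false_neg: int) -> float:
    """FNR = FN / (FN + TP): share of real variants that were missed."""
    real = true_pos + false_neg
    return false_neg / real if real else 0.0


# Hypothetical counts, chosen only to illustrate the calculation.
tp, fp, fn = 970, 10, 30
fdr = false_discovery_rate(tp, fp)
fnr = false_negative_rate(tp, fn)
print(f"FDR = {fdr:.3f}, FNR = {fnr:.3f}")
```

In this reading, the known sample SNPs and the Sanger calls would play the role of the truth set against which array calls are counted.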

Conflict of interest statement

S.K., M.N.M., and R.W.D. are named on a patent application for work described in this paper.

Figures

Fig. 1.
DNA sequence capture and sample quality assessment. (A) A genomic DNA sample is incubated with thousands of single-stranded long padlock probes (LPPs), each of which targets a specific genomic region (e.g., intronic sequence flanking an exon). Following annealing, gap filling by a DNA polymerase, and probe circularization by ligation, the captured targets are amplified in multiplex using a primer pair common to all probes. The entire capture pool of one sample is hybridized to a resequencing microarray containing the complementary sequences. (B) Array quality measures (R, D, and T) are analyzed in combination to monitor the capture yield and identify failed targets (red dots) in each sample preparation.
Fig. 2.
LPP capture performance related to GC content and length. (A) The box plots show the distribution in GC content (y axis) of five groups of exons, defined on the basis of amplification success (R ≥ 0.9) in 63 DNA samples. These groups contain exons that amplified in all samples (group 1: 4,427 exons), exons that failed in only a subset of samples, namely 1–10 samples (group 2: 522 exons), 11–40 samples (group 3: 284 exons), or 41–62 samples (group 4: 242 exons), and exons that failed in all samples (group 5: 144 exons). Group 1, with the highest amplification success, had the lowest mean GC content compared with groups 2–5 (P < 0.0001). (B) The box plots show the fraction of failed samples in four amplicon groups that we defined on the basis of amplicon length and GC content. Amplicons with lower GC content (<37%; <20th percentile) amplified successfully irrespective of their length, whereas amplicons with higher GC content (>61%; >80th percentile) tended to fail amplification, particularly longer amplicons (>274 bp; >80th percentile).
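
The grouping in panel A can be sketched as a simple binning of each exon by the number of samples in which it failed. The example assumes that an exon counts as failed in a sample when its R value is below 0.9, as the caption's success criterion suggests; the function names and the R values are hypothetical.

```python
R_THRESHOLD = 0.9  # amplification success criterion from the caption
N_SAMPLES = 63     # number of DNA samples in the study


def count_failures(r_values):
    """Number of samples in which the exon did not reach the R threshold."""
    return sum(1 for r in r_values if r < R_THRESHOLD)


def failure_group(n_failed):
    """Map a failure count (0..63) to groups 1-5 as defined in the caption."""
    if not 0 <= n_failed <= N_SAMPLES:
        raise ValueError("failure count out of range")
    if n_failed == 0:
        return 1  # amplified in all samples
    if n_failed <= 10:
        return 2
    if n_failed <= 40:
        return 3
    if n_failed < N_SAMPLES:
        return 4  # failed in 41-62 samples
    return 5      # failed in all samples


# Hypothetical per-sample R values for five made-up exons.
exon_r = {
    "exon_a": [0.95] * 63,
    "exon_b": [0.95] * 60 + [0.50] * 3,
    "exon_c": [0.95] * 38 + [0.50] * 25,
    "exon_d": [0.95] * 3 + [0.50] * 60,
    "exon_e": [0.20] * 63,
}
groups = {name: failure_group(count_failures(r)) for name, r in exon_r.items()}
print(groups)
```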
Fig. 3.
Detecting copy number differences with resequencing arrays. A male child with OTC deficiency (A) and his healthy mother (B) had a single-copy deletion of 9 of 10 OTC exons (Xp21.1). Each circle represents the normalized array intensity value for 1 base of that sample. The 6,000 sequenced bases are concatenated from a 12.9 Mb genomic interval and represent exons of OTC and flanking candidate genes. The vertical lines indicate exon and gene boundaries. CYBB exon 6 failed in all samples. The dashed horizontal lines represent the normalized intensity values for two, one, and zero copies. The solid horizontal line shows a smoothed estimate of the normalized base intensities (100-bp window size) that we calculated as a ratio of background-adjusted sample intensities to pooled background-adjusted female reference intensities. The normalized intensities are scaled so that within regions with one copy in the child (A) and two copies in the mother (B), the corresponding median value across positions of the normalized intensities equals 1 and 2, respectively.
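
The normalization described in this caption can be sketched roughly as follows. The caption does not say which smoother was used or how background was estimated, so the moving average, the single scalar background value, the function names, and the synthetic intensities are all assumptions made for illustration.

```python
from statistics import median


def normalized_intensity(sample, reference, background, anchor, anchor_copies):
    """Per-base ratio of background-adjusted sample to reference intensity,
    rescaled so the median over the anchor positions equals anchor_copies."""
    ratios = [(s - background) / (r - background)
              for s, r in zip(sample, reference)]
    scale = anchor_copies / median([ratios[i] for i in anchor])
    return [x * scale for x in ratios]


def smooth(values, window=100):
    """Moving average over up to `window` bases (assumed smoother),
    truncated at the ends of the concatenated sequence."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half]
        out.append(sum(chunk) / len(chunk))
    return out


# Synthetic data: 300 bases at two copies followed by 300 bases at one copy.
background = 100.0
reference = [1100.0] * 600                # stand-in for the pooled female reference
sample = [1100.0] * 300 + [600.0] * 300   # second half carries a one-copy deletion
norm = normalized_intensity(sample, reference, background,
                            anchor=range(0, 300), anchor_copies=2)
smoothed = smooth(norm)
print(smoothed[150], smoothed[450])
```

Anchoring the scale on a region of known copy number is what lets the dashed reference levels in the figure be read directly as copy counts.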
