J Biomol Tech. 2013 Jul;24(2):73-86.

doi: 10.7171/jbt.13-2402-002.

Comparison of commercially available target enrichment methods for next-generation sequencing

K Bodi¹, A G Perera, P S Adams, D Bintzler, K Dewar, D S Grove, J Kieleczawa, R H Lyons, T A Neubert, A C Noll, S Singh, R Steen, M Zianni

PMID: 23814499
PMCID: PMC3605921
DOI: 10.7171/jbt.13-2402-002

K Bodi et al. J Biomol Tech. 2013 Jul.

Affiliation

¹ Tufts University School of Medicine, Boston, Massachusetts 02452, USA. kip.bodi@tufts.edu

Abstract

Isolating high-priority segments of genomes greatly enhances the efficiency of next-generation sequencing (NGS) by allowing researchers to focus on their regions of interest. For the 2010-11 DNA Sequencing Research Group (DSRG) study, we compared outcomes from two leading companies, Agilent Technologies (Santa Clara, CA, USA) and Roche NimbleGen (Madison, WI, USA), which offer custom-targeted genomic enrichment methods. Both companies were provided with the same genomic sample and challenged to capture identical genomic locations for DNA NGS. The target region totaled 3.5 Mb and included 31 individual genes and a 2-Mb contiguous interval. Each company was asked to design its best assay, perform the capture in replicates, and return the captured material to the DSRG-participating laboratories. Sequencing was performed in two different laboratories on Genome Analyzer IIx systems (Illumina, San Diego, CA, USA). Sequencing data were analyzed for sensitivity, specificity, and coverage of the desired regions. The success of the enrichment was highly dependent on the design of the capture probes. Overall, coverage variability was higher for the Agilent samples. As variant discovery is the ultimate goal for a typical targeted sequencing project, we compared samples for their ability to sequence single-nucleotide polymorphisms (SNPs) as a test of the ability to capture both chromosomes from the sample. In the targeted regions, we detected 2546 SNPs with the NimbleGen samples and 2071 with Agilent's. When limited to the regions that both companies included as baits, the number of SNPs was ∼1000 for each, with Agilent and NimbleGen finding a small number of unique SNPs not found by the other.

Keywords: Agilent; Illumina; NimbleGen; targeted capture.

Figures

**FIGURE 1**
Alignment results after normalization. These histograms show the percentage of reads that are ambiguous or failed to align to the genome or were PCR duplicates. Ambiguous: reads that aligned to multiple places in the genome. Failed to Align: reads that did not have a valid alignment to any location in the genome. PCR Duplicate: reads that had the same genomic 5′ and 3′ coordinates. There are four bars showing the performance of each replicate for each platform in each category. Each bar is labeled by the sequencing site (S: Stowers Institute for Medical Research; M: University of Michigan) and replicate (1 or 2).
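
The duplicate definition in this caption (reads sharing the same genomic 5′ and 3′ coordinates) can be illustrated with a minimal Python sketch. The read representation as `(chrom, start, end)` tuples, the function name `duplicate_fraction`, and the example coordinates are all hypothetical; this is not the pipeline used in the study.

```python
from collections import Counter

def duplicate_fraction(reads):
    """Return the fraction of reads that would be flagged as PCR duplicates.

    Each read is a hypothetical (chrom, start, end) tuple for an aligned
    fragment. One read per coordinate pair is kept; every additional read
    with the same 5' and 3' coordinates counts as a duplicate.
    """
    if not reads:
        return 0.0
    counts = Counter(reads)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(reads)

# Made-up example reads, for illustration only.
reads = [
    ("chr20", 100, 300),
    ("chr20", 100, 300),  # same 5' and 3' coordinates: duplicate
    ("chr20", 100, 310),  # same 5' end, different 3' end: kept
    ("chr9", 100, 300),   # different chromosome: kept
]
print(duplicate_fraction(reads))  # 0.25
```

Keying on the chromosome as well as the two coordinates is an assumption of this sketch; real tools typically also consider strand and library.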

**FIGURE 2**
Library insert size distribution for each platform. Agilent's mean insert size was ∼200 bp, whereas the mean was ∼150 bp for the NimbleGen array samples and ∼100 bp for the NimbleGen SeqCap samples. For each sample, the lower and upper edges of the boxes represent the 25th and 75th percentiles, respectively; the line in the middle of each box represents the 50th percentile. The whiskers extend to 1.5× the interquartile range. Outliers are plotted as points.
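
As a rough illustration of how such a box plot is summarized, the sketch below computes quartiles, whisker ends, and outliers under the 1.5× interquartile range rule. The insert sizes are invented, the function name `box_stats` is hypothetical, and the choice of the "inclusive" quantile method is an assumption, since the caption does not say how percentiles were computed.

```python
import statistics

def box_stats(values):
    """Summarize values the way a standard box plot does.

    Whiskers end at the most extreme data points that still lie within
    1.5 x IQR of the box; anything beyond that is reported as an outlier.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Hypothetical insert sizes in bp, not data from the study.
insert_sizes = [180, 190, 195, 200, 205, 210, 220, 400]
stats = box_stats(insert_sizes)
print(stats["median"], stats["outliers"])
```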

**FIGURE 3**
Specificity: read counts and percentage mapping to 3.5-Mb targeted region for each platform. This histogram shows the number of reads on target for each platform and for each replicate after the removal of ambiguously mapped reads and PCR duplicates. The proportion of reads shows the number of reads mapping compared with the total number of reads in each set. More PCR duplicate and ambiguous reads were discarded from the NimbleGen alignments, resulting in a lower number of overall reads but with a higher per-sample percentage on target. Each bar is labeled by the sequencing site.

**FIGURE 4**
Sensitivity: percent coverage of 3.5-Mb targeted region and bait region for each platform. This chart shows how well the targeted region is covered for read depths from 1× to 500×. PCR duplicates were removed for the plots in the left column and retained for the plots in the right column. The upper row shows coverage for only the baited regions in common between the two platforms, and the lower row shows coverage for the entire 3.5-Mb targeted region.
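
The quantity plotted here, the percentage of targeted positions covered at or above a given read depth, can be sketched as follows. The per-base depths are invented and the function name `breadth_at_depth` is hypothetical; the sketch assumes depths have already been computed for every position in the region of interest.

```python
def breadth_at_depth(depths, thresholds):
    """Map each threshold to the percentage of positions with depth >= threshold."""
    total = len(depths)
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {
        t: 100.0 * sum(1 for d in depths if d >= t) / total
        for t in thresholds
    }

# Hypothetical per-base depths for ten targeted positions.
depths = [0, 3, 12, 40, 150, 600, 25, 8, 0, 75]
coverage = breadth_at_depth(depths, [1, 10, 100, 500])
print(coverage)  # {1: 80.0, 10: 60.0, 100: 20.0, 500: 10.0}
```

Running the same function on depths computed with and without duplicate removal would give the paired left and right columns described in the caption.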

**FIGURE 5**
Reads mapping to regions near baited areas. This chart shows the percentage of reads mapping to regions from 0 to 300 bp away from the baited areas, inclusive of the baited areas themselves. The longer DNA insert size for the Agilent samples appears to be closely linked to the improvement in performance from 0 to 150 bp.
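
One way to compute a curve like this is to pad each bait by a flank distance and count the reads that overlap the padded interval. The sketch below assumes half-open `(chrom, start, end)` intervals and uses invented coordinates; the function name `percent_near_baits` is hypothetical, and the nested scan is chosen for clarity rather than speed.

```python
def percent_near_baits(reads, baits, flank):
    """Percentage of reads overlapping any bait extended by `flank` bp per side.

    reads and baits are lists of half-open (chrom, start, end) intervals.
    With flank=0 this is the percentage of reads overlapping the baits themselves.
    """
    if not reads:
        return 0.0
    hits = 0
    for chrom, start, end in reads:
        for bait_chrom, bait_start, bait_end in baits:
            if (chrom == bait_chrom
                    and start < bait_end + flank
                    and end > bait_start - flank):
                hits += 1
                break  # count each read once, even if it is near several baits
    return 100.0 * hits / len(reads)

# Hypothetical bait and reads, for illustration only.
baits = [("chr20", 1000, 1120)]
reads = [
    ("chr20", 1050, 1126),  # overlaps the bait
    ("chr20", 1200, 1276),  # starts 80 bp past the bait
    ("chr20", 1400, 1476),  # starts 280 bp past the bait
    ("chr8", 1050, 1126),   # different chromosome
]
for flank in (0, 150, 300):
    print(flank, percent_near_baits(reads, baits, flank))
```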

**FIGURE 6**
Alignment of reads from each platform near targets bone morphogenetic protein 2 (BMP2), growth differentiation factor 6 (GDF6), and T-box 18 (TBX18). These alignments show GC content of the genomic region, coding sequence, targeted regions, and bait regions for each platform for a given replicate. Percent GC content for the DNA sequence is represented by an orange histogram at the top of the alignment view, ranging from 0 to 100% on the y-axis. The blue track represents the coding sequence for a gene. The gray track represents the targeted region. Red tracks show Agilent-baited regions, and green tracks show the NimbleGen-baited regions. Coverage plots are included for all three targets for each technology, and aligned reads themselves are included for BMP2 and GDF6. Coverage plots are represented by histograms showing the relative per-base coverage for that region on the y-axis. Yellow bars represent the sum of the “+” and “−” strand coverage; green bars represent the + strand coverage; and blue bars represent the − strand coverage. For the BMP2 and GDF6 plots, reads aligning to the + strand are in green, and reads aligning to the − strand are in blue. A high GC region is boxed in red, and its effect on the Agilent bait design is circled in blue.

**FIGURE 7**
Alignment of reads from each platform near target HOXB11. This alignment shows the coverage levels and individual reads for all three technologies at one of the targets in the capture. Off-target reads mapping to similar regions are boxed in blue.

**FIGURE 8**
Kernel density plot of entire 3.5-Mb capture and bait regions in common. This chart shows the kernel density function for the three platforms studied. Coverage values for each position of the 3.5-Mb targeted capture were pooled, and the frequency of values at each depth was used to calculate the density function. Plots were generated for the entire 3.5-Mb capture (lower row) and only the bait regions in common (upper row). The effect of the removal of PCR duplicates is shown (compare left column with right).
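
For readers unfamiliar with kernel density estimates, the sketch below builds one from pooled per-position coverage values. The caption does not state which kernel or bandwidth was used, so the Gaussian kernel, the fixed bandwidth of 5, the function name `gaussian_kde`, and the depth values are all assumptions made for illustration.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` at a point x.

    Uses a Gaussian kernel with a fixed, user-supplied bandwidth.
    """
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical pooled per-position depths.
depths = [18, 20, 22, 25, 27, 30, 31, 35, 60, 90]
density = gaussian_kde(depths, bandwidth=5.0)

# Evaluate on a grid; the area under the curve should be close to 1.
step = 0.5
grid = [-100 + i * step for i in range(701)]  # -100 .. 250
area = sum(density(x) for x in grid) * step
print(round(area, 6))
```

A narrow, tall peak in such a curve indicates uniform coverage, whereas a long right tail indicates positions sequenced far more deeply than the rest.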

**FIGURE 9**
Kernel density plots of coverage of three selected regions. This chart shows the kernel density function for the three platforms studied over three selected regions: BMP2, GDF6, and TBX18. The plots were generated from a set with PCR duplicates removed.

**FIGURE 10**
SNP counts and concordance by platform. These diagrams show the number of SNPs found by each technology, as well as the SNPs shared between technologies when only the baited regions in common are considered. The in-solution methods alone are also compared. SNP counts given were generated from a combined alignment of four sets from each company (two replicates at each sequencing site).
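
The counts in a concordance diagram of this kind reduce to set intersections and differences. The sketch below assumes each SNP call is identified by a `(chrom, position, alt_allele)` tuple; that key, the function name `concordance`, and the example calls are hypothetical and do not come from the study.

```python
def concordance(calls_a, calls_b):
    """Count SNP calls shared by two platforms and unique to each."""
    a, b = set(calls_a), set(calls_b)
    return {
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }

# Invented calls for two platforms, for illustration only.
platform_a = [("chr20", 6750, "T"), ("chr8", 97160, "G"), ("chr6", 85440, "A")]
platform_b = [("chr20", 6750, "T"), ("chr8", 97160, "G"), ("chr17", 46800, "C")]
result = concordance(platform_a, platform_b)
print(result)  # {'shared': 2, 'only_a': 1, 'only_b': 1}
```

Restricting both call sets to positions inside the commonly baited regions before calling the function would give the like-for-like comparison described in the caption.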

**FIGURE 11**
SNPs found for each platform. This histogram shows the number of SNPs found for each platform in the entire 3.5-Mb targeted region, separated by those on-target and off-target. Reads from replicates for each platform were pooled prior to SNP detection. The variation by replicate is shown by bars at the top of each stacked histogram. The total number of SNPs found is boxed within each histogram. SNP counts given are the average over the four sets for each company (two replicates at each sequencing site).
