Genome Biol. 2011 Jul 25;12(7):R68.
doi: 10.1186/gb-2011-12-7-r68.

Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities

Matthew N Bainbridge et al. Genome Biol.

Abstract

Background: Enrichment of loci by DNA hybridization-capture, followed by high-throughput sequencing, is an important tool in modern genetics. Currently, the most common targets for enrichment are the protein coding exons represented by the consensus coding DNA sequence (CCDS). The CCDS, however, excludes many actual or computationally predicted coding exons present in other databases, such as RefSeq and Vega, and non-coding functional elements such as untranslated and regulatory regions. The number of variants per base pair (variant density) in regions outside the CCDS, and our ability to interrogate those regions, are consequently less well understood.

Results: We examine capture sequence data from outside of the CCDS regions and find that extremes of GC content that are present in different subregions of the genome can reduce the local capture sequence coverage to less than 50% relative to the CCDS. This effect is due to biases inherent in both the Illumina and SOLiD sequencing platforms that are exacerbated by the capture process. Interestingly, for two subregion types, microRNA and predicted exons, the capture process yields higher than expected coverage when compared to whole genome sequencing. Lastly, we examine the variation present in non-CCDS regions and find that predicted exons, as well as exonic regions specific to RefSeq and Vega, show much higher variant densities than the CCDS.

Conclusions: We show that regions outside of the CCDS perform less efficiently in capture sequence experiments. Further, we show that the variant density in computationally predicted exons is more than 2.5 times higher than that observed in the CCDS.
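
The comparison in the Conclusions rests on simple arithmetic: variant density is the number of variants divided by the number of base pairs, and each subregion is then expressed as a proportion of the CCDS value. The following is a minimal sketch of that calculation, not the authors' pipeline. The `Subregion` class, the function names and all counts are hypothetical and chosen only to illustrate the ratio.

```python
from dataclasses import dataclass


@dataclass
class Subregion:
    """Counts for one capture subregion (hypothetical illustration inputs)."""
    name: str
    variant_count: int  # number of variants called in the subregion
    base_pairs: int     # number of bases interrogated in the subregion


def variant_density(region: Subregion) -> float:
    """Return variants per base pair for a subregion."""
    if region.base_pairs <= 0:
        raise ValueError(f"{region.name}: base_pairs must be positive")
    return region.variant_count / region.base_pairs


def density_relative_to(region: Subregion, reference: Subregion) -> float:
    """Return the density of `region` as a proportion of the reference density."""
    reference_density = variant_density(reference)
    if reference_density == 0:
        raise ValueError(f"{reference.name}: reference density is zero")
    return variant_density(region) / reference_density


# Made-up counts, not values from the paper.
ccds = Subregion("CCDS", variant_count=400, base_pairs=1_000_000)
predicted = Subregion("predicted exons", variant_count=300, base_pairs=250_000)

ratio = density_relative_to(predicted, ccds)
print(f"{predicted.name}: {ratio:.2f} times the CCDS variant density")
```

Normalizing to the CCDS rather than reporting raw densities makes subregions of very different sizes directly comparable, which is how the variant densities in Figure 5 are presented.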

Figures

Figure 1
Content and overlap of the VCR-set and REC-set designs. A hypothetical gene shows regions that would be targeted by the VCR-set (orange), REC-set (green), both designs (orange/green) and CCDS (blue). TFBS, transcription factor binding site.
Figure 2
GC content distributions for capture subregions. (a) REC-set capture and (b) VCR-set capture.
Figure 3
Coverage distributions for capture subregions. (a) Average coverage for subregions of the REC-set design is shown as a proportion of the average coverage of the CCDS subregion. (b) Average coverage for subregions of the VCR-set design is shown as a proportion of the average coverage of the CCDS subregion. 'R/V specific' refers to RefSeq/Vega exons not contained in the CCDS.
Figure 4
Normalized coverage distributions. (a) Coverage of genomic subregions, relative to the CCDS, after whole genome SOLiD sequencing. Green, regions specific to REC-set; orange, regions specific to VCR-set; blue, shared regions. 'R/V specific' refers to RefSeq/Vega exons not contained in the CCDS. (b) Proportional difference in relative coverage between capture-sequencing and WGS shows both enrichment (values > 1) and depletion (values < 1) of certain genomic subregions after capture. Green, regions specific to REC-set; orange, regions specific to VCR-set; blue, shared regions.
Figure 5
Single nucleotide variant densities. (a) Number of SNV substitutions per base pair in REC-set subregions, as a proportion of the SNV rate of the CCDS subregion. The absolute average value from SOLiD and Illumina sequencing is given above the data point. (b) Number of SNV substitutions per base pair in VCR-set subregions, as a proportion of the SNV rate of the CCDS subregion. The absolute average value from SOLiD and Illumina sequencing is given above the data point. 'R/V specific' refers to RefSeq/Vega exons not contained in the CCDS.
Figure 6
Distribution of phyloP scores across the CCDS (blue), intronic (red) and predicted exons (green).
Figure 7
Minor allele frequency distributions for variants in HuRef subregions: predicted exons (green), CCDS exons (blue) and introns (red). 'Private' indicates the variant was not found in the Thousand Genomes Project.
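
The normalizations described for Figures 3 and 4 can be sketched in a few lines. This is one plausible reading of the captions, stated as an assumption: coverage of each subregion is divided by CCDS coverage within the same experiment, and the capture value is then divided by the whole genome sequencing (WGS) value, so that results above 1 indicate enrichment and results below 1 indicate depletion. The function names and all coverage figures are hypothetical.

```python
def relative_coverage(mean_coverage: dict[str, float],
                      reference: str = "CCDS") -> dict[str, float]:
    """Express each subregion's mean coverage as a proportion of the reference."""
    reference_coverage = mean_coverage[reference]
    if reference_coverage <= 0:
        raise ValueError("reference coverage must be positive")
    return {name: cov / reference_coverage for name, cov in mean_coverage.items()}


def capture_vs_wgs(capture: dict[str, float],
                   wgs: dict[str, float],
                   reference: str = "CCDS") -> dict[str, float]:
    """Divide capture relative coverage by WGS relative coverage per subregion.

    Assumed interpretation: > 1 means enriched by capture, < 1 means depleted.
    Subregions missing from either input, or with zero WGS coverage, are skipped.
    """
    capture_rel = relative_coverage(capture, reference)
    wgs_rel = relative_coverage(wgs, reference)
    return {
        name: capture_rel[name] / wgs_rel[name]
        for name in capture_rel
        if name in wgs_rel and wgs_rel[name] > 0
    }


# Made-up mean coverage values, not data from the paper.
capture = {"CCDS": 80.0, "UTR": 40.0, "microRNA": 96.0}
wgs = {"CCDS": 20.0, "UTR": 20.0, "microRNA": 20.0}

for name, value in capture_vs_wgs(capture, wgs).items():
    status = "enriched" if value > 1 else "depleted" if value < 1 else "unchanged"
    print(f"{name}: {value:.2f} ({status})")
```

Dividing by the WGS value separates the effect of the capture step from coverage biases that the sequencing platform would show without capture.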
