Genome Res. 2012 Aug;22(8):1525-32.
doi: 10.1101/gr.138115.112. Epub 2012 May 14.

Copy number variation detection and genotyping from exome sequence data

Niklas Krumm et al. Genome Res. 2012 Aug.

Abstract

While exome sequencing is readily amenable to single-nucleotide variant discovery, the sparse and nonuniform nature of the exome capture reaction has hindered exome-based detection and characterization of genic copy number variation. We developed a novel method using singular value decomposition (SVD) normalization to discover rare genic copy number variants (CNVs) as well as genotype copy number polymorphic (CNP) loci with high sensitivity and specificity from exome sequencing data. We estimate the precision of our algorithm using 122 trios (366 exomes) and show that this method can be used to reliably predict (94% overall precision) both de novo and inherited rare CNVs involving three or more consecutive exons. We demonstrate that exome-based genotyping of CNPs strongly correlates with whole-genome data (median r² = 0.91), especially for loci with fewer than eight copies, and can estimate the absolute copy number of multi-allelic genes with high accuracy (78% call level). The resulting user-friendly computational pipeline, CoNIFER (copy number inference from exome reads), can reliably be used to discover disruptive genic CNVs missed by standard approaches and should have broad application in human genetic studies of disease.

Figures

Figure 1.
Method overview and CNV discovery. Exome sequencing reads from FASTQ files were divided into nonoverlapping 36-bp constituents (A) and aligned to targeted regions (B), allowing for up to two mismatches per 36-bp alignment. (C) For each exon or targeted region, we calculated RPKM values and then transformed these into “ZRPKM” values based on the median and standard deviation of each exon across all samples. (D) ZRPKM values were passed to the SVD transformation, where we removed the first 12–15 singular values. Finally, a centrally weighted 15-exon average was passed over the SVD-ZRPKM values in order to reduce false positives, and a ±1.5 SVD-ZRPKM threshold was used to discover CNVs. (E) Final image shows ZRPKM values from 1000 consecutive exons on chromosome 16, plotted for 533 ESP exome background samples (black traces) and NA18507 (pink trace). Blue bar corresponds to a rare duplication in NA18507 at the METTL9/OTOA locus at chr16p12.2 that was validated by SNP microarray CNV analysis.
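The steps in panels C and D can be illustrated with a minimal NumPy sketch. This is not the CoNIFER source code; it only follows the description in the caption. All function names are hypothetical, the matrix layout (exons in rows, samples in columns) is an assumption, and the Blackman window is one possible choice of "centrally weighted" average, since the caption does not specify the window shape.

```python
import numpy as np

def rpkm(counts, exon_lengths_bp):
    """Reads per kilobase per million mapped reads.
    counts: (n_exons, n_samples) read counts; exon_lengths_bp: (n_exons,)."""
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(exon_lengths_bp, dtype=float)
    total_reads = counts.sum(axis=0)  # mapped reads per sample
    return counts * 1e9 / (lengths[:, None] * total_reads[None, :])

def zrpkm(rpkm_values):
    """Center each exon on its median and scale by its standard deviation,
    both taken across all samples."""
    median = np.median(rpkm_values, axis=1, keepdims=True)
    sd = np.std(rpkm_values, axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # assumption: leave constant exons at zero
    return (rpkm_values - median) / sd

def remove_top_components(z, n_remove=12):
    """Zero the n_remove largest singular values and rebuild the matrix.
    The caption reports removing the first 12-15 singular values."""
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    s = s.copy()
    s[:n_remove] = 0.0
    return (u * s) @ vt

def smooth(values, window=15):
    """Centrally weighted moving average along the exon axis.
    Assumption: Blackman weights; edge exons are attenuated by mode='same'."""
    w = np.blackman(window)
    w = w / w.sum()
    return np.apply_along_axis(
        lambda col: np.convolve(col, w, mode="same"), 0, values
    )

def call_cnvs(smoothed, threshold=1.5):
    """+1 for candidate duplications, -1 for candidate deletions, else 0.
    Assumption: the threshold is applied inclusively."""
    calls = np.zeros(smoothed.shape, dtype=int)
    calls[smoothed >= threshold] = 1
    calls[smoothed <= -threshold] = -1
    return calls
```

Chaining the functions as `call_cnvs(smooth(remove_top_components(zrpkm(rpkm(counts, lengths)))))` yields a per-exon, per-sample call matrix. Removing the largest singular values discards the strongest shared patterns across samples, which is how the sketch approximates the removal of systematic capture bias described in the abstract.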
Figure 2.
CNP locus genotyping of RHD and C4A. (A) SVD-transformed values for exons for the Rhesus deletion factor locus (RHD/RHCE) show distinct copy number states across both paralogous genes. (B) Histogram of average SVD-ZRPKM values for the ESP data set (533 individuals) and seven HapMap samples. Clustering was performed using an unsupervised algorithm (Supplemental Note). (C) Correlation between SVD-ZRPKM genotype values (y-axis) and absolute copy number estimate (x-axis) based on whole-genome read-depth for seven HapMap samples and experimentally validated by array-CGH. (D–F) Similar to above, for C4A locus.
Figure 3.
Genotyping accuracy across 62 CNP loci. (A) Distribution of correlation coefficients of SVD-ZRPKM to whole-genome copy number estimate (Sudmant et al. 2010) across 62 CNP loci for seven HapMap samples, split by the median copy number of each locus. For loci with copy number less than eight, 32/39 had strong correlations between exome and whole-genome estimates, indicating that exome-based SVD-ZRPKM can be used to genotype such loci. (B) Results from unsupervised clustering algorithm for 43 autosomal loci for which genotype information was available (Campbell et al. 2011).
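The per-locus comparison in panel A can be sketched as follows, assuming that the reported correlation is the squared Pearson coefficient between exome SVD-ZRPKM values and whole-genome copy number estimates across samples. The function names and the input layout are hypothetical, and no cutoff for a "strong" correlation is encoded because the caption does not state one.

```python
import numpy as np

def r_squared(svd_zrpkm, wgs_copy_number):
    """Squared Pearson correlation for one locus across samples."""
    r = np.corrcoef(svd_zrpkm, wgs_copy_number)[0, 1]
    return float(r ** 2)

def median_r_squared(loci):
    """loci maps a locus name to a pair
    (exome SVD-ZRPKM values, whole-genome copy number estimates),
    each with one entry per sample."""
    values = [r_squared(exome, wgs) for exome, wgs in loci.values()]
    return float(np.median(values))
```

With only seven HapMap samples per locus, each coefficient rests on seven points, so a summary such as the median across loci is more stable than any single per-locus value.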

References

    1. The 1000 Genomes Project Consortium 2010. A map of human genome variation from population-scale sequencing. Nature 467: 1061–1073
    2. Alkan C, Kidd JM, Marques-Bonet T, Aksay G, Antonacci F, Hormozdiari F, Kitzman JO, Baker C, Malig M, Mutlu O, et al. 2009. Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet 41: 1061–1067
    3. Bamshad MJ, Ng SB, Bigham AW, Tabor HK, Emond MJ, Nickerson DA, Shendure J 2011. Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet 12: 745–755
    4. Campbell CD, Sampas N, Tsalenko A, Sudmant PH, Kidd JM, Malig M, Vu TH, Vives L, Tsang P, Bruhn L 2011. Population-genetic properties of differentiated human copy-number polymorphisms. Am J Hum Genet 88: 317–332
    5. Chiang DY, Getz G, Jaffe DB, O'Kelly MJT, Zhao X, Carter SL, Russ C, Nusbaum C, Meyerson M, Lander ES 2008. High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat Methods 6: 99–103
