Bioinformatics. 2012 May 15;28(10):1307-13.
doi: 10.1093/bioinformatics/bts146. Epub 2012 Apr 2.

CONTRA: copy number analysis for targeted resequencing

Jason Li et al. Bioinformatics.

Abstract

Motivation: In light of the increasing adoption of targeted resequencing (TR) as a cost-effective strategy to identify disease-causing variants, a robust method for copy number variation (CNV) analysis is needed to maximize the value of this promising technology.

Results: We present a method for CNV detection in TR data, including whole-exome capture data. Our method calls copy number gains and losses for each target region based on normalized depth of coverage. Our key strategies include the use of base-level log-ratios to remove GC-content bias, correction for the effect of imbalanced library sizes on log-ratios, and estimation of log-ratio variation via binning and interpolation. Our methods are available in CONTRA (COpy Number Targeted Resequencing Analysis), a software package that takes standard alignment formats (BAM/SAM) as input and writes its calls in variant call format (VCF 4.0) for easy integration with other next-generation sequencing analysis packages. We assessed our methods using samples from seven different target enrichment assays and evaluated our results using simulated data and real germline data with known CNV genotypes.
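As a rough illustration of the base-level log-ratio idea described above, the sketch below rescales the case depth by the ratio of library sizes, takes a per-base log2 ratio against the control, and averages it over one target region. The function names, the use of total mapped reads as the library size, and the 0.5 pseudocount are assumptions made for this example; CONTRA's own normalization and summary statistic may differ.

```python
import math
from statistics import mean


def base_log_ratios(case_cov, control_cov, case_total, control_total, pseudo=0.5):
    """Per-base log2 ratio of library-size-normalized depth (illustrative).

    case_cov, control_cov: per-base read depths over the same target region.
    case_total, control_total: library sizes (assumed: total mapped reads).
    pseudo: assumed pseudocount that keeps zero-depth bases finite.
    """
    if len(case_cov) != len(control_cov):
        raise ValueError("case and control must cover the same bases")
    # Rescale case depth as if both libraries had the control's size.
    scale = control_total / case_total
    return [
        math.log2((c + pseudo) * scale / (n + pseudo))
        for c, n in zip(case_cov, control_cov)
    ]


def target_log_ratio(case_cov, control_cov, case_total, control_total):
    """Summarize one target region by the mean of its base-level log-ratios."""
    return mean(base_log_ratios(case_cov, control_cov, case_total, control_total))


# Copy-neutral target, case library twice the control's size: ratio near 0.
print(target_log_ratio([200] * 10, [100] * 10, 2_000_000, 1_000_000))
# Half the normalized depth in the case (e.g. a one-copy loss): ratio near -1.
print(target_log_ratio([50] * 10, [100] * 10, 1_000_000, 1_000_000))
```

Averaging base-level ratios, rather than taking the ratio of region totals, is the choice that keeps each base's own GC context paired with itself in the ratio.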

Figures

Fig. 1.
CONTRA workflow. Either a matched control sample (left arrow) or a pool of normal samples for creating a baseline control (right arrow) must be present.
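When no matched control is available, the workflow builds a baseline from a pool of normal samples. One simple way to do that, sketched below, is to rescale every normal to a common library size and average the depths base by base. The function name, the rescaling to the mean library size, and the plain arithmetic mean are assumptions for illustration, not a description of CONTRA's baseline procedure.

```python
from statistics import mean


def build_baseline(normal_covs, normal_totals):
    """Average per-base depth across a pool of normal samples (illustrative).

    normal_covs: one per-base depth list per normal sample, same bases in the
        same order.
    normal_totals: library size of each normal sample.
    Returns (baseline depths, library size the baseline is scaled to).
    """
    if not normal_covs or len(normal_covs) != len(normal_totals):
        raise ValueError("need one library size per normal sample")
    # Assumed normalization: rescale every sample to the mean library size.
    common_size = mean(normal_totals)
    scaled = [
        [depth * common_size / total for depth in cov]
        for cov, total in zip(normal_covs, normal_totals)
    ]
    # zip(*scaled) walks the pool base by base.
    baseline = [mean(depths) for depths in zip(*scaled)]
    return baseline, common_size


baseline, size = build_baseline([[10, 20], [20, 40]], [1_000_000, 2_000_000])
print(baseline, size)
```

The returned library size can then stand in for `control_total` when a case sample is compared against the pooled baseline.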
Fig. 2.
Characteristics of base-level log-ratios. (A) Log-ratio versus GC-content; (B) log-ratio versus log2-coverage derived from two normal samples; and (C) effect of imbalanced library size on log-ratios, for both simulated negative binomial data (left) and real data (right). All data points come from copy-number-neutral regions. Top: the case library is twice the size of the control; middle: equal sizes; bottom: the case library is half the size of the control.
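Panel A reflects the rationale for working with base-level log-ratios: if a GC-dependent capture bias multiplies the depth of case and control alike, it cancels in their ratio. The toy sketch below makes that assumption explicit. The bell-shaped bias function `toy_gc_bias` and the other names are invented for illustration; real biases are neither identical across samples nor exactly multiplicative.

```python
import math


def gc_fraction(seq):
    """Fraction of G or C bases in a DNA string."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def toy_gc_bias(gc):
    """Hypothetical capture efficiency peaking at 50% GC (toy model only)."""
    return math.exp(-((gc - 0.5) ** 2) / 0.02)


def coverage_and_log_ratio(windows, depth=100.0, case_scale=2.0):
    """For each window return (GC fraction, log2 control depth, log-ratio).

    Both samples share the same bias; the case library is case_scale times
    larger, and the log-ratio divides that library-size factor back out.
    """
    rows = []
    for w in windows:
        gc = gc_fraction(w)
        bias = toy_gc_bias(gc)
        control = depth * bias
        case = case_scale * depth * bias
        rows.append((gc, math.log2(control), math.log2(case / control / case_scale)))
    return rows


for gc, log_cov, log_ratio in coverage_and_log_ratio(
    ["ATATATATAT", "ATGCATGCAT", "GCGCGCGCAT", "GCGCGCGCGC"]
):
    print(f"GC={gc:.1f}  log2 depth={log_cov:7.2f}  log-ratio={log_ratio:+.3f}")
```

Raw log2 depth swings widely with GC content, while the log-ratio stays at zero because the shared bias factor cancels.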
Fig. 3.
Comparison of log-ratio variation between a matched control and pooled controls built from varying numbers of samples, plotting log-ratio SD against log2 coverage. The same case sample is used throughout. Each control set is a subset or superset of the others.
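The abstract's estimation of log-ratio variation via binning and interpolation pairs naturally with this SD-versus-coverage plot. One possible reading, sketched below, groups targets into equal-count bins of log2 coverage, takes the log-ratio SD in each bin, interpolates linearly between bin centers, and converts a target's log-ratio into a two-sided P-value with a normal approximation. The equal-count bins, linear interpolation, clamping at the ends, and zero-centered normal test are all assumptions of this sketch, as are the function names.

```python
import math
from statistics import stdev


def binned_sd(log_cov, log_ratio, n_bins=4):
    """Log-ratio SD within equal-count bins of log2 coverage.

    Returns (bin centers, bin SDs), both ordered by increasing coverage.
    """
    if len(log_cov) != len(log_ratio) or len(log_cov) < 2 * n_bins:
        raise ValueError("need matching inputs and at least 2 targets per bin")
    pairs = sorted(zip(log_cov, log_ratio))
    size = len(pairs) // n_bins
    centers, sds = [], []
    for i in range(n_bins):
        start = i * size
        # The last bin absorbs any leftover targets.
        end = len(pairs) if i == n_bins - 1 else start + size
        chunk = pairs[start:end]
        centers.append(sum(c for c, _ in chunk) / len(chunk))
        sds.append(stdev([r for _, r in chunk]))
    return centers, sds


def interp_sd(x, centers, sds):
    """Linearly interpolate the SD at coverage x, clamping outside the bins."""
    if x <= centers[0]:
        return sds[0]
    if x >= centers[-1]:
        return sds[-1]
    for x0, x1, y0, y1 in zip(centers, centers[1:], sds, sds[1:]):
        if x0 <= x <= x1:
            if x1 == x0:
                return y0
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("centers must be sorted")


def two_sided_p(log_ratio, x, centers, sds):
    """Normal-approximation P-value for a log-ratio, assuming neutral mean 0."""
    z = log_ratio / interp_sd(x, centers, sds)
    return math.erfc(abs(z) / math.sqrt(2))


centers, sds = binned_sd(
    [1, 2, 3, 4, 5, 6, 7, 8],
    [0.1, -0.1, 0.2, -0.2, 0.3, -0.3, 0.4, -0.4],
)
print(two_sided_p(0.5, 2.5, centers, sds))
```

Interpolating between bins lets a target at any coverage borrow an SD estimate from many neighbours instead of relying on its own few bases.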
Fig. 4.
Variation of depth of coverage (DOC) in TR. (A) Histogram of exon-level coverage in a single sample; (B) coverage profile along a chromosome (showing the first 20,000 targeted bases of Chromosome 1); (C) coverage versus GC-content; and (D) coverage versus distance from the first targeted base.
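Panel D plots depth against distance from the first targeted base. Assuming that distance is measured from the start of each target region, a minimal way to build such a profile is to pool per-base depths by their offset from the target start and average them. The input layout (one depth list per target) and the function name are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def coverage_by_offset(targets):
    """Mean depth at each offset from the first base of a target region.

    targets: one per-base depth list per target, starting at its first base.
    Returns {offset: mean depth over all targets long enough to reach it}.
    """
    by_offset = defaultdict(list)
    for depths in targets:
        for offset, depth in enumerate(depths):
            by_offset[offset].append(depth)
    return {offset: mean(values) for offset, values in sorted(by_offset.items())}


profile = coverage_by_offset([[10, 20, 30], [30, 40]])
print(profile)
```

Longer offsets are averaged over fewer targets, so the tail of such a profile is noisier than its start.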
Fig. 5.
Coverage correlation between samples. (A) Log-ratio versus targeted base position along Chromosome 20, derived from pairs of randomly chosen samples as indicated in the plot titles. For example, top left: log-ratios between two EZ Exome v2 samples; bottom right: an EZ Exome v1 sample matched against a SureSelect v2 sample. See also Supplementary Figure S4. (B) Base-level coverage variance against coverage mean, using six randomly chosen samples for each platform.
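Panel B compares base-level variance with the mean across samples. Under a Poisson model the two would be equal; variance above the mean indicates overdispersion, which negative binomial models, like the one used for the simulated data in Fig. 2C, can accommodate. The sketch below computes per-base mean and variance across samples and a method-of-moments negative binomial dispersion under the common parameterization var = mu + alpha * mu^2. That parameterization, the estimator, and the function names are assumptions, not the paper's stated procedure.

```python
from statistics import mean, variance


def mean_variance_per_base(samples):
    """Per-base (mean, sample variance) across samples.

    samples: one per-base depth list per sample (at least two samples),
        same bases in the same order.
    """
    return [(mean(depths), variance(depths)) for depths in zip(*samples)]


def nb_dispersion(mu, var):
    """Method-of-moments NB dispersion alpha, from var = mu + alpha * mu**2.

    Returns 0.0 when the data are not overdispersed (var <= mu) or mu is 0.
    """
    if mu <= 0:
        return 0.0
    return max((var - mu) / mu ** 2, 0.0)


for mu, var in mean_variance_per_base([[10, 100], [20, 140], [30, 60]]):
    print(mu, var, nb_dispersion(mu, var))
```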
Fig. 6.
Receiver operating characteristic (ROC) curve for the HapMap samples, generated by varying CONTRA's P-value threshold. The middle table shows the sensitivity and specificity for each individual sample at a P-value threshold of 0.01.
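The ROC curve and the table at P = 0.01 rest on standard bookkeeping: call a target when its P-value falls below a threshold, count true and false positives and negatives against known genotypes, and sweep the threshold to trace (1 - specificity, sensitivity) points. The sketch below shows that bookkeeping; treating each target as the unit of evaluation, the boolean truth labels, and the function names are assumptions for illustration.

```python
def sens_spec(p_values, truth, alpha=0.01):
    """Sensitivity and specificity when calling every target with p < alpha.

    truth[i] is True when target i overlaps a known CNV (assumed labels).
    """
    called = [p < alpha for p in p_values]
    tp = sum(c and t for c, t in zip(called, truth))
    fp = sum(c and not t for c, t in zip(called, truth))
    fn = sum(t and not c for c, t in zip(called, truth))
    tn = sum(not c and not t for c, t in zip(called, truth))
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def roc_points(p_values, truth, thresholds):
    """One (1 - specificity, sensitivity) point per P-value threshold."""
    points = []
    for alpha in sorted(thresholds):
        sens, spec = sens_spec(p_values, truth, alpha)
        points.append((1 - spec, sens))
    return points


p_values = [0.001, 0.2, 0.005, 0.5]
truth = [True, True, False, False]
print(sens_spec(p_values, truth))
print(roc_points(p_values, truth, [1e-4, 0.01, 1.0]))
```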
