BMC Genomics. 2011 Apr 12;12:184.
doi: 10.1186/1471-2164-12-184.

Accurate and exact CNV identification from targeted high-throughput sequence data

Alex S Nord et al. BMC Genomics. 2011.

Abstract

Background: Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for detecting copy number variants (CNVs) from targeted enrichment data are lacking. We present a method that combines depth of coverage with read mapping information to identify deletions and duplications in targeted sequence data.

Results: Sequencing data are first scanned for gains and losses by comparing normalized coverage between samples. CNV calls are then confirmed by testing for a signature of sequences that span the CNV breakpoint. With our method, CNVs can be identified regardless of whether breakpoints are within regions targeted for sequencing. For CNVs where at least one breakpoint is within targeted sequence, exact CNV breakpoints can be identified. In a test data set of 96 subjects sequenced across ~1 Mb of genomic sequence using multiplexing technology, our method detected mutations as small as 31 bp, predicted quantitative copy count, and had a low false-positive rate.
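The coverage comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the normalization (scaling each sample to a mean coverage of 1.0), the ratio thresholds `low` and `high`, the minimum run length `min_len`, and all function names are assumptions chosen for illustration. The abstract only states that normalized coverage is compared between samples.

```python
from statistics import median

def normalize(coverage):
    """Scale per-base coverage so the sample's mean coverage is 1.0 (assumed scheme)."""
    total = sum(coverage)
    if total == 0:
        raise ValueError("sample has no coverage")
    n = len(coverage)
    return [c * n / total for c in coverage]

def coverage_ratios(samples):
    """For each sample, return per-base normalized coverage divided by the lane median."""
    normalized = {name: normalize(cov) for name, cov in samples.items()}
    n_bases = len(next(iter(normalized.values())))
    lane_median = [median(cov[i] for cov in normalized.values()) for i in range(n_bases)]
    return {
        name: [c / m if m > 0 else float("nan") for c, m in zip(cov, lane_median)]
        for name, cov in normalized.items()
    }

def call_segments(ratios, low=0.7, high=1.3, min_len=3):
    """Report runs of at least min_len bases whose ratio is below low or above high.

    Returns (start, end, kind) tuples with end exclusive; thresholds are hypothetical.
    """
    segments = []
    start, kind = None, None
    for i, r in enumerate(list(ratios) + [1.0]):  # sentinel closes a trailing run
        state = "loss" if r < low else "gain" if r > high else None
        if state != kind:
            if kind is not None and i - start >= min_len:
                segments.append((start, i, kind))
            start, kind = i, state
    return segments

# Toy lane: samples A and B are diploid, C carries a heterozygous loss at bases 3-6.
lane = {
    "A": [10] * 10,
    "B": [20] * 10,
    "C": [10, 10, 10, 5, 5, 5, 5, 10, 10, 10],
}
ratios = coverage_ratios(lane)
calls = {name: call_segments(r) for name, r in ratios.items()}
```

Because each sample is scaled by its own total, a large loss slightly inflates the ratio of the flanking bases in that sample, which is one reason a depth-based call benefits from independent confirmation at the breakpoint.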

Conclusions: Application of this method allows for identification of gains and losses in targeted sequence data, providing comprehensive mutation screening when combined with a short read aligner.

Figures

Figure 1
Analysis schema. The left side of the figure shows a flow chart summarizing our methods. The right panels illustrate specific processes in the identification of a hemizygous deletion where one edge of the CNV is within targeted sequence. a) Targeted region with short reads aligned to the reference sequence, ratio of normalized coverage for sample versus lane median, and CNV call based on depth of coverage analysis. b) Sequence reads that align to the CNV edge shown in a). Black reads align across the complete 76 bp. Red reads are shorter segments of 76 bp reads that align perfectly to the region and indicate the presence of a mutation. c) Reads that partially map to the CNV edge also align to sequence flanking the other edge and can be used for exact breakpoint characterization, even though that other edge is in non-targeted sequence. The gap in the alignments represents the deleted sequence.
Figure 2
Raw and normalized coverage data. Data are shown for a region of BRCA1 on chr17 where all samples represented are diploid. The top two panels show the mean and standard deviation (SD) of raw coverage across one lane (12 subjects). The third and fourth panels show the signal-to-noise ratio for the raw and normalized data. Signal-to-noise was calculated as mean/SD for each base. The final panel shows the mean and standard deviation of the ratio data from the 12 individuals across the region.
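The per-base signal-to-noise calculation in this caption (mean/SD across the samples in a lane) can be sketched as follows. The function name and input layout are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev

def signal_to_noise(coverage_by_sample):
    """Per-base mean/SD across samples.

    coverage_by_sample is a list of equal-length per-base coverage lists,
    one per sample. Bases with zero spread are reported as infinity.
    """
    result = []
    for column in zip(*coverage_by_sample):
        sd = stdev(column)  # sample SD; an assumed choice
        result.append(mean(column) / sd if sd > 0 else float("inf"))
    return result

# Three samples, two bases: the first base varies, the second does not.
snr = signal_to_noise([[10, 20], [20, 20], [30, 20]])
```

Applying the same function to raw and to normalized coverage gives the two signal-to-noise tracks the caption compares.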
Figure 3
Ratio of sample to median corrected depth of coverage indicates variant regions. Each subplot shows the ratio across one targeted region (PTEN, BRCA2, BRCA1, and CHEK2), with CNVs shown as colored datapoints. Using depth of coverage with map confirmation, we identified 10 CNVs (5 deletions (one homozygous), 4 duplications, and 1 triplication) across 21 targeted regions (909 kbp) for 96 barcoded samples. CNV size ranged from 31 bp to 26560 bp. The ratio was calculated by comparing corrected normalized sample coverage to median coverage within one flow cell lane. Diploid bases are plotted in grey, while colored datapoints indicate copy-number variant bases for one sample. Non-targeted repeat sequence is shown in black at the bottom of each plot.
Figure 4
Use of mapped partial reads to confirm calls and identify exact breakpoints. We tested for over-representation of tag starts or ends at each predicted CNV breakpoint, and then mapped partial reads to identify exact breakpoints. Using this method, we confirmed an 899 bp PTEN deletion present in two samples, a 510 bp deletion in BRCA1, and a 31 bp homozygous deletion in BRCA2. a) Unique breakpoint region for each CNV with sequence tags plotted by start and end position. Tags where all 76 bases align are shown in black, and tags where fewer than 76 bases align are shown in red. Z-scores generated from the number of reads that start or end at each base are shown below the mapped reads, with red indicating breakpoint(s). b) Each read that partially maps to the breakpoint aligns to sequence flanking the other side of the CNV, allowing exact breakpoint identification. Partial reads are shown in red, with a line connecting the two segments of each read. Length for all reads shown is 76 bp.
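The breakpoint test in this caption scores each base by how many reads start or end there. A minimal sketch of that idea follows; the read representation (inclusive `(start, end)` tuples), the function name, and the use of the population standard deviation over the region are assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev

def boundary_z_scores(reads, region_start, region_end):
    """Z-scores of per-base read-start and read-end counts within a region.

    reads is a list of (start, end) aligned intervals with inclusive
    coordinates. Returns (start_z, end_z), one value per base in the region.
    """
    length = region_end - region_start + 1
    starts = [0] * length
    ends = [0] * length
    for s, e in reads:
        if region_start <= s <= region_end:
            starts[s - region_start] += 1
        if region_start <= e <= region_end:
            ends[e - region_start] += 1

    def z(counts):
        mu, sd = mean(counts), pstdev(counts)
        return [(c - mu) / sd if sd > 0 else 0.0 for c in counts]

    return z(starts), z(ends)

# Toy data: nine full reads tiled across the region, plus eight partial
# reads that all stop at base 50, as expected at a deletion edge.
full_reads = [(i, i + 9) for i in range(0, 90, 10)]
partial_reads = [(50 - k, 50) for k in range(2, 10)]
start_z, end_z = boundary_z_scores(full_reads + partial_reads, 0, 99)
breakpoint_base = end_z.index(max(end_z))
```

In this toy example the pile-up of read ends at one base yields a single large z-score, which marks the candidate breakpoint that partial-read mapping would then refine.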

