J Med Genet. 2018 Nov;55(11):735-743.
doi: 10.1136/jmedgenet-2018-105272. Epub 2018 Jul 30.

Whole-genome sequencing analysis of CNV using low-coverage and paired-end strategies is efficient and outperforms array-based CNV analysis


Bo Zhou et al. J Med Genet. 2018 Nov.

Abstract

Background: Copy number variation (CNV) analysis is an integral component of the study of human genomes in both research and clinical settings. Array-based CNV analysis is the current first-tier approach in clinical cytogenetics. Decreasing costs of high-throughput sequencing and cloud computing have opened the door to sequencing-based CNV analysis pipelines with fast turnaround times. We carried out a systematic, quantitative comparison of several low-coverage whole-genome sequencing (WGS) strategies for detecting CNVs in the human genome.

Methods: We compared the CNV detection capabilities of three WGS strategies (short insert, 3 kb insert mate pair and 5 kb insert mate pair), each at 1×, 3× and 5× coverage, with one another and with 17 currently used high-density oligonucleotide arrays. For benchmarking, we used a set of gold standard (GS) CNVs generated for the 1000 Genomes Project CEU subject NA12878.
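To make the coverage levels concrete, the worked arithmetic below converts sequence coverage C into a read-pair count N and into physical coverage, the average number of sequenced inserts spanning a base. It is an illustration only: 2 × 100 bp reads (L = 100), a 400 bp short insert and a genome length G ≈ 3.1 × 10^9 bp are assumptions, not parameters reported in the study.

```latex
\begin{align*}
C &= \frac{2NL}{G} \\
N &= \frac{C\,G}{2L} = \frac{1 \times 3.1 \times 10^{9}}{2 \times 100} \approx 1.55 \times 10^{7} \ \text{read pairs} \quad (C = 1) \\
C_{\mathrm{phys}} &= \frac{N\,I}{G} = C\,\frac{I}{2L} \approx 2,\ 15,\ 25 \quad \text{for } I = 400,\ 3000,\ 5000 \ \text{bp} \ (C = 1)
\end{align*}
```

Here N is the number of read pairs, L the read length, G the genome length and I the insert size. Under these assumptions, a 1× mate-pair library still has each base spanned by roughly 15 to 25 inserts, which suggests why long-insert libraries can carry useful structural information at low sequence coverage.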

Results: Overall, low-coverage WGS strategies detect drastically more GS CNVs than arrays and yield smaller percentages of CNV calls without validation. Furthermore, we show that WGS (at ≥1× coverage) detects all seven GS deletion CNVs >100 kb in NA12878, whereas most arrays detect only one of them. Lastly, we show that the much larger 15 Mbp cri du chat deletion can be readily detected with short-insert paired-end WGS even at just 1× coverage.

Conclusions: CNV analysis using low-coverage WGS is efficient and outperforms the array-based analysis that is currently used for clinical cytogenetics.

Keywords: array CGH (aCGH); copy-number variation (CNV); discordant read-pair analysis; mate-pair sequencing; read-depth analysis.


Conflict of interest statement

Competing interests: None declared.

Figures

Figure 1
Comparisons of CNV calls by whole-genome sequencing (WGS) and arrays. (A) Schematic diagram of the detection of CNVs (deletions and duplications) using discordant read-pair analysis and read-depth analysis. In discordant read-pair analysis, deletions are detected when read pairs align to the reference genome farther apart than the expected insert size of the library, and duplications are detected when the aligned read pairs have reversed orientation. In read-depth analysis, deletions and duplications are detected as a pronounced decrease or increase, respectively, in the number of reads aligned to a genomic region relative to the genome-wide average. (B) Numbers of autosomal CNVs in the genome of subject NA12878 called from short-insert, 3 kb mate-pair and 5 kb mate-pair libraries sequenced at 1×, 3× and 5× coverage, compared against previous array calls. Array-based CNV calls were made with platform-specific algorithms, and WGS CNV calls were made by combining discordant read-pair analysis and read-depth analysis. Gold: autosomal CNVs with ≥50% reciprocal overlap with NA12878 gold standard (GS) CNVs. Green: 10%–50% reciprocal overlap with NA12878 GS CNVs. Blue: <10% reciprocal overlap with GS CNVs and ≥50% reciprocal overlap with NA12878 silver standard CNVs. Red: no overlap (<10% overlap with GS CNVs and <50% overlap with silver standard CNVs). *Benchmarking took CNV type into account, except for Affymetrix SNP 6.0 and Illumina HumanOmni1Quad, for which CNV type information was not available.
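The discordant read-pair rule in panel A can be sketched in Python. This is a minimal illustration, not the pipeline used in the study: the `ReadPair` record, the `reversed_orient` flag, the mean + 3 SD insert-size cutoff and the `min_support` clustering rule are all hypothetical choices. A pair supports a deletion when its reads align farther apart on the reference than the library's insert size allows, because the sample lacks the reference sequence between them; a pair with flipped orientation is taken as duplication evidence. Only deletion clustering is shown; duplication evidence could be clustered the same way.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ReadPair:
    chrom: str
    start: int             # leftmost reference position covered by the pair
    end: int               # rightmost reference position covered by the pair
    reversed_orient: bool  # True if orientation is flipped vs. the library norm


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float,
                  n_sd: float = 3.0) -> str:
    """Label a single pair as deletion evidence, duplication evidence or concordant."""
    if pair.reversed_orient:
        # Flipped orientation: the duplication signature described in panel A.
        return "duplication"
    if pair.end - pair.start > mean_insert + n_sd * sd_insert:
        # Mapped span exceeds what the library allows: sequence present in the
        # reference between the two reads is missing from the sample.
        return "deletion"
    return "concordant"


def call_deletions(pairs, mean_insert, sd_insert, min_support=2):
    """Cluster deletion-supporting pairs whose ends lie within one insert size.

    Returns (chrom, start, end, estimated_size, support) tuples, where
    start/end bound the region that contains the deletion.
    """
    evidence = sorted(
        (p for p in pairs if classify_pair(p, mean_insert, sd_insert) == "deletion"),
        key=lambda p: (p.chrom, p.start),
    )
    clusters = []
    for p in evidence:
        seed = clusters[-1][0] if clusters else None
        if (seed is not None and seed.chrom == p.chrom
                and abs(p.start - seed.start) <= mean_insert
                and abs(p.end - seed.end) <= mean_insert):
            clusters[-1].append(p)
        else:
            clusters.append([p])
    calls = []
    for c in clusters:
        if len(c) < min_support:
            continue  # treat isolated discordant pairs as likely noise
        # Extra span beyond the expected insert approximates the deleted length.
        est_size = median(p.end - p.start for p in c) - mean_insert
        calls.append((c[0].chrom, max(p.start for p in c),
                      min(p.end for p in c), round(est_size), len(c)))
    return calls


pairs = [
    ReadPair("chr1", 10_000, 15_400, False),
    ReadPair("chr1", 10_050, 15_420, False),
    ReadPair("chr1", 10_100, 15_500, False),
    ReadPair("chr1", 20_000, 20_400, False),
    ReadPair("chr1", 30_000, 30_380, True),
]
print(call_deletions(pairs, mean_insert=400, sd_insert=50))
# [('chr1', 10100, 15400, 5000, 3)]
```

With a 400 ± 50 bp library, the three pairs spanning about 5.4 kb form one deletion call of roughly 5 kb, the 400 bp pair is concordant, and the flipped pair would feed a duplication caller built the same way.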
Figure 2
Sensitivity of whole-genome sequencing (WGS) detection of NA12878 GS CNVs (>1 kb). (A) Sensitivity of GS CNV (>1 kb) detection across WGS libraries and array platforms, defined as the ratio of detected autosomal GS CNVs to the total number of autosomal GS CNVs. Array-based CNV calls were made with platform-specific algorithms, and WGS CNV calls were made by combining discordant read-pair analysis and read-depth analysis. Green: percentage of total autosomal GS CNVs detected (>50% reciprocal overlap). Light blue: percentage of total autosomal GS CNVs not detected (no call with >50% reciprocal overlap). Sensitivity of GS CNV detection in different size ranges from (B) short-insert, (C) 3 kb mate-pair and (D) 5 kb mate-pair libraries at sequencing coverages of 1×, 3× and 5×. CNVs were called by combining discordant read-pair analysis and read-depth analysis.
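The overlap criteria used for benchmarking in Figures 1B and 2A can be expressed compactly. The sketch below is an illustration under stated assumptions: the `CNV` tuple, half-open coordinates and all function names are hypothetical, and it counts a CNV as matched at ≥50% reciprocal overlap (the Figure 1 wording; Figure 2 states >50%). Reciprocal overlap requires the shared length to cover the given fraction of both intervals, which is equivalent to dividing the overlap by the longer interval. The `match_type` flag mirrors the exception for the two arrays that did not report CNV type.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # 0-based, half-open [start, end)
    end: int
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smallest fraction of either interval covered by their intersection."""
    if a.chrom != b.chrom:
        return 0.0
    shared = max(0, min(a.end, b.end) - max(a.start, b.start))
    return shared / max(a.end - a.start, b.end - b.start)


def best_overlap(query: CNV, others, match_type: bool = True) -> float:
    """Highest reciprocal overlap between query and any interval in others."""
    return max((reciprocal_overlap(query, o) for o in others
                if not match_type or o.kind == query.kind), default=0.0)


def classify_call(call, gold, silver, match_type=True) -> str:
    """Assign the Figure 1B color category to one CNV call."""
    ro_gold = best_overlap(call, gold, match_type)
    if ro_gold >= 0.5:
        return "gold"
    if ro_gold >= 0.1:
        return "green"
    if best_overlap(call, silver, match_type) >= 0.5:
        return "blue"
    return "red"


def sensitivity(calls, gold, min_ro=0.5, match_type=True) -> float:
    """Fraction of gold-standard CNVs matched by at least one call."""
    detected = sum(1 for g in gold if best_overlap(g, calls, match_type) >= min_ro)
    return detected / len(gold)


gold = [CNV("chr1", 1_000, 5_000, "DEL"), CNV("chr2", 0, 2_000, "DEL")]
silver = [CNV("chr3", 100, 1_100, "DUP")]
calls = [CNV("chr1", 1_500, 5_000, "DEL"), CNV("chr2", 1_500, 3_000, "DEL"),
         CNV("chr3", 200, 1_100, "DUP"), CNV("chr4", 0, 500, "DEL")]
print([classify_call(c, gold, silver) for c in calls])  # ['gold', 'green', 'blue', 'red']
print(sensitivity(calls, gold))  # 0.5
```

In this toy example the chr2 call covers only 25% of the longer interval, so it is classed green and does not count toward sensitivity.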
Figure 3
Size distributions of NA12878 CNVs detected by whole-genome sequencing (WGS). Size distribution of NA12878 CNVs detected from (A) short-insert, (B) 3 kb mate-pair and (C) 5 kb mate-pair libraries by combining discordant read-pair analysis and read-depth analysis. CNVs called from (D) short-insert, (E) 3 kb mate-pair and (F) 5 kb mate-pair libraries by read-depth analysis only. CNVs called from (G) short-insert, (H) 3 kb mate-pair and (I) 5 kb mate-pair libraries by discordant read-pair analysis only.
Figure 4
NA16595 cri du chat deletion. Integrative Genomics Viewer (IGV) screenshot of the 15 Mbp cri du chat deletion on chromosome 5 in NA16595, from short-insert whole-genome sequencing (WGS) at 1×, 3× and 5× coverage. Vertical axis: coverage value; blue dots: coverage at each genomic position. Two areas within the deletion show unusually high coverage because they overlap segmental duplications, which causes sequencing reads to cross-map.
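The read-depth signal in this figure can be illustrated with a deliberately simple caller. In the sketch below, the bin size, the 0.75/1.25 ratio cutoffs, the `min_bins` rule and the function names are all assumptions; published read-depth tools typically add GC correction and statistical segmentation, which this omits. Per-bin read counts are divided by the genome-wide median, so a heterozygous deletion (one of two copies lost) is expected near a ratio of 0.5 and a single-copy gain near 1.5. For a sense of scale, assuming 100 bp reads at 1× coverage and 10 kb bins, each bin receives about 100 reads, and Poisson counting noise (standard deviation √100 = 10) is small next to the drop to about 50 reads in a heterozygous deletion; a 15 Mbp deletion spans about 1,500 such bins.

```python
from itertools import groupby
from statistics import median


def bin_state(ratio: float, del_ratio: float, dup_ratio: float):
    """Map a depth ratio to 'DEL', 'DUP' or None (copy-neutral)."""
    if ratio <= del_ratio:
        return "DEL"
    if ratio >= dup_ratio:
        return "DUP"
    return None


def call_depth_segments(bin_counts, bin_size, del_ratio=0.75, dup_ratio=1.25,
                        min_bins=3):
    """Report runs of >= min_bins consecutive bins sharing a non-neutral state.

    Returns (state, start, end) tuples in bin-aligned coordinates.
    """
    baseline = median(bin_counts)  # genome-wide reference depth per bin
    states = [bin_state(c / baseline, del_ratio, dup_ratio) for c in bin_counts]
    calls, pos = [], 0
    for state, run in groupby(states):
        n = len(list(run))
        if state is not None and n >= min_bins:
            calls.append((state, pos * bin_size, (pos + n) * bin_size))
        pos += n
    return calls


counts = [100, 98, 103, 101, 52, 49, 50, 47, 99, 102, 150, 148, 155, 97, 100]
print(call_depth_segments(counts, bin_size=10_000))
# [('DEL', 40000, 80000), ('DUP', 100000, 130000)]
```

Bins inside a deletion that overlap segmental duplications, like the two high-coverage areas in this figure, would interrupt such a run and split the call; masking those regions before calling is one simple mitigation.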
