Am J Hum Genet. 2018 Jan 4;102(1):142-155.
doi: 10.1016/j.ajhg.2017.12.007.

A Comprehensive Workflow for Read Depth-Based Identification of Copy-Number Variation from Whole-Genome Sequence Data

Brett Trost et al. Am J Hum Genet.

Abstract

A remaining hurdle to whole-genome sequencing (WGS) becoming a first-tier genetic test has been accurate detection of copy-number variations (CNVs). Here, we used several datasets to empirically develop a detailed workflow for identifying germline CNVs >1 kb from short-read WGS data using read depth-based algorithms. Our workflow is comprehensive in that it addresses all stages of the CNV-detection process, including DNA library preparation, sequencing, quality control, reference mapping, and computational CNV identification. We used our workflow to detect rare, genic CNVs in individuals with autism spectrum disorder (ASD), and 120/120 such CNVs tested using orthogonal methods were successfully confirmed. We also identified 71 putative genic de novo CNVs in this cohort, which had a confirmation rate of 70%; the remainder were incorrectly identified as de novo due to false positives in the proband (7%) or parental false negatives (23%). In individuals with an ASD diagnosis in which both microarray and WGS experiments were performed, our workflow detected all clinically relevant CNVs identified by microarrays, as well as additional potentially pathogenic CNVs < 20 kb. Thus, CNVs of clinical relevance can be discovered from WGS with a detection rate exceeding microarrays, positioning WGS as a single assay for genetic variation detection.

Keywords: CNV; SV; WGS; copy-number variation; read depth; structural variation; variation detection; whole-genome sequencing.

Figures

Figure 1
Overview of the Three Stages of This Study. In stage 1 (“algorithm selection”), three WGS datasets and corresponding CNV benchmarks (HuRef, NA12878, and AK135) were used to assess the accuracy of six read depth-based CNV-detection algorithms: Canvas, cn.MOPS, CNVnator, ERDS, Genome STRiP, and RDXplorer. In stage 2 (“workflow development”), other factors influencing CNV detection were evaluated in the context of the most accurate algorithms identified in stage 1. Based on results from the first two stages, we propose a comprehensive workflow for detecting CNVs from short-read WGS data. In stage 3 (“workflow evaluation”), we show that our workflow can accurately identify clinically relevant CNVs. Green parallelograms represent data, and gray rectangles represent actions. The blue shape represents the CNV detection workflow developed from the results of the first two stages.
Figure 2
Overlap in the CNVs Detected by the Six Algorithms. The bottom-left bar chart shows the number of CNVs identified by each algorithm. The remainder shows the number of CNVs detected by various intersections of the algorithms; for instance, the far-left bar for deletions represents the number of CNVs detected by RDXplorer only, while the far-right bar represents deletions detected by Canvas, cn.MOPS, CNVnator, and RDXplorer but not ERDS or Genome STRiP. Due to the log scale, zero-height bars represent a count of 1.
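The intersection counts described in this caption can be illustrated with a minimal sketch. It assumes that calls from the different algorithms have already been matched to shared CNV identifiers, a step the caption does not describe. The function name `exclusive_intersections` and the toy identifiers are hypothetical and are not data from the study.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set


def exclusive_intersections(calls: Dict[str, Set[str]]) -> Counter:
    """Count CNVs by the exact set of algorithms that detected them.

    `calls` maps an algorithm name to the set of CNV identifiers it reported.
    Assumption: identical identifiers mean "the same CNV" across algorithms.
    """
    membership: Dict[str, Set[str]] = {}
    for algorithm, cnv_ids in calls.items():
        for cnv_id in cnv_ids:
            membership.setdefault(cnv_id, set()).add(algorithm)
    # Each CNV contributes to exactly one bar: the set of all its detectors.
    return Counter(frozenset(algos) for algos in membership.values())


# Toy example with made-up identifiers (not results from the paper).
toy_calls = {
    "RDXplorer": {"del1", "del2"},
    "ERDS": {"del2", "del3"},
    "CNVnator": {"del2"},
}
counts = exclusive_intersections(toy_calls)
for algos, n in sorted(counts.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(algos), n)
```

Counting by the exact detector set means every CNV lands in a single bar, which matches the "detected by these algorithms but not the others" reading of the figure.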
Figure 3
Recommended Workflow for Use of Read Depth-Based Algorithms for Detecting Germline CNVs from Short-Read WGS Data. The green and blue shapes represent the beginning and end of the workflow, respectively. Red rectangles represent quality-control steps, and other actions are colored in gray. Yellow diamonds represent decision points. For maximum stringency, the action “Remove CNVs with ≥70% overlap with RLCRs” may be performed using the full RLCR definition, including RepeatMasker (as in the algorithm selection and workflow development sections). For increased sensitivity, such as when examining rare, genic CNVs, it may be performed using the RLCR definition that omits RepeatMasker, as was done in the workflow evaluation section.
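The filtering action “Remove CNVs with ≥70% overlap with RLCRs” can be sketched as follows. This is only an illustration under stated assumptions: overlap is taken as the fraction of the CNV's own length covered by the RLCR set, intervals are half-open and lie on a single chromosome, and the function names are hypothetical. The caption does not specify these details.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumed)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_fraction(cnv: Interval, regions: List[Interval]) -> float:
    """Fraction of the CNV's length covered by the given regions."""
    start, end = cnv
    length = end - start
    if length <= 0:
        raise ValueError("CNV must have positive length")
    covered = 0
    for r_start, r_end in merge_intervals(regions):
        covered += max(0, min(end, r_end) - max(start, r_start))
    return covered / length


def filter_cnvs(cnvs: List[Interval], rlcrs: List[Interval],
                threshold: float = 0.70) -> List[Interval]:
    """Keep only CNVs whose overlap with the regions is below the threshold."""
    return [cnv for cnv in cnvs if overlap_fraction(cnv, rlcrs) < threshold]


# Toy coordinates, not taken from the study.
rlcrs = [(900, 1500), (1400, 1700)]
cnvs = [(1000, 2000), (1600, 3600)]
kept = filter_cnvs(cnvs, rlcrs)
```

Merging the regions first matters because overlapping annotations would otherwise inflate the covered length. Swapping the full RLCR definition for the one that omits RepeatMasker would, in this sketch, amount to passing a different `rlcrs` list.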
