BMC Genomics. 2015 Apr 11;16(1):286.
doi: 10.1186/s12864-015-1479-3.

Assessing structural variation in a personal genome-towards a human reference diploid genome

Adam C English et al. BMC Genomics. 2015.

Abstract

Background: Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods.

Results: We demonstrate Parliament's efficacy via integrated analyses of whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific Biosciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus.

Conclusions: HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis.
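
The consensus step described in the Results, merging calls from several data types into reference-inconsistent loci, can be illustrated with a minimal sketch. This is not Parliament's actual algorithm: the `Call` and `Locus` types, the `merge_calls` function, and the simple "same chromosome, same type, overlapping interval" rule are all assumptions made for illustration. Only the 100 bp to 1 Mbp size window comes from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    """One SV call from one data source (hypothetical record layout)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    sv_type: str
    source: str


@dataclass
class Locus:
    """A merged locus and the set of sources that support it."""
    chrom: str
    start: int
    end: int
    sv_type: str
    sources: set = field(default_factory=set)


def merge_calls(calls, min_size=100, max_size=1_000_000):
    """Greedily merge overlapping calls of the same type on the same chromosome.

    The size window mirrors the 100 bp to 1 Mbp range in the abstract;
    the merge rule itself is an illustrative assumption.
    """
    loci = []
    ordered = sorted(calls, key=lambda c: (c.chrom, c.sv_type, c.start))
    for call in ordered:
        last = loci[-1] if loci else None
        if (
            last is not None
            and last.chrom == call.chrom
            and last.sv_type == call.sv_type
            and call.start < last.end
        ):
            # Extend the current locus and record the extra supporting source.
            last.end = max(last.end, call.end)
            last.sources.add(call.source)
        else:
            loci.append(
                Locus(call.chrom, call.start, call.end, call.sv_type, {call.source})
            )
    # Keep only loci inside the size window.
    return [l for l in loci if min_size <= l.end - l.start <= max_size]
```

Recording the set of supporting sources per locus is what allows later steps, such as multi-source heuristics, to ask which data types agree on an event.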

Figures

Figure 1
Parliament workflows. The Parliament infrastructure is designed to incorporate multiple data types and software for each data type. (a) Novel Method evaluation incorporates new data or methods into the HS1011 workflow. (b) The HS1011 workflow. (c) The Illumina Only workflow, requiring only a paired-end WGS BAM file as input.
Figure 2
Size distribution. All HS1011 SV events larger than 100 bp and smaller than 100,000 bp were compared to events from the Venter genome (HuRef) and an Asian male (YH), both specifically characterized for SV content. In this size regime, the HS1011, HuRef, and YH samples contain 5044, 5127, and 5374 deletions (panel a) and 4482, 4479, and 15525 insertions (panel b), respectively. The YH SV distributions are based on de novo assembly of 35 bp single-end and paired-end data. This assembly was used to identify SVs between 1 bp and 50 kbp. Initial events larger than 50 bp were filtered using discordant paired-end mapping of ~35 bp reads. Given the relative abundance of HS1011 sequence data (including both long reads and longer short reads as compared to the YH short reads), and given the differences in methods, it is unlikely that the ~3-fold difference in insertions between the YH set and the HS1011 and HuRef sets represents a significant lack of Parliament sensitivity.
Figure 3
DGV comparison. Each of the 31,007 reference-inconsistent loci was characterized as either an HS1011 SV or unsupported locus based on its Parliament bitflag and as either “In DGV” or “Not DGV” based on whether it shared at least 50% reciprocal overlap with a DGV event of the same type.
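
The 50% reciprocal overlap criterion in this caption can be sketched as follows. The function names, the tuple layout `(chrom, start, end, sv_type)`, and the half-open coordinate convention are assumptions for illustration; the caption specifies only the threshold and the same-type requirement.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Return the smaller of the two overlap fractions for half-open intervals.

    Two intervals share at least 50% reciprocal overlap when the
    overlapping region covers at least half of each interval.
    """
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def in_dgv(locus, dgv_events, threshold=0.5):
    """True if any same-type event on the same chromosome meets the threshold.

    `locus` and each entry of `dgv_events` are hypothetical
    (chrom, start, end, sv_type) tuples.
    """
    chrom, start, end, sv_type = locus
    return any(
        d_chrom == chrom
        and d_type == sv_type
        and reciprocal_overlap(start, end, d_start, d_end) >= threshold
        for d_chrom, d_start, d_end, d_type in dgv_events
    )
```

Taking the minimum of the two fractions is what makes the test reciprocal: a small call sitting inside a much larger database event covers all of itself but only a sliver of the event, so it does not count as a match.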
Figure 4
SNP concordance. HomozScores are reported for three classes of HS1011 deletion loci: unsupported loci, HS1011 SVs with less than 25X coverage, and HS1011 SVs with greater than 50X coverage.
Figure 5
Illumina-Only & PacBio comparison. The Illumina-only results are compared to the HS1011 SV subset containing Illumina and PacBio discovery. PB-ILL contains all HS1011 SVs with PacBio or Illumina discovery and hybrid assembly support. The ILLHyb workflow uses only PE methods for discovery but both Illumina and PacBio sequence reads for local assembly. The ILLOnly workflow uses only Illumina PE methods and reads for both discovery and assembly.
Figure 6
Multi-source comparison. Each off-diagonal cell contains the number of clusters with support from a pair of sources. The diagonal entries give the number of clusters with support from exactly one data source.
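
A matrix of this kind could be tabulated as in the sketch below. The function name `support_matrix`, the input format (one set of source names per cluster), and the choice to count a multi-source cluster once for every pair of its sources are assumptions; the caption defines only what the off-diagonal and diagonal cells mean.

```python
from itertools import combinations


def support_matrix(cluster_sources, sources):
    """Count clusters supported by each pair of sources.

    Off-diagonal cell [i][j]: clusters supported by both source i and source j.
    Diagonal cell [i][i]: clusters supported by source i and nothing else.
    """
    index = {name: i for i, name in enumerate(sources)}
    n = len(sources)
    matrix = [[0] * n for _ in range(n)]
    for supporting in cluster_sources:
        supporting = set(supporting)
        if len(supporting) == 1:
            # Single-source cluster: goes on the diagonal.
            i = index[next(iter(supporting))]
            matrix[i][i] += 1
        # Multi-source cluster: counted once for every pair it contains.
        for a, b in combinations(supporting, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return matrix
```

Because a cluster with three supporting sources contributes to three pairwise cells, the cells of such a matrix do not sum to the total number of clusters.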
Figure 7
Complex rearrangement. A large-scale deletion and inverted insertion rearrangement on chromosome 11p15.5 is depicted. The rearrangement breakpoint junctions (Jct 1, 2, and 3) were identified through de novo assembly, which established the resultant structure in the HS1011 genome shown here. The junction sequences of the three breakpoints are shown below.
