Nat Commun. 2016 Oct 6:7:12989.
doi: 10.1038/ncomms12989.

A high-quality human reference panel reveals the complexity and distribution of genomic structural variants

Jayne Y Hehir-Kwa et al. Nat Commun. 2016.

Abstract

Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, the majority of studies neither provide a global view of the full spectrum of these variants nor integrate them into reference panels of genetic variation. Here, we analyse whole genome sequencing data of 769 individuals from 250 Dutch families, and provide a haplotype-resolved map of 1.9 million genome variants across 9 different variant classes, including novel forms of complex indels, and retrotransposition-mediated insertions of mobile elements and processed RNAs. A large proportion are previously underreported variants sized between 21 and 100 bp. We detect 4 megabases of novel sequence, encoding 11 new transcripts. Finally, we show 191 known, trait-associated SNPs to be in strong linkage disequilibrium with SVs and demonstrate that our panel facilitates accurate imputation of SVs in unrelated individuals.

Figures

Figure 1. Overviews of discovery approach and variant set.
(a) Overview of methods used for SV detection, genotyping and phasing within the GoNL project. (b) Structural variation consensus set, consisting of large duplications (outer ring), deletions larger than 100 bp (light red), chromosomes, insertions (triangles), mid-sized deletions (21–100 bp), small deletions (less than 20 bp) (dark red) and complex indels (purple). Heatmaps display the insertions of Alu, L1 and SVA elements. Inversions are indicated by black arcs in the centre of the plot; interchromosomal breakpoints are coloured based on the source chromosome.
Figure 2. Number of simple and complex indels, mobile element insertions (MEIs) and deletions (stratified by length).
Grey bars correspond to total counts, whereas coloured (blue to violet) bars give counts stratified into four bins by allele frequency quartiles (Q1 to Q3).
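The stratification in Figure 2 relies on the fact that three quartile cut points (Q1 to Q3) split a set of values into four bins. The sketch below illustrates that idea only; the function name `quartile_bin_counts` and the toy allele frequencies are hypothetical, and the paper's exact binning procedure is not given in this text.

```python
from bisect import bisect_right
from statistics import quantiles


def quartile_bin_counts(allele_freqs):
    """Count values falling into the four bins delimited by Q1, Q2 and Q3.

    Assumption: a value equal to a cut point is assigned to the upper bin.
    """
    cuts = quantiles(allele_freqs, n=4)  # three cut points: Q1, Q2, Q3
    counts = [0, 0, 0, 0]
    for freq in allele_freqs:
        counts[bisect_right(cuts, freq)] += 1
    return counts


# Toy allele frequencies (made up for illustration, not data from the paper).
toy_freqs = [0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50]
bin_counts = quartile_bin_counts(toy_freqs)
print(bin_counts)
```

Each bin count would correspond to one coloured bar for a given variant class, and their sum to the grey total bar.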
Figure 3. Example of a large replacement within the KRBOX4 gene.
The plot depicts the coverage profile of whole genome sequencing reads from a GoNL sample with a homozygous replacement. The lack of coverage in the last exon of KRBOX4 coincides with the position of the replacement. The breakpoint junctions of the replacement are indicated in the panel underneath the coverage plot.
Figure 4. Identification and expression of a novel ZNF gene.
(a) A Geuvadis RNA-sequencing dataset (ERR188316) was mapped to the human reference genome, which was extended with a new genomic segment inserted in chr 19 (bp 21,252,967). The plot shows RNA expression and split-read mappings across the novel ZNF gene present on this new genomic segment. (b) Protein domain structure of the novel ZNF gene as determined using NCBI Conserved Domain Search. (c) Neighbor-joining tree built from alignment of protein sequences homologous to the novel ZNF gene. Values at the nodes indicate bootstrap support of each group. Distances indicate protein sequence divergence at the amino acid level.
Figure 5. Effects of MEIs on gene expression.
(a) Schematic picture indicating an AluYa5 insertion in the promoter region of LCLAT1. (b) LCLAT1 gene expression (log2 of normalized read count) in blood from GoNL individuals who are heterozygous (het) or homozygous (hom) for the AluYa5 insertion. (c) RNA expression effects of an AluYb8 insertion in the last exon of ZNF880. The presence of the AluYb8 element results in spliced transcripts that preferentially contain the last exon, while the penultimate exon is skipped (upper panel). The reverse effect is seen in the absence of the AluYb8 insertion (lower panel).
Figure 6. Schematic overview of the imputation experiment.
Haplotypes are represented by thin grey bars, whereas diploid chromosomes with genotype calls are indicated by thick grey bars. Processing steps are shown in blue and numbered (in black circles) for reference in the main text.
Figure 7. Imputation results for different SV types.
(a) Histogram of the number of gold standard genotype calls per SV class. (b) Relationship between discordance and fraction of missing genotypes when altering the genotype likelihood (GL) threshold used for filtering the imputed genotypes, ranging from 0.33 (no filter) to 0.999 across SV classes. Thresholds used for further analyses, including panels (c,d), are circled in red. Increasing the minimum GL results in fewer discordant genotypes but increases the number of missing genotypes. Imputation of inversions had the highest rate of discordance and missing genotypes, whereas the tandem duplications and deletions had lower rates of discordant and missing genotypes for those events with a high GL. (c) Discordance rates for deletions, complex indels and MEIs stratified by minor allele frequencies for 20 bins (width=0.025). Bin boundaries are indicated by grey lines. The number of calls per bin is shown by dashed lines. (d) Same as (c), but restricted to calls where the gold standard genotype contains at least one copy of the rare allele.
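The trade-off described in panel (b) can be illustrated with a minimal sketch. It assumes that each imputed call carries the likelihood of its most probable genotype, that calls below the threshold are set to missing, and that discordance is the fraction of retained calls disagreeing with the gold standard. These definitions, the function name `discordance_and_missing` and the toy calls are illustrative assumptions, not the authors' implementation. With three possible genotypes whose likelihoods sum to 1, the most probable one is always at least 1/3, which is why a threshold of 0.33 amounts to no filter.

```python
def discordance_and_missing(calls, min_gl):
    """Apply a minimum genotype likelihood (GL) filter to imputed calls.

    calls: list of (imputed_genotype, max_gl, gold_genotype) tuples.
    Returns (discordance among retained calls, fraction set to missing).
    """
    retained = [(imputed, gold) for imputed, gl, gold in calls if gl >= min_gl]
    missing_fraction = (len(calls) - len(retained)) / len(calls)
    discordant = sum(1 for imputed, gold in retained if imputed != gold)
    # Assumption: report 0.0 discordance when no call survives the filter.
    discordance = discordant / len(retained) if retained else 0.0
    return discordance, missing_fraction


# Toy calls (made up for illustration): genotypes coded as 0, 1 or 2
# copies of the alternative allele.
toy_calls = [
    (0, 0.99, 0),
    (1, 0.60, 0),   # low-confidence call that disagrees with the gold standard
    (1, 0.95, 1),
    (2, 0.40, 1),   # low-confidence call that disagrees with the gold standard
    (0, 0.999, 0),
]

no_filter = discordance_and_missing(toy_calls, 0.33)
strict = discordance_and_missing(toy_calls, 0.90)
print(no_filter, strict)
```

In this toy set, raising the threshold removes the two low-confidence discordant calls at the cost of two missing genotypes, mirroring the direction of the trade-off in panel (b).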
