Nat Commun. 2019 Nov 27;10(1):5402.
doi: 10.1038/s41467-019-13341-9.

GraphTyper2 enables population-scale genotyping of structural variation using pangenome graphs

Hannes P Eggertsson et al. Nat Commun.

Abstract

Analysis of sequence diversity in the human genome is fundamental for genetic studies. Structural variants (SVs) are frequently omitted in sequence analysis studies, although each has a relatively large impact on the genome. Here, we present GraphTyper2, which uses pangenome graphs to genotype SVs and small variants using short reads. Comparison to the syndip benchmark dataset shows that our SV genotyping is sensitive, and variant segregation in families demonstrates the accuracy of our approach. We demonstrate that incorporating public assembly data into our pipeline greatly improves sensitivity, particularly for large insertions. We validate 6,812 SVs on average per genome using long-read data from 41 Icelanders. We show that GraphTyper2 can simultaneously genotype tens of thousands of whole genomes by characterizing 60 million small variants and half a million SVs in 49,962 Icelanders, including 80 thousand SVs with high confidence.

Conflict of interest statement

All authors are employees of deCODE Genetics/Amgen, Inc.

Figures

Fig. 1
Overview of data structure and workflow. a Example structural variants and their encoding in an acyclic graph structure. b Workflow for constructing a GraphTyper graph with SNPs, indels and SVs. SVs are detected from each sample independently and then merged across all the samples, such that SV sites of the same type and similar position and size are reported only once. SNPs and indels that are given as input into the graph construction can be detected using GraphTyper or obtained from a database.
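The merge step in panel b, where per-sample SV calls of the same type with similar position and size are reported only once, can be illustrated with a small greedy sketch. This is not GraphTyper's implementation: the class `SVCall`, the function `merge_sv_calls`, and the tolerances `max_pos_diff` (100 bp) and `min_size_ratio` (0.8) are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SVCall:
    chrom: str
    svtype: str  # e.g. "DEL" or "INS"
    pos: int     # start position
    size: int    # SV length in bp (assumed positive)


def merge_sv_calls(calls: List[SVCall],
                   max_pos_diff: int = 100,
                   min_size_ratio: float = 0.8) -> List[SVCall]:
    """Greedily collapse per-sample calls into one representative per SV site.

    max_pos_diff and min_size_ratio are hypothetical tolerances, not
    GraphTyper's actual merge parameters.
    """
    merged: List[SVCall] = []
    # Sorting by (chrom, type, position) places candidate duplicates next to each other.
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.pos)):
        if merged:
            rep = merged[-1]
            same_chrom_and_type = rep.chrom == call.chrom and rep.svtype == call.svtype
            close = abs(call.pos - rep.pos) <= max_pos_diff
            size_ratio = min(rep.size, call.size) / max(rep.size, call.size, 1)
            if same_chrom_and_type and close and size_ratio >= min_size_ratio:
                continue  # same site as the current representative; report it once
        merged.append(call)
    return merged
```

Because each call is compared only with the most recent representative, this greedy pass is simple but order-dependent; a real pipeline may cluster calls more carefully.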
Fig. 2
Comparisons to SVs in the syndip dataset. The breakpoint precision threshold is the maximum number of base pairs allowed between the called and true positions at both breakpoints for an SV to be considered recalled. a Deletion recall comparison between SV genotyping methods. b Deletion false discovery rate comparison. The Manta and Manta + GraphTyper lines overlap. c Insertion recall comparison. Delly and smoove were not evaluated since they are not designed to discover all types of insertions. The Manta and Manta + GraphTyper lines overlap. d Insertion false discovery rate comparison. e Deletion recall by deletion size with a breakpoint precision threshold of 50 bp. f Insertion recall by insertion size with a breakpoint precision threshold of 50 bp.
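The breakpoint precision threshold can be made concrete with a short sketch of recall and false discovery rate. It assumes a simplified setting: all SVs are of one type on one chromosome, each SV is a `(start, end)` pair, and a match requires both breakpoints to be within the threshold. The function names are hypothetical, and the paper's exact evaluation (for example, confident regions or one-to-one matching) is not reproduced here.

```python
from typing import List, Tuple

# (start, end) breakpoints of one SV; all SVs assumed to be the same type on one chromosome.
Breakpoints = Tuple[int, int]


def matches_any(sv: Breakpoints, others: List[Breakpoints], threshold: int) -> bool:
    """True if some SV in `others` lies within `threshold` bp of `sv` at both breakpoints."""
    return any(abs(sv[0] - o[0]) <= threshold and abs(sv[1] - o[1]) <= threshold
               for o in others)


def recall(truth: List[Breakpoints], calls: List[Breakpoints], threshold: int) -> float:
    """Fraction of truth SVs recovered by at least one call."""
    if not truth:
        return 0.0
    return sum(matches_any(t, calls, threshold) for t in truth) / len(truth)


def false_discovery_rate(truth: List[Breakpoints], calls: List[Breakpoints],
                         threshold: int) -> float:
    """Fraction of calls that match no truth SV."""
    if not calls:
        return 0.0
    return sum(not matches_any(c, truth, threshold) for c in calls) / len(calls)
```

Sweeping the threshold, e.g. `[recall(truth, calls, t) for t in range(0, 101, 10)]`, traces a recall curve of the kind plotted against breakpoint precision.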
Fig. 3
High-confidence SV genotypes in four Icelandic families. a Family tree of the four families. Shown are genotypes of a 313 bp deletion starting at chr20:19,080,772 (GRCh38). b Frequency distribution of SVs called on chromosome 20. There are 112 bins, equal to the number of chromosomes in the callset. c The allele transmission rate of an SV from parent to offspring. For germline variants, the distribution is expected to be symmetric around 50%.
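Why a transmission rate near 50% is expected can be seen from a minimal sketch: a heterozygous parent passes the alternative allele to each child with probability 1/2. The sketch below assumes genotypes coded as alternative-allele counts (0, 1, 2) and, to keep transmission unambiguous, uses only trios in which one parent is heterozygous and the other is homozygous reference. The function name and this filtering rule are illustrative assumptions, not the paper's stated procedure.

```python
from typing import Iterable, Tuple

# Genotypes as alt-allele counts: 0 = hom-ref, 1 = het, 2 = hom-alt.
Trio = Tuple[int, int, int]  # (het_parent, other_parent, child)


def transmission_rate(trios: Iterable[Trio]) -> float:
    """Fraction of informative trios in which the het parent transmitted the alt allele."""
    transmitted = informative = 0
    for parent, other, child in trios:
        # Keep only trios where transmission is unambiguous: het parent x hom-ref parent.
        if parent != 1 or other != 0:
            continue
        if child not in (0, 1):
            continue  # child genotype inconsistent with Mendelian inheritance; skip
        informative += 1
        transmitted += child  # child is het only if it received the alt allele
    if informative == 0:
        raise ValueError("no informative trios")
    return transmitted / informative
```

Computing this rate for every SV and plotting a histogram of the values gives a distribution that, for well-genotyped germline variants, should center on 0.5.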
Fig. 4
Overlap of previously published SV datasets and SVs we find in Iceland. a Fraction of SVs in an external SV dataset that are also found in Iceland. b Distribution of the number of insertions, deletions, and breakends in an external dataset that are found in Iceland. The maximum distance threshold used was 50 bp.
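The overlap in this figure can be sketched as a per-type lookup: an external SV counts as found if an SV of the same type lies on the same chromosome within the 50 bp distance threshold. Requiring the same type and matching on a single position are simplifying assumptions here, and the helper names are hypothetical. Sorted positions with binary search (`bisect`) keep each lookup logarithmic in the size of the callset.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Site = Tuple[str, str, int]  # (chrom, svtype, position); svtype e.g. "INS", "DEL", "BND"
Index = Dict[Tuple[str, str], List[int]]


def build_index(sites: Iterable[Site]) -> Index:
    """Group positions by (chromosome, SV type) and sort them for binary search."""
    index: Index = defaultdict(list)
    for chrom, svtype, pos in sites:
        index[(chrom, svtype)].append(pos)
    for positions in index.values():
        positions.sort()
    return index


def is_found(index: Index, site: Site, max_dist: int) -> bool:
    """True if a site of the same type lies within max_dist bp on the same chromosome."""
    chrom, svtype, pos = site
    positions = index.get((chrom, svtype), [])
    i = bisect_left(positions, pos - max_dist)  # first position >= pos - max_dist
    return i < len(positions) and positions[i] <= pos + max_dist


def fraction_found_by_type(external: Iterable[Site], ours: Iterable[Site],
                           max_dist: int = 50) -> Dict[str, float]:
    """Per SV type, the fraction of external sites matched in our callset."""
    index = build_index(ours)
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for site in external:
        totals[site[1]] += 1
        hits[site[1]] += is_found(index, site, max_dist)
    return {svtype: hits[svtype] / totals[svtype] for svtype in totals}
```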
