Genome Biol. 2023 Nov 30;24(1):274.
doi: 10.1186/s13059-023-03098-2.

Comparing methods for constructing and representing human pangenome graphs

Francesco Andreace et al. Genome Biol.

Abstract

Background: As a single reference genome cannot possibly represent all the variation present across human individuals, pangenome graphs have been introduced to incorporate population diversity within a wide range of genomic analyses. Several data structures have been proposed for representing collections of genomes as pangenomes, in particular graphs.

Results: In this work, we collect all publicly available high-quality human haplotypes and construct the largest human pangenome graphs to date, incorporating 52 individuals in addition to two synthetic references (CHM13 and GRCh38). We build variation graphs and de Bruijn graphs of this collection using five state-of-the-art tools: Bifrost, mdbg, Minigraph, Minigraph-Cactus, and pggb. We examine how each of these tools represents variation between input sequences, in terms of both overall graph structure and the representation of specific genetic loci.
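
To make the de Bruijn graph idea concrete, the following is a minimal sketch on toy data, not the method used by Bifrost or mdbg. The function name `de_bruijn_edges`, the two short "haplotypes", and the choice of k = 3 are all assumptions made for illustration. Each k-mer contributes an edge between its (k-1)-mer prefix and suffix, so a single-base difference between the two sequences appears as a branch in the graph.

```python
from collections import defaultdict


def de_bruijn_edges(sequences, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it.

    Hypothetical helper for illustration only; real tools use compacted or
    minimizer-space representations that are far more memory-efficient.
    """
    graph = defaultdict(set)
    for seq in sequences:
        # Slide a window of length k over the sequence.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # The k-mer links its prefix node to its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two toy sequences (assumed data) that differ at a single position.
haplotypes = ["ACGTAC", "ACGAAC"]
graph = de_bruijn_edges(haplotypes, k=3)

# Nodes with more than one successor mark where the sequences diverge.
branching = sorted(node for node, succ in graph.items() if len(succ) > 1)
print(branching)
```

In this toy case the only branching node is the one shared by both sequences immediately before the differing base; the two paths rejoin afterwards, forming the kind of "bubble" that pangenome graphs use to represent variation.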

Conclusion: This work sheds light on key differences between pangenome graph representations, informing end-users on how to select the most appropriate graph type for their application.

Keywords: Algorithms; Pangenomics; Sequence analysis; Variation graphs; de Bruijn graphs.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The complete pangenome construction scheme and visualization. A The overall workflow, using 5 different tools on 3 different datasets; B complete 104-haplotype variation graph built by Minigraph; C focus on part of the HLA (MHC) region in chromosome 6 from panel B; D focus on the DRB1-5 locus of HLA from panel C; E complete 10-haplotype variation graph built with pggb; F 10-haplotype variation graph built with Minigraph-Cactus; G 104-haplotype pangenome mdbg; H 10-haplotype Bifrost dBG. All graphs except those produced by Minigraph have been simplified using gfatools and rendered using Bandage. VG stands for variation graph
Fig. 2
Representations of the HLA-E locus by five graph construction methods over three increasingly large human pangenomes. Nodes highlighted in red contain part of the locus sequence. The numbers of nodes and edges displayed below each graph refer to the whole subgraph (both red and gray nodes). Minigraph, on H2, H10, and H104, and mdbg, on H2, have only a portion of one node highlighted, since the 4.8 kbp region is contained inside a single, long node
Fig. 3
Representations of the complex HLA region by five graph construction methods over three increasingly large human pangenomes. See the caption of Fig. 2 for details
