Bioinformatics. 2024 Jul 1;40(7):btae363.
doi: 10.1093/bioinformatics/btae363.

Pangenome graph layout by Path-Guided Stochastic Gradient Descent

Simon Heumos et al. Bioinformatics.

Abstract

Motivation: The increasing availability of complete genomes demands models to study genomic variability within entire populations. Pangenome graphs capture the full genomic similarity and diversity between multiple genomes. To understand them, we need to see them. Visualization requires a human-readable graph layout: an embedding of the graph in a low-dimensional (e.g. two-dimensional) space. Because pangenome graphs can be extremely large, computing such a layout is a significant challenge.

Results: In response, we introduce a novel graph layout algorithm: the Path-Guided Stochastic Gradient Descent (PG-SGD). PG-SGD uses the genomes, represented in the pangenome graph as paths, as an embedded positional system to sample genomic distances between pairs of nodes. This avoids the quadratic cost seen in previous versions of graph drawing by SGD. We show that our implementation efficiently computes the low-dimensional layouts of gigabase-scale pangenome graphs, unveiling their biological features.
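
The sampling idea described above can be illustrated with a minimal single-threaded sketch. It is not the ODGI implementation. It assumes uniform sampling of a path and of two steps on it, treats every node as a single point rather than a segment with two ends, and borrows an exponentially decaying learning rate of the kind used in SGD-based graph drawing. The names `path_positions`, `pg_sgd_layout`, `node_len`, and `eps` are hypothetical and chosen for illustration only.

```python
import math
import random


def path_positions(path, node_len):
    """Return the nucleotide offset at which each step of `path` starts."""
    offsets, pos = [], 0
    for node in path:
        offsets.append(pos)
        pos += node_len[node]
    return offsets


def pg_sgd_layout(paths, node_len, iterations=30, eps=0.01, seed=0):
    """Toy 2D path-guided SGD layout (assumption-laden sketch, see lead-in).

    paths    : list of paths, each a list of node ids in visiting order
    node_len : dict mapping node id -> sequence length (must be > 0)
    """
    rng = random.Random(seed)
    # Random initial coordinates for every node.
    xy = {v: [rng.random(), rng.random()] for v in sorted(node_len)}

    # Only paths with at least two steps can provide a pair of nodes.
    usable = [p for p in paths if len(p) >= 2]
    offsets = [path_positions(p, node_len) for p in usable]

    # Assumed schedule: step size decays exponentially from eta_max to eta_min.
    d_max = max(o[-1] for o in offsets)
    d_min = min(node_len.values())
    eta_max = d_max ** 2
    eta_min = eps * d_min ** 2
    decay = math.log(eta_max / eta_min) / max(iterations - 1, 1)

    updates_per_iteration = 10 * sum(len(p) for p in usable)
    for t in range(iterations):
        eta = eta_max * math.exp(-decay * t)
        for _ in range(updates_per_iteration):
            # Sample a path, then two distinct steps on that path.
            k = rng.randrange(len(usable))
            i, j = rng.sample(range(len(usable[k])), 2)
            vi, vj = usable[k][i], usable[k][j]
            if vi == vj:  # the path loops through the same node twice
                continue
            # Target distance: nucleotide distance along the sampled path.
            d = abs(offsets[k][i] - offsets[k][j])
            mu = min(eta / d ** 2, 1.0)
            dx = xy[vi][0] - xy[vj][0]
            dy = xy[vi][1] - xy[vj][1]
            dist = math.hypot(dx, dy) or 1e-9
            # Move both nodes so their layout distance approaches d.
            r = mu * (dist - d) / 2.0
            xy[vi][0] -= r * dx / dist
            xy[vi][1] -= r * dy / dist
            xy[vj][0] += r * dx / dist
            xy[vj][1] += r * dy / dist
    return xy


if __name__ == "__main__":
    # Two paths that share their first node and then diverge.
    lengths = {0: 1, 1: 1, 2: 1}
    layout = pg_sgd_layout([[0, 1], [0, 2]], lengths)
    for node, (x, y) in sorted(layout.items()):
        print(node, round(x, 3), round(y, 3))
```

Because each update draws its target distance from positions along one sampled path, the sketch never builds an all-pairs distance matrix, which is the quadratic cost the abstract refers to.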

Availability and implementation: We integrated PG-SGD in ODGI, which is released as free software under the MIT open-source license. Source code is available at https://github.com/pangenome/odgi.

Conflict of interest statement

J.H. is employed by Computomics GmbH.

Figures

Figure 1.
2D PG-SGD update operation sketches. (a) The path information of the graph. path1 and path2 both visit the same first node. Then their sequences diverge and they visit distinct nodes. (b–e) v_i/v_j or v_i/v_k is the current pair of nodes to update. ld_ij/ld_ik is the current layout distance. r is the current size of the update. (b) Initial graph layout, highlighting the upcoming update of the two nodes of path1. (c) The graph layout after the first update. The nodes appear longer now because the update was applied at the node ends. Highlighted is the upcoming update of the two nodes of path2. (d) The graph layout after the second update. Highlighted is the upcoming update of the two nodes of path1. (e) Final graph layout after three updates using the 2D PG-SGD.
Figure 2.
2D visualizations of all chromosomes of the Human Pangenome Reference Consortium (HPRC) 90-haplotype pangenome graph, of chromosome 6, of the major histocompatibility complex (MHC), and of complement component 4 (C4). (a) odgi draw layout of the 90-haplotype HPRC pangenome graph. Displayed are all 24 nuclear chromosomes (22 autosomes plus X and Y) and the mitochondrial chromosome. A red rectangle highlights chromosome 6, which is shown in the subfigure below. (b) gfaestus screenshot of the chromosome 6 layout. The MHC is colored in blue. The hairball in the middle is the centromere. The black structures in the centromere are edges. (c) gfaestus screenshot of the MHC. All MHC genes are color-annotated, and the gene names appear as a text overlay. (d) gfaestus screenshot of the region around C4, with the genes C4A and C4B highlighted in color. The black lines are the edges of the graph.
