Gretl-variation GRaph Evaluation TooLkit

Sebastian Vorbrugg et al. Bioinformatics. 2024 Dec 26;41(1):btae755.
doi: 10.1093/bioinformatics/btae755.

Abstract

Motivation: Genome graphs are powerful data structures for representing the genetic diversity within populations, and they can help identify genomic variations that traditional linear references miss. However, their complexity and size make them challenging to analyze. We sought to develop a genome graph analysis tool that makes these analyses more accessible by addressing the limitations of existing tools. Specifically, we improve scalability and user-friendliness, and we provide many new statistics tailored to variation graphs for graph evaluation, including sample-specific features.

Results: We developed an efficient, comprehensive, and integrated tool, gretl, to analyze genome graphs and gain insights into their structure and composition by providing a wide range of statistics. gretl can be used to evaluate different graphs, to compare the output of graph construction pipelines run with different parameters, and to perform an in-depth analysis of individual graphs, including sample-specific analysis. With gretl, novel patterns of genetic variation and potential regions of interest can be identified for later, more detailed inspection. We demonstrate that gretl outperforms other tools in terms of speed, particularly for larger genome graphs.
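To make the idea of graph-level statistics concrete, the following is a minimal Python sketch that derives a few basic summary values from a graph in GFA v1 text form. It is not gretl's implementation and does not reflect its command-line interface or output format. The names `GraphStats`, `gfa_stats`, and `EXAMPLE_GFA` are hypothetical, and the sketch assumes that segment sequences are stored inline on the `S` lines.

```python
from dataclasses import dataclass


@dataclass
class GraphStats:
    """Hypothetical container for a handful of basic graph statistics."""
    nodes: int
    edges: int
    paths: int
    total_bp: int
    mean_node_len: float
    mean_degree: float


def gfa_stats(gfa_text: str) -> GraphStats:
    """Summarise a GFA v1 graph given as text (assumes inline S-line sequences)."""
    node_len = {}
    edges = 0
    paths = 0
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":        # segment: S <id> <sequence>
            node_len[fields[1]] = len(fields[2])
        elif fields[0] == "L":      # link: L <from> <orient> <to> <orient> <overlap>
            edges += 1
        elif fields[0] == "P":      # path: P <name> <segment list> <overlaps>
            paths += 1
    n = len(node_len)
    total_bp = sum(node_len.values())
    return GraphStats(
        nodes=n,
        edges=edges,
        paths=paths,
        total_bp=total_bp,
        mean_node_len=total_bp / n if n else 0.0,
        # Each link contributes to the degree of two node ends.
        mean_degree=2 * edges / n if n else 0.0,
    )


# Toy graph: one bubble (nodes 2 and 3) between two shared nodes (1 and 4).
EXAMPLE_GFA = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tG",
    "S\t3\tT",
    "S\t4\tCCA",
    "L\t1\t+\t2\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t4\t+\t0M",
    "L\t3\t+\t4\t+\t0M",
    "P\tsampleA\t1+,2+,4+\t*",
    "P\tsampleB\t1+,3+,4+\t*",
])

if __name__ == "__main__":
    print(gfa_stats(EXAMPLE_GFA))
```

Running the same function over graphs built with different construction parameters would give one row of values per graph, which is the kind of table that a cross-graph comparison can start from.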

Availability and implementation: Commented Rust source code and documentation are available under the MIT license at https://github.com/MoinSebi/gretl, together with Python scripts and step-by-step usage examples. The package is available on Bioconda for easy installation.

Figures

Figure 1.
Gretl overview. (A) Genome graph construction workflow: genome graph properties are influenced by various factors, including parameter selection, sample curation, and methodology, all of which impact the layout and structure of the resulting genome graph. For evaluation purposes, multiple graphs can be simultaneously generated and compared to identify an optimal representation for a specific task. The selected graph can then be analyzed with gretl. (B) Visualization of gretl output: left, graphs can be clustered based on multiple statistics, grouping similar species or construction parameters (shown here, with normalized values). Right, scatter plot depicting two selected statistics across various graphs, facilitating comparisons between different species. (C) In-depth analysis of a selected genome graph (example from yeast): left, path-centric sliding window analysis of the Saccharomyces cerevisiae genome graph, highlighting regions of high similarity. Right, pan-genomic analysis of the genome graph. Sequences found only in a single sample are separated and each block represents one path of the graph.
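The path-centric sliding window and pan-genomic views described in panel (C) can be illustrated with a small Python sketch. This is an assumption-laden illustration, not the method used by gretl: here "similarity" is defined as the fraction of paths that traverse each node, averaged per base over fixed-size windows along one path, and "private" sequence is sequence on nodes traversed by exactly one path. The function names (`node_depth`, `sliding_window_similarity`, `private_bp`) and the toy data are hypothetical.

```python
from collections import Counter


def node_depth(paths):
    """Count how many distinct paths traverse each node."""
    depth = Counter()
    for nodes in paths.values():
        depth.update(set(nodes))  # count each path at most once per node
    return depth


def sliding_window_similarity(path_nodes, node_len, depth, n_paths, window, step):
    """Mean per-base fraction of paths sharing the sequence, per window along one path."""
    # Expand node-level values to per-base values (fine for a toy example,
    # too memory-hungry for real genomes).
    per_base = []
    for node in path_nodes:
        per_base.extend([depth[node] / n_paths] * node_len[node])
    if not per_base:
        return []
    values = []
    last_start = max(len(per_base) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = per_base[start:start + window]
        values.append(sum(chunk) / len(chunk))
    return values


def private_bp(paths, node_len, depth):
    """Base pairs per path that lie on nodes found in that path only."""
    return {
        name: sum(node_len[n] for n in set(nodes) if depth[n] == 1)
        for name, nodes in paths.items()
    }


# Toy graph: nodes 1 and 4 are shared, nodes 2 and 3 form a bubble.
NODE_LEN = {"1": 4, "2": 1, "3": 1, "4": 3}
PATHS = {"sampleA": ["1", "2", "4"], "sampleB": ["1", "3", "4"]}

if __name__ == "__main__":
    depth = node_depth(PATHS)
    print(sliding_window_similarity(PATHS["sampleA"], NODE_LEN, depth,
                                    n_paths=len(PATHS), window=4, step=2))
    print(private_bp(PATHS, NODE_LEN, depth))
```

In this toy graph, windows that overlap the bubble score below 1.0, while windows made up entirely of shared nodes score exactly 1.0. That is the pattern a sliding window plot would surface as regions of lower or higher similarity.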
