Bioinformatics. 2024 Feb 1;40(2):btae088.
doi: 10.1093/bioinformatics/btae088.

Bioframe: operations on genomic intervals in Pandas dataframes



Open2C et al. Bioinformatics.

Abstract

Motivation: Genomic intervals are one of the most prevalent data structures in computational genome biology and are used to represent features ranging from genes to DNA binding sites to disease variants. Operations on genomic intervals provide a language for asking questions about relationships between features. While there are excellent interval arithmetic tools for the command line, they are not smoothly integrated into Python, one of the most popular general-purpose environments for computation and visualization.
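To make "a language for asking questions about relationships between features" concrete, here is a minimal plain-Python sketch of two such operations: an overlap test and a merge (union) of overlapping intervals. It assumes BED-style 0-based, half-open coordinates [start, end); the `Interval`, `overlaps`, and `merge` names are hypothetical illustrations, not bioframe's API.

```python
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval, assumed 0-based and half-open: [start, end)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True when a and b share a chromosome and each starts before the other ends."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Union overlapping intervals on each chromosome.

    Sorting by (chrom, start, end) lets one pass extend the last merged
    interval whenever the next interval starts before it ends. In this
    sketch, intervals that only touch (end == next start) stay separate.
    """
    merged: List[Interval] = []
    for iv in sorted(intervals):
        last = merged[-1] if merged else None
        if last is not None and last.chrom == iv.chrom and iv.start < last.end:
            merged[-1] = last._replace(end=max(last.end, iv.end))
        else:
            merged.append(iv)
    return merged


a = Interval("chr1", 100, 500)
print(overlaps(a, Interval("chr1", 450, 460)))  # True
print(overlaps(a, Interval("chr1", 500, 600)))  # False: the intervals only touch

peaks = [a, Interval("chr1", 450, 700), Interval("chr1", 700, 800), Interval("chr2", 0, 300)]
print(merge(peaks))
# [Interval(chrom='chr1', start=100, end=700), Interval(chrom='chr1', start=700, end=800), Interval(chrom='chr2', start=0, end=300)]
```

Whether touching intervals should be merged is a convention choice; real tools may expose it as a parameter.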

Results: Bioframe is a library that enables flexible and performant operations on genomic interval dataframes in Python. Bioframe extends the Python data science stack to use cases in computational genome biology by building directly on top of two of the most commonly used Python libraries, NumPy and Pandas. The bioframe API supports flexible column names and column orders, and it decouples operations from data formats to avoid unnecessary conversions, a common scourge for bioinformaticians. Bioframe achieves these goals while maintaining high performance and a rich set of features.
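The idea of flexible column names can be illustrated with a small Pandas sketch: the caller says which columns hold the chromosome, start, and end in each DataFrame, so inputs never need to be renamed or converted to a file format first. The function `overlap_index_pairs` and its `cols1`/`cols2` tuples are hypothetical; bioframe's own operations accept column-specification arguments, but consult its API reference for the actual signatures. The merge-then-filter strategy here compares every pair of rows on a chromosome and is shown only for clarity; it is not presented as bioframe's algorithm.

```python
import pandas as pd


def overlap_index_pairs(
    df1: pd.DataFrame,
    df2: pd.DataFrame,
    cols1=("chrom", "start", "end"),
    cols2=("chrom", "start", "end"),
) -> pd.DataFrame:
    """Return index labels (index_1, index_2) of overlapping rows in df1 and df2.

    Assumes half-open [start, end) coordinates; cols1/cols2 name the
    (chromosome, start, end) columns of each input.
    """
    c1, s1, e1 = cols1
    c2, s2, e2 = cols2
    # Copy only the interval columns under fixed internal names, keeping each
    # row's original index label so results can be joined back to the inputs.
    left = pd.DataFrame({
        "chrom": df1[c1].to_numpy(),
        "start_1": df1[s1].to_numpy(),
        "end_1": df1[e1].to_numpy(),
        "index_1": df1.index.to_numpy(),
    })
    right = pd.DataFrame({
        "chrom": df2[c2].to_numpy(),
        "start_2": df2[s2].to_numpy(),
        "end_2": df2[e2].to_numpy(),
        "index_2": df2.index.to_numpy(),
    })
    # Pair every left row with every right row on the same chromosome ...
    pairs = left.merge(right, on="chrom", how="inner")
    # ... then keep only the pairs whose intervals actually overlap.
    hit = (pairs["start_1"] < pairs["end_2"]) & (pairs["start_2"] < pairs["end_1"])
    return pairs.loc[hit, ["index_1", "index_2"]].reset_index(drop=True)


genes = pd.DataFrame({
    "name": ["g1", "g2", "g3"],
    "chrom": ["chr1", "chr1", "chr2"],
    "start": [100, 900, 0],
    "end": [500, 1200, 300],
})
# A second table that uses different column names for the same roles.
peaks = pd.DataFrame({
    "seqname": ["chr1", "chr1", "chr2"],
    "from": [450, 500, 250],
    "to": [460, 600, 260],
})

result = overlap_index_pairs(genes, peaks, cols2=("seqname", "from", "to"))
print(result)
```

Because the function returns index labels rather than copied rows, extra columns such as `name` stay untouched and can be recovered with `genes.loc[result["index_1"]]`.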

Availability and implementation: Bioframe is open-source under the MIT license, cross-platform, and can be installed from the Python Package Index. The source code is maintained by Open2C on GitHub at https://github.com/open2c/bioframe.


Conflict of interest statement

No competing interest is declared.

Figures

Figure 1.
Performance comparison of bioframe v0.6.1, PyRanges v0.0.129, and pybedtools v0.9.1 (bedtools v2.30.0) for detecting overlapping intervals between pairs of DataFrames of randomly generated genomic intervals. (A) Run time and (B) peak memory consumption: bioframe overlap and PyRanges join show comparable run times up to millions of intervals and comparable memory usage, while pybedtools intersect is slower. Code for this performance comparison is available at https://bioframe.readthedocs.io/en/latest/guide-performance.html.
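The benchmark in Figure 1 detects overlaps between tables of random intervals. The sketch below is not the benchmark code from the linked page; it is a plain-Python illustration of why sort-based methods scale far better than comparing all pairs. The helpers `random_intervals`, `count_overlaps_sorted`, and `count_overlaps_brute` are hypothetical, and every interval is assumed half-open with start < end. For a left interval [s, e), the number of overlapping right intervals on the same chromosome equals the number of right starts below e minus the number of right ends at or below s, because any interval that ends at or before s must also start before e.

```python
import random
import time
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open, start < end


def random_intervals(n: int, chroms: Sequence[str] = ("chr1", "chr2"),
                     genome_len: int = 10_000, max_len: int = 200,
                     seed: int = 0) -> List[Interval]:
    """Generate n random non-empty intervals (hypothetical test data)."""
    rng = random.Random(seed)
    out: List[Interval] = []
    for _ in range(n):
        start = rng.randrange(genome_len)
        out.append((rng.choice(chroms), start, start + rng.randint(1, max_len)))
    return out


def count_overlaps_sorted(left: List[Interval], right: List[Interval]) -> int:
    """Count overlapping (left, right) pairs in O((n + m) log m)."""
    starts: Dict[str, List[int]] = defaultdict(list)
    ends: Dict[str, List[int]] = defaultdict(list)
    for chrom, s, e in right:
        starts[chrom].append(s)
        ends[chrom].append(e)
    for chrom in starts:
        starts[chrom].sort()
        ends[chrom].sort()
    total = 0
    for chrom, s, e in left:
        if chrom not in starts:  # no right intervals on this chromosome
            continue
        # Right intervals that start before this one ends ...
        n_start_before_end = bisect_left(starts[chrom], e)
        # ... minus those that already ended at or before this one starts.
        n_end_before_start = bisect_right(ends[chrom], s)
        total += n_start_before_end - n_end_before_start
    return total


def count_overlaps_brute(left: List[Interval], right: List[Interval]) -> int:
    """Reference answer: test every pair, O(n * m)."""
    return sum(1 for a in left for b in right
               if a[0] == b[0] and a[1] < b[2] and b[1] < a[2])


left = random_intervals(500, seed=1)
right = random_intervals(500, seed=2)

t0 = time.perf_counter()
fast = count_overlaps_sorted(left, right)
t1 = time.perf_counter()
slow = count_overlaps_brute(left, right)
t2 = time.perf_counter()
assert fast == slow
print(f"{fast} overlapping pairs; sorted: {t1 - t0:.4f}s, brute force: {t2 - t1:.4f}s")
```

Counting pairs rather than listing them keeps the sketch short; tools that return the overlapping rows themselves must also pay for the size of the output.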


