PyRanges: efficient comparison of genomic intervals in Python
- PMID: 31373614
- DOI: 10.1093/bioinformatics/btz615
Abstract
Summary: Complex genomic analyses often use sequences of simple set operations such as intersection, overlap and nearest on genomic intervals. Combined with some custom programming, these operations allow a wide range of analyses to be performed. To this end, we have written PyRanges, a data structure for representing and manipulating genomic intervals and their associated data in Python. Run single-threaded on binary set operations, PyRanges is a median of 2.3-9.6 times faster than the popular R GenomicRanges library and is equally memory efficient; run multi-threaded on 8 cores, our library is up to 123 times faster. PyRanges is therefore ideally suited both for individual analyses and as a foundation for future genomic libraries in Python.
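To make the three named operations concrete, the following is a minimal pure-Python sketch of overlap, intersection and nearest on half-open genomic intervals. It is an illustration only: the `Interval` type, the function names and the example coordinates are assumptions made here, not the PyRanges API, and the naive pairwise scan says nothing about how PyRanges achieves its reported speed.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """Hypothetical interval type: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    # Two half-open intervals overlap if they share at least one base.
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def overlap(query: List[Interval], subject: List[Interval]) -> List[Interval]:
    # Keep each query interval that overlaps at least one subject interval.
    return [q for q in query if any(overlaps(q, s) for s in subject)]


def intersect(query: List[Interval], subject: List[Interval]) -> List[Interval]:
    # Report only the shared part of every overlapping (query, subject) pair.
    return [
        Interval(q.chrom, max(q.start, s.start), min(q.end, s.end))
        for q in query
        for s in subject
        if overlaps(q, s)
    ]


def gap(a: Interval, b: Interval) -> int:
    # Bases strictly between two same-chromosome intervals
    # (0 if they overlap or are directly adjacent).
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def nearest(
    query: List[Interval], subject: List[Interval]
) -> List[Tuple[Interval, Interval, int]]:
    # For each query interval, find the closest subject on the same chromosome.
    out = []
    for q in query:
        candidates = [s for s in subject if s.chrom == q.chrom]
        if candidates:
            best = min(candidates, key=lambda s: gap(q, s))
            out.append((q, best, gap(q, best)))
    return out


# Made-up example data (not taken from the paper).
genes = [Interval("chr1", 100, 200), Interval("chr1", 500, 600), Interval("chr2", 50, 80)]
peaks = [Interval("chr1", 150, 250), Interval("chr1", 700, 720), Interval("chr2", 10, 20)]

print(overlap(genes, peaks))
print(intersect(genes, peaks))
print(nearest(genes, peaks))
```

Chaining such functions, for instance taking the nearest peak only for genes that overlap none, is the kind of "sequence of simple set operations" the summary refers to.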
Availability and implementation: PyRanges is available as open source under the MIT license at https://github.com/biocore-NTNU/pyranges, and the documentation is available at https://biocore-NTNU.github.io/pyranges/.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.