Gene count estimation with pytximport enables reproducible analysis of bulk RNA sequencing data in Python

Bioinformatics. 2024 Nov 28;40(12):btae700. doi: 10.1093/bioinformatics/btae700.

Malte Kuehl^{1,2,3,4}, Milagros N Wong^{1,2,5,6}, Nicola Wanner^{5,6}, Stefan Bonn^{3,4}, Victor G Puelles^{1,2,5,6}

Affiliations

¹ Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 99, Aarhus N, Midtjylland, 8200, Denmark.
² Department of Pathology, Aarhus University Hospital, Palle Juul-Jensens Boulevard 69, Aarhus N, Midtjylland, 8200, Denmark.
³ Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Falkenried 94, Hamburg, Hamburg, 20251, Germany.
⁴ Center for Biomedical AI, University Medical Center Hamburg-Eppendorf, Martinistraße 52, Hamburg, Hamburg, 20246, Germany.
⁵ III. Department of Medicine, University Medical Center Hamburg-Eppendorf, Martinistraße 52, Hamburg, Hamburg, 20246, Germany.
⁶ Hamburg Center for Kidney Health, University Medical Center Hamburg-Eppendorf, Martinistraße 52, Hamburg, Hamburg, 20246, Germany.

PMID: 39565903
PMCID: PMC11629965
DOI: 10.1093/bioinformatics/btae700

Malte Kuehl et al. Bioinformatics. 2024.

Abstract

Summary: Transcript quantification tools efficiently map bulk RNA sequencing (RNA-seq) reads to reference transcriptomes. However, their output consists of transcript count estimates that are subject to multiple biases and cannot be readily used with existing differential gene expression analysis tools in Python.

Here we present pytximport, a Python implementation of the tximport R package that supports a variety of input formats, different modes of bias correction, inferential replicates, gene-level summarization of transcript counts, transcript-level exports, transcript-to-gene mapping generation, and optional filtering of transcripts by biotype. pytximport is part of the scverse ecosystem of open-source Python software packages for omics analyses and includes both a Python and a command-line interface.

With pytximport, we propose a bulk RNA-seq analysis workflow based on Bioconda and scverse ecosystem packages, ensuring reproducible analyses through Snakemake rules. We apply this pipeline to a publicly available RNA-seq dataset, demonstrating how pytximport enables the creation of Python-centric workflows capable of providing insights into transcriptomic alterations.

Availability and implementation: pytximport is licensed under the GNU General Public License version 3. The source code is available at https://github.com/complextissue/pytximport and via Zenodo with DOI: 10.5281/zenodo.13907917. A related Snakemake workflow is available through GitHub at https://github.com/complextissue/snakemake-bulk-rna-seq-workflow and Zenodo with DOI: 10.5281/zenodo.12713811. Documentation and a vignette for new users are available at: https://pytximport.readthedocs.io.
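To make the idea of gene-level summarization mentioned in the Summary concrete, the following is a minimal sketch in plain Python. It is not pytximport's code or API: the function name `summarize_to_gene`, the input layout, and the transcript and gene identifiers are assumptions made for illustration. It sums estimated counts and abundances per gene and computes an abundance-weighted mean effective length, which is the general approach used by tximport-style tools when no further bias correction is requested.

```python
from collections import defaultdict


def summarize_to_gene(transcripts, tx2gene):
    """Collapse transcript-level estimates for one sample to gene level.

    transcripts: dict mapping transcript ID -> (est_counts, tpm, eff_length)
    tx2gene:     dict mapping transcript ID -> gene ID
    Both layouts are hypothetical and chosen only for this sketch.
    """
    counts = defaultdict(float)
    tpm = defaultdict(float)
    weighted_length = defaultdict(float)

    for tx_id, (est_counts, abundance, eff_length) in transcripts.items():
        gene = tx2gene.get(tx_id)
        if gene is None:
            continue  # transcripts without a gene mapping are skipped
        counts[gene] += est_counts
        tpm[gene] += abundance
        weighted_length[gene] += abundance * eff_length

    result = {}
    for gene in counts:
        # Abundance-weighted mean effective length. Genes with zero
        # abundance get None here; real tools use a fallback instead.
        length = weighted_length[gene] / tpm[gene] if tpm[gene] > 0 else None
        result[gene] = {"counts": counts[gene], "tpm": tpm[gene], "length": length}
    return result


# Toy input with made-up identifiers and values.
transcripts = {
    "tx1": (100.0, 10.0, 1000.0),
    "tx2": (50.0, 30.0, 2000.0),
    "tx3": (20.0, 5.0, 500.0),
}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_level = summarize_to_gene(transcripts, tx2gene)
print(gene_level["geneA"])
```

The weighted length matters because a gene's effective length depends on which isoforms are expressed in a given sample; downstream tools can use it as an offset when comparing samples with different isoform usage.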

Figures

**Figure 1.**
Overview of the pytximport package and its associated RNA-seq workflow. (a) pytximport package. pytximport is available for use as a Python library or from the command line. It can be configured to either output AnnData objects for integration with other scverse ecosystem software or xarray datasets. Common applications for pytximport include gene count estimation from transcript quantification files, isoform-usage bias correction, filtering of transcript-level data, and creation of transcript-to-gene mappings. (b) Pythonic RNA-seq analysis workflow. We propose a reproducible RNA-seq analysis workflow based on command-line software available through Bioconda (left line: Snakemake, fastp, Salmon) and scverse ecosystem Python packages (right line: pytximport, PyDESeq2, decoupleR). (c) Comparison with tximport. Counts from pytximport match counts from tximport exactly across different quantification modes and input files from different transcript quantification tools. RSEM-g: RSEM gene-level input; RSEM-t: RSEM transcript-level input. The Python logo is in the public domain and was provided through Bioicons. The AnnData logo is licensed under the BSD 3-Clause License. The xarray logo is provided under the Apache License Version 2.0. The Snakemake and PyDESeq2 logos are licensed under the MIT License. The Salmon and decoupleR logos are licensed under the GNU General Public License Version 3.
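The caption lists creation of transcript-to-gene mappings and filtering of transcript-level data among the common applications. The sketch below shows one way such a mapping could be derived from GTF annotation lines, with an optional biotype filter. It uses only the Python standard library and does not reproduce pytximport's own functions. The helper names, the toy identifiers, and the choice of the `transcript_biotype` attribute (the Ensembl convention; other annotation sources may name it differently) are assumptions.

```python
import re

# Matches GTF attribute pairs of the form: key "value"
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the ninth GTF column into a dict of attribute name -> value."""
    return dict(ATTRIBUTE_PATTERN.findall(field))


def build_tx2gene(gtf_lines, keep_biotypes=None):
    """Map transcript IDs to gene IDs, optionally keeping only some biotypes."""
    mapping = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9 or columns[2] != "transcript":
            continue  # only transcript records carry the pair we need
        attrs = parse_attributes(columns[8])
        if "transcript_id" not in attrs or "gene_id" not in attrs:
            continue
        if keep_biotypes is not None and attrs.get("transcript_biotype") not in keep_biotypes:
            continue
        mapping[attrs["transcript_id"]] = attrs["gene_id"]
    return mapping


def gtf_record(feature, attributes):
    """Build a toy GTF line; coordinates and source are placeholders."""
    return "\t".join(["1", "example", feature, "1", "100", ".", "+", ".", attributes])


gtf_lines = [
    "#!toy annotation for illustration",
    gtf_record("gene", 'gene_id "G1"; gene_biotype "protein_coding";'),
    gtf_record("transcript", 'gene_id "G1"; transcript_id "T1"; transcript_biotype "protein_coding";'),
    gtf_record("transcript", 'gene_id "G1"; transcript_id "T2"; transcript_biotype "retained_intron";'),
    gtf_record("transcript", 'gene_id "G2"; transcript_id "T3"; transcript_biotype "lncRNA";'),
]

all_transcripts = build_tx2gene(gtf_lines)
coding_only = build_tx2gene(gtf_lines, keep_biotypes={"protein_coding"})
print(all_transcripts)
print(coding_only)
```

Filtering at the mapping stage means that transcripts of excluded biotypes never contribute to the gene-level totals, which is one simple way to restrict an analysis to, for example, protein-coding isoforms.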

Cited by

Pathology-oriented multiplexing enables integrative disease mapping.
Kuehl M, Okabayashi Y, Wong MN, Gernhold L, Gut G, Kaiser N, Schwerk M, Gräfe SK, Ma FY, Tanevski J, Schäfer PSL, Mezher S, Sarabia Del Castillo J, Goldbeck-Strieder T, Zolotareva O, Hartung M, Delgado Chaves FM, Klinkert L, Gnirck AC, Spehr M, Fleck D, Joodaki M, Parra V, Shaigan M, Diebold M, Prinz M, Kranz J, Kux JM, Braun F, Kretz O, Wu H, Grahammer F, Heins S, Zimmermann M, Haas F, Kylies D, Wanner N, Czogalla J, Dumoulin B, Zolotarev N, Lindenmeyer M, Karlson P, Nyengaard JR, Sebode M, Weidemann S, Wiech T, Groene HJ, Tomas NM, Meyer-Schwesinger C, Kuppe C, Kramann R, Karras A, Bruneval P, Tharaux PL, Pastene D, Yard B, Schaub JA, McCown PJ, Pyle L, Choi YJ, Yokoo T, Baumbach J, Sáez PJ, Costa I, Turner JE, Hodgin JB, Saez-Rodriguez J, Huber TB, Bjornstad P, Kretzler M, Lenoir O, Nikolic-Paterson DJ, Pelkmans L, Bonn S, Puelles VG. Nature. 2025 Aug;644(8076):516-526. doi: 10.1038/s41586-025-09225-2. Epub 2025 Jul 18. PMID: 40681898.
