Bioinformatics. 2024 Nov 28;40(12):btae700.
doi: 10.1093/bioinformatics/btae700.

Gene count estimation with pytximport enables reproducible analysis of bulk RNA sequencing data in Python

Malte Kuehl et al. Bioinformatics. 2024.

Abstract

Summary: Transcript quantification tools efficiently map bulk RNA sequencing (RNA-seq) reads to reference transcriptomes. However, their output consists of transcript count estimates that are subject to multiple biases and cannot be readily used with existing differential gene expression analysis tools in Python. Here we present pytximport, a Python implementation of the tximport R package that supports a variety of input formats, different modes of bias correction, inferential replicates, gene-level summarization of transcript counts, transcript-level exports, transcript-to-gene mapping generation, and optional filtering of transcripts by biotype. pytximport is part of the scverse ecosystem of open-source Python software packages for omics analyses and includes both a Python and a command-line interface. With pytximport, we propose a bulk RNA-seq analysis workflow based on Bioconda and scverse ecosystem packages, ensuring reproducible analyses through Snakemake rules. We apply this pipeline to a publicly available RNA-seq dataset, demonstrating how pytximport enables the creation of Python-centric workflows capable of providing insights into transcriptomic alterations.
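The gene-level summarization mentioned in the summary can be illustrated with a minimal, library-free sketch. It is not pytximport's implementation or API: the function name `summarize_to_gene`, the transcript and gene identifiers, and the count values are all hypothetical, and the sketch assumes that summarization simply sums transcript count estimates over a transcript-to-gene mapping.

```python
from collections import defaultdict


def summarize_to_gene(transcript_counts, transcript_to_gene):
    """Sum transcript-level count estimates per gene for each sample.

    transcript_counts: {sample: {transcript_id: estimated_count}}
    transcript_to_gene: {transcript_id: gene_id}
    Returns {gene_id: {sample: summed_count}}.
    """
    gene_counts = defaultdict(lambda: defaultdict(float))
    for sample, counts in transcript_counts.items():
        for transcript_id, count in counts.items():
            gene_id = transcript_to_gene.get(transcript_id)
            if gene_id is None:
                # Transcripts absent from the mapping are skipped (an assumption).
                continue
            gene_counts[gene_id][sample] += count
    return {gene: dict(samples) for gene, samples in gene_counts.items()}


# Hypothetical toy data: two samples, three transcripts, two genes.
transcript_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
transcript_counts = {
    "sample1": {"tx1": 10.5, "tx2": 4.5, "tx3": 7.0},
    "sample2": {"tx1": 3.0, "tx2": 9.0, "tx3": 0.0},
}

gene_counts = summarize_to_gene(transcript_counts, transcript_to_gene)
print(gene_counts)
```

Transcript quantifiers report fractional count estimates, so the sums are kept as floats rather than rounded to integers.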

Availability and implementation: pytximport is licensed under the GNU General Public License version 3. The source code is available at https://github.com/complextissue/pytximport and via Zenodo with DOI: 10.5281/zenodo.13907917. A related Snakemake workflow is available through GitHub at https://github.com/complextissue/snakemake-bulk-rna-seq-workflow and Zenodo with DOI: 10.5281/zenodo.12713811. Documentation and a vignette for new users are available at: https://pytximport.readthedocs.io.

Figures

Figure 1.
Overview of the pytximport package and its associated RNA-seq workflow. (a) pytximport package. pytximport is available for use as a Python library or from the command line. It can be configured to either output AnnData objects for integration with other scverse ecosystem software or xarray datasets. Common applications for pytximport include gene count estimation from transcript quantification files, isoform-usage bias correction, filtering of transcript-level data, and creation of transcript-to-gene mappings. (b) Pythonic RNA-seq analysis workflow. We propose a reproducible RNA-seq analysis workflow based on command-line software available through Bioconda (left line: Snakemake, fastp, Salmon) and scverse ecosystem Python packages (right line: pytximport, PyDESeq2, decoupleR). (c) Comparison with tximport. Counts from pytximport match counts from tximport exactly across different quantification modes and input files from different transcript quantification tools. RSEM-g: RSEM gene-level input; RSEM-t: RSEM transcript-level input. The Python logo is in the public domain and was provided through Bioicons. The AnnData logo is licensed under the BSD 3-Clause License. The xarray logo is provided under the Apache License Version 2.0. The Snakemake and PyDESeq2 logos are licensed under the MIT License. The Salmon and decoupleR logos are licensed under the GNU General Public License Version 3.
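One way to picture the isoform-usage bias correction named in the caption is the length-scaled TPM approach documented for the R tximport package: abundances are multiplied by each gene's average length across samples, then rescaled so that every sample keeps its original total count. The sketch below assumes that this is the kind of correction meant. It is not taken from pytximport's source, and the function name `length_scaled_counts` and all values are hypothetical.

```python
def length_scaled_counts(abundance, length, counts):
    """Derive counts from abundances, scaled by average gene length.

    Each argument maps gene_id -> list of per-sample values:
    abundance (e.g. TPM), length (effective length), counts (estimated counts).
    Returns gene_id -> list of per-sample scaled counts.
    """
    genes = list(abundance)
    n_samples = len(next(iter(abundance.values())))

    # Average length of each gene across samples.
    mean_length = {g: sum(length[g]) / n_samples for g in genes}

    # Abundance weighted by the across-sample average length.
    scaled = {
        g: [abundance[g][j] * mean_length[g] for j in range(n_samples)]
        for g in genes
    }

    # Rescale each sample so its total matches the original count total.
    for j in range(n_samples):
        original_total = sum(counts[g][j] for g in genes)
        scaled_total = sum(scaled[g][j] for g in genes)
        factor = original_total / scaled_total if scaled_total > 0 else 0.0
        for g in genes:
            scaled[g][j] *= factor
    return scaled


# Hypothetical toy data: two genes, two samples.
abundance = {"geneA": [10.0, 20.0], "geneB": [30.0, 10.0]}
length = {"geneA": [1000.0, 1200.0], "geneB": [500.0, 500.0]}
counts = {"geneA": [100.0, 250.0], "geneB": [150.0, 50.0]}

scaled_counts = length_scaled_counts(abundance, length, counts)
print(scaled_counts)
```

Because the scaled counts no longer depend on sample-specific lengths, they can be passed to a count-based model without a separate length offset, while per-sample library sizes are preserved.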

Cited by

  • Pathology-oriented multiplexing enables integrative disease mapping.
    Kuehl M, Okabayashi Y, Wong MN, Gernhold L, Gut G, Kaiser N, Schwerk M, Gräfe SK, Ma FY, Tanevski J, Schäfer PSL, Mezher S, Sarabia Del Castillo J, Goldbeck-Strieder T, Zolotareva O, Hartung M, Delgado Chaves FM, Klinkert L, Gnirck AC, Spehr M, Fleck D, Joodaki M, Parra V, Shaigan M, Diebold M, Prinz M, Kranz J, Kux JM, Braun F, Kretz O, Wu H, Grahammer F, Heins S, Zimmermann M, Haas F, Kylies D, Wanner N, Czogalla J, Dumoulin B, Zolotarev N, Lindenmeyer M, Karlson P, Nyengaard JR, Sebode M, Weidemann S, Wiech T, Groene HJ, Tomas NM, Meyer-Schwesinger C, Kuppe C, Kramann R, Karras A, Bruneval P, Tharaux PL, Pastene D, Yard B, Schaub JA, McCown PJ, Pyle L, Choi YJ, Yokoo T, Baumbach J, Sáez PJ, Costa I, Turner JE, Hodgin JB, Saez-Rodriguez J, Huber TB, Bjornstad P, Kretzler M, Lenoir O, Nikolic-Paterson DJ, Pelkmans L, Bonn S, Puelles VG. Nature. 2025 Aug;644(8076):516-526. doi: 10.1038/s41586-025-09225-2. Epub 2025 Jul 18. PMID: 40681898. Free PMC article.
