pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods
- PMID: 38057718
- PMCID: PMC10701943
- DOI: 10.1186/s12859-023-05578-5
Abstract
Background: Variability in datasets is not only the product of biological processes: it is also the product of technical biases. ComBat and ComBat-Seq are among the most widely used tools for correcting those technical biases, called batch effects, in microarray and RNA-Seq expression data, respectively.
Results: In this technical note, we present a new Python implementation of ComBat and ComBat-Seq. While the mathematical framework is strictly the same, we show here that our implementations: (i) give similar results in terms of batch effects correction; (ii) are as fast as or faster than the original implementations in R; and (iii) offer new tools for the bioinformatics community to participate in its development. pyComBat is implemented in the Python language and is distributed under the GPL-3.0 license ( https://www.gnu.org/licenses/gpl-3.0.en.html ) as a module of the inmoose package. Source code is available at https://github.com/epigenelabs/inmoose and the Python package at https://pypi.org/project/inmoose .
Conclusions: We present a new Python implementation of the state-of-the-art tools ComBat and ComBat-Seq for the correction of batch effects in microarray and RNA-Seq data. This new implementation, based on the same mathematical frameworks as ComBat and ComBat-Seq, offers similar power for batch effect correction at reduced computational cost.
Keywords: Batch effects; Bayesian statistics; Open source; Transcriptomics.
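To make the idea of batch effect correction concrete, the following is a minimal sketch of a plain location and scale adjustment for a single gene, using only the Python standard library. It is an illustration under stated assumptions, not the pyComBat implementation: ComBat additionally shrinks the per-batch parameters with empirical Bayes estimates pooled across genes and can preserve covariates of interest, neither of which is done here. The function name `adjust_batches` and its arguments are hypothetical and are not part of the inmoose API.

```python
from statistics import mean, pstdev


def adjust_batches(values, batches):
    """Simplified per-gene location/scale batch adjustment (hypothetical helper).

    values  : expression values of ONE gene, one value per sample
    batches : batch label of each sample, same length as `values`

    Each batch is standardized with its own mean and standard deviation,
    then mapped back to the overall mean and the pooled within-batch
    standard deviation. No empirical Bayes shrinkage is applied.
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")

    # Group sample indices by batch label.
    groups = {}
    for i, label in enumerate(batches):
        groups.setdefault(label, []).append(i)

    grand_mean = mean(values)

    # Pooled within-batch standard deviation (population form).
    residual_sq = 0.0
    for idx in groups.values():
        batch_mean = mean(values[i] for i in idx)
        residual_sq += sum((values[i] - batch_mean) ** 2 for i in idx)
    pooled_sd = (residual_sq / len(values)) ** 0.5

    adjusted = [0.0] * len(values)
    for idx in groups.values():
        batch_vals = [values[i] for i in idx]
        batch_mean = mean(batch_vals)
        batch_sd = pstdev(batch_vals)
        for i in idx:
            # Guard against constant batches (zero spread).
            z = (values[i] - batch_mean) / batch_sd if batch_sd > 0 else 0.0
            adjusted[i] = grand_mean + pooled_sd * z
    return adjusted
```

For example, calling `adjust_batches([1.0, 2.0, 3.0, 11.0, 12.0, 13.0], ["a", "a", "a", "b", "b", "b"])` removes the 10-unit offset between the two batches, so both batches end up centered on the overall mean while the within-batch ordering of samples is preserved.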
© 2023. The Author(s).
Conflict of interest statement
The authors declare that they have no competing interests.
Similar articles
- Differential expression analysis with inmoose, the integrated multi-omic open-source environment in Python. BMC Bioinformatics. 2025 Jun 23;26(1):160. doi: 10.1186/s12859-025-06180-7. PMID: 40551108 Free PMC article.
- bayNorm: Bayesian gene expression recovery, imputation and normalization for single-cell RNA-sequencing data. Bioinformatics. 2020 Feb 15;36(4):1174-1181. doi: 10.1093/bioinformatics/btz726. PMID: 31584606 Free PMC article.
- pyrpipe: a Python package for RNA-Seq workflows. NAR Genom Bioinform. 2021 Jun 1;3(2):lqab049. doi: 10.1093/nargab/lqab049. eCollection 2021 Jun. PMID: 34085037 Free PMC article.
- Scedar: A scalable Python package for single-cell RNA-seq exploratory data analysis. PLoS Comput Biol. 2020 Apr 27;16(4):e1007794. doi: 10.1371/journal.pcbi.1007794. eCollection 2020 Apr. PMID: 32339163 Free PMC article.
- GReNaDIne: A Data-Driven Python Library to Infer Gene Regulatory Networks from Gene Expression Data. Genes (Basel). 2023 Jan 20;14(2):269. doi: 10.3390/genes14020269. PMID: 36833196 Free PMC article.
Cited by
- Multi-centre radiomics for prediction of recurrence following radical radiotherapy for head and neck cancers: Consequences of feature selection, machine learning classifiers and batch-effect harmonization. Phys Imaging Radiat Oncol. 2023 May 16;26:100450. doi: 10.1016/j.phro.2023.100450. eCollection 2023 Apr. PMID: 37260438 Free PMC article.
- Machine learning for normal tissue complication probability prediction: Predictive power with versatility and easy implementation. Clin Transl Radiat Oncol. 2023 Feb 10;39:100595. doi: 10.1016/j.ctro.2023.100595. eCollection 2023 Mar. PMID: 36880063 Free PMC article.
- Differentiation between descending thoracic aortic diseases using machine learning and plasma proteomic signatures. Clin Proteomics. 2024 Jun 2;21(1):38. doi: 10.1186/s12014-024-09487-4. PMID: 38825704 Free PMC article.
- DNA methylation shapes the Polycomb landscape during the exit from naive pluripotency. Nat Struct Mol Biol. 2025 Feb;32(2):346-357. doi: 10.1038/s41594-024-01405-4. Epub 2024 Oct 24. PMID: 39448850
- Acellular, bioresorbable, ultra-purified alginate gel implantation for intervertebral disc herniation: Phase 1/2, open-label, non-randomized clinical trials. Nat Commun. 2025 May 8;16(1):4285. doi: 10.1038/s41467-025-59715-0. PMID: 40341039 Free PMC article. Clinical Trial.
Grants and funding
- 190185351/European Union's Horizon 2020 research and innovation program