BMC Bioinformatics. 2023 Dec 7;24(1):459.
doi: 10.1186/s12859-023-05578-5.

pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods

Abdelkader Behdenna et al. BMC Bioinformatics. 2023.

Abstract

Background: Variability in datasets is not only the product of biological processes; it is also the product of technical biases. ComBat and ComBat-Seq are among the most widely used tools for correcting those technical biases, called batch effects, in microarray and RNA-Seq expression data, respectively.

Results: In this technical note, we present a new Python implementation of ComBat and ComBat-Seq. While the mathematical framework is strictly the same, we show here that our implementations: (i) give similar results in terms of batch effect correction; (ii) are as fast as or faster than the original implementations in R; and (iii) offer new tools for the bioinformatics community to participate in its development. pyComBat is implemented in Python and is distributed under the GPL-3.0 license (https://www.gnu.org/licenses/gpl-3.0.en.html) as a module of the inmoose package. Source code is available at https://github.com/epigenelabs/inmoose and the Python package at https://pypi.org/project/inmoose.

Conclusions: We present a new Python implementation of state-of-the-art tools ComBat and ComBat-Seq for the correction of batch effects in microarray and RNA-Seq data. This new implementation, based on the same mathematical frameworks as ComBat and ComBat-Seq, offers similar power for batch effect correction, at reduced computational cost.
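
To make the idea of batch effect correction concrete, here is a minimal sketch of a per-gene location and scale adjustment, which is the kind of model that ComBat builds on. It is a deliberate simplification and not the pyComBat implementation: it omits the empirical Bayes shrinkage of the batch parameters and any biological covariates, and the function name `location_scale_adjust` and the toy data are assumptions made for illustration only.

```python
import numpy as np


def location_scale_adjust(expr, batch):
    """Align per-gene mean and spread across batches (no empirical Bayes step).

    expr  : array of shape (genes, samples)
    batch : array of shape (samples,) holding one batch label per sample
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    if expr.ndim != 2 or expr.shape[1] != batch.shape[0]:
        raise ValueError("expr must be genes x samples, with one batch label per sample")

    # Per-gene mean over all samples: the common location every batch is moved to.
    grand_mean = expr.mean(axis=1, keepdims=True)

    # Remove each batch's own per-gene mean.
    residuals = np.empty_like(expr)
    for b in np.unique(batch):
        cols = batch == b
        residuals[:, cols] = expr[:, cols] - expr[:, cols].mean(axis=1, keepdims=True)

    # Pooled per-gene spread once batch means are removed: the common scale.
    pooled_std = np.sqrt((residuals ** 2).mean(axis=1, keepdims=True))

    adjusted = np.empty_like(expr)
    for b in np.unique(batch):
        cols = batch == b
        batch_std = residuals[:, cols].std(axis=1, keepdims=True)
        batch_std[batch_std == 0] = 1.0  # constant genes: avoid division by zero
        adjusted[:, cols] = residuals[:, cols] / batch_std * pooled_std + grand_mean
    return adjusted


# Toy data: 5 genes, 12 samples, with an artificial shift added to batch "b".
rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 12))
batch = np.array(["a"] * 6 + ["b"] * 6)
expr[:, batch == "b"] += 3.0

corrected = location_scale_adjust(expr, batch)
```

After the adjustment, every batch has the same per-gene mean and standard deviation. ComBat additionally pools information across genes to stabilize the batch estimates, which matters most when batches are small.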

Keywords: Batch effects; Bayesian statistics; Open source; Transcriptomics.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Performance of pyComBat vs. ComBat vs. Scanpy’s implementation of ComBat. A Distribution of the relative differences between the expression matrices corrected for batch effects by ComBat and by pyComBat (parametric version), on the Ovarian Cancer dataset. The vertical dotted line corresponds to zero. B Computation time in seconds for pyComBat, Scanpy and ComBat for the parametric method, on the Multiple Myeloma dataset. The y-axis is on a log scale. C Computation time in seconds for pyComBat, Scanpy and ComBat for the parametric method, on the Ovarian Cancer dataset. The y-axis is on a log scale. D Computation time in minutes for pyComBat (left) and ComBat (right) for the non-parametric method, on the Ovarian Cancer dataset. The y-axis is on a log scale.
Fig. 2
Performance of pyComBat vs. ComBat-Seq. A Computation time in seconds for pyComBat and ComBat-Seq, on the Colon Cancer dataset. B Computation time in seconds for pyComBat and ComBat-Seq, on the Breast Cancer dataset.
Fig. 3
Differences in log fold change between ComBat-corrected and pyComBat-corrected data for the Ovarian microarray dataset, in a differential expression analysis between primary tumor and normal tissue samples using Limma.
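
The comparisons in Fig. 1A and Fig. 3 rest on elementwise differences between the outputs of two implementations. The captions do not give the exact formula, so the sketch below assumes the usual definition, (candidate - reference) / reference, and skips entries where the reference value is zero. The function name `relative_differences` and the small matrices are hypothetical.

```python
import numpy as np


def relative_differences(reference, candidate):
    """Elementwise (candidate - reference) / reference as a flat array.

    Entries where the reference is exactly zero are dropped, because the
    relative difference is undefined there.
    """
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    if reference.shape != candidate.shape:
        raise ValueError("both matrices must have the same shape")
    nonzero = reference != 0
    return (candidate[nonzero] - reference[nonzero]) / reference[nonzero]


# Hypothetical corrected matrices from two implementations of the same method.
reference = np.array([[1.0, 2.0], [4.0, 0.0]])
candidate = np.array([[1.1, 2.0], [3.0, 5.0]])

rel = relative_differences(reference, candidate)
```

A histogram of `rel` that is tightly concentrated around zero would indicate that the two implementations agree closely.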
