Nat Commun. 2024 Aug 2;15(1):6516.
doi: 10.1038/s41467-024-50613-5.

Evaluating batch correction methods for image-based cell profiling

John Arevalo et al. Nat Commun. 2024.

Abstract

High-throughput image-based profiling platforms are powerful technologies capable of collecting data from billions of cells exposed to thousands of perturbations in a time- and cost-effective manner. Therefore, image-based profiling data has been increasingly used for diverse biological applications, such as predicting drug mechanism of action or gene function. However, batch effects severely limit community-wide efforts to integrate and interpret image-based profiling data collected across different laboratories and equipment. To address this problem, we benchmark ten high-performing single-cell RNA sequencing (scRNA-seq) batch correction techniques, representing diverse approaches, using a newly released Cell Painting dataset, JUMP. We focus on five scenarios with varying complexity, ranging from batches prepared in a single lab over time to batches imaged using different microscopes in multiple labs. We find that Harmony and Seurat RPCA are noteworthy, consistently ranking among the top three methods for all tested scenarios while maintaining computational efficiency. Our proposed framework, benchmark, and metrics can be used to assess new batch correction methods in the future. This work paves the way for improvements that enable the community to make the best use of public Cell Painting data for scientific discovery.

Conflict of interest statement

The authors declare the following competing interests: S.S. and A.E.C. serve as scientific advisors for companies that use image-based profiling and Cell Painting (A.E.C.: Recursion, SyzOnc, Quiver Bioscience; S.S.: Waypoint Bio, Dewpoint Therapeutics, Deepcell) and receive honoraria for occasional scientific visits to pharmaceutical and biotechnology companies. All other authors declare no competing interests.

Figures

Fig. 1. Evaluation pipeline.
We evaluated five image-based profiling scenarios with different image acquisition equipment (high-throughput microscopes), laboratory, number of compounds and number of replicates. We used a state-of-the-art pipeline for image analysis. We compared ten batch correction methods using qualitative and quantitative metrics.
Fig. 2. Evaluation scenario 1.
Quantitative comparison of ten batch correction methods measuring batch effect removal (four batch correction metrics) and conservation of biological variance (six bio-metrics). Metrics are mean aggregated by category. Overall score is the weighted sum of aggregated batch correction and bio-metrics with 0.4 and 0.6 weights respectively.
Fig. 3. Evaluation scenario 2.
Quantitative comparison of ten batch correction methods measuring batch effect removal (four batch correction metrics) and conservation of biological variance (six bio-metrics). Metrics are mean aggregated by category. Overall score is the weighted sum of aggregated batch correction and bio-metrics with 0.4 and 0.6 weights respectively.
Fig. 4. Evaluation scenario 3.
Quantitative comparison of ten batch correction methods measuring batch effect removal (four batch correction metrics) and conservation of biological variance (six bio-metrics). Metrics are mean aggregated by category. Overall score is the weighted sum of aggregated batch correction and bio-metrics with 0.4 and 0.6 weights respectively.
Fig. 5. Evaluation scenario 4.
A Quantitative comparison of ten batch correction methods measuring batch effect removal (four batch correction metrics) and conservation of biological variance (six bio-metrics). Metrics are mean aggregated by category. Overall score is the weighted sum of aggregated batch correction and bio-metrics with 0.4 and 0.6 weights respectively. Visualization of integrated data colored by B Compound, C Laboratory, and D Microscope. Left-to-right layout reflects the methods’ descending order of performance. We selected 18 out of 302 compounds with replicates in different well positions to account for position effects that may cause profiles to look similar; the embeddings are the same across B-D but samples treated with compounds other than the selected 18 are not shown in B. Alphanumeric IDs denote positive controls. Source data are provided as a Source Data file.
Fig. 6. Evaluation scenario 5.
Quantitative comparison of ten batch correction methods measuring batch effect removal (four batch correction metrics) and conservation of biological variance (six bio-metrics). Metrics are mean aggregated by category. Overall score is the weighted sum of aggregated batch correction and bio-metrics with 0.4 and 0.6 weights respectively.
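The overall score described in the captions of Figs. 2 to 6 can be illustrated with a minimal sketch. It assumes each metric has already been scaled so that higher is better and all metrics are comparable; the function name and the metric values below are hypothetical and are not taken from the paper. Only the mean aggregation per category and the 0.4 and 0.6 weights come from the captions.

```python
from statistics import mean

# Weights stated in the figure captions.
BATCH_WEIGHT = 0.4  # aggregated batch correction metrics
BIO_WEIGHT = 0.6    # aggregated bio-metrics


def overall_score(batch_metrics, bio_metrics):
    """Weighted sum of the per-category mean scores.

    Assumption: every metric is on a comparable scale where higher is better.
    """
    batch_score = mean(batch_metrics)  # mean-aggregate the batch correction metrics
    bio_score = mean(bio_metrics)      # mean-aggregate the bio-metrics
    return BATCH_WEIGHT * batch_score + BIO_WEIGHT * bio_score


# Hypothetical values for one method: four batch correction metrics, six bio-metrics.
example_batch = [0.8, 0.6, 0.7, 0.9]
example_bio = [0.5, 0.4, 0.6, 0.5, 0.3, 0.7]
example_score = overall_score(example_batch, example_bio)
print(round(example_score, 3))
```

Because the bio-metrics carry the larger weight, a method that removes batch effects aggressively but erases biological signal would score lower under this scheme than one that preserves biology with moderate batch removal.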
