BMC Genomics. 2025 Apr 23;26(1):393.
doi: 10.1186/s12864-025-11600-2.

Benchmarking foundation cell models for post-perturbation RNA-seq prediction

Gerold Csendes et al. BMC Genomics.

Abstract

Accurately predicting cellular responses to perturbations is essential for understanding cell behaviour in both healthy and diseased states. While perturbation data is ideal for building such predictive models, it is far less available than baseline (non-perturbed) cellular data. To address this limitation, several foundation cell models have been developed using large-scale single-cell gene expression data. After pre-training, these models are fine-tuned for specific tasks, such as predicting post-perturbation gene expression profiles, and are considered state-of-the-art for these problems. However, proper benchmarking of these models remains an unsolved challenge. In this study, we benchmarked two recently published foundation models, scGPT and scFoundation, against baseline models. Surprisingly, we found that even the simplest baseline model, which takes the mean of the training examples, outperformed scGPT and scFoundation. Furthermore, basic machine learning models that incorporate biologically meaningful features outperformed scGPT by a large margin. Additionally, we found that the current Perturb-Seq benchmark datasets exhibit low perturbation-specific variance, making them suboptimal for evaluating such models. Our results highlight important limitations in current benchmarking approaches and provide insights into how post-perturbation gene expression prediction models can be evaluated more effectively.

Keywords: Benchmark; Foundation model; Perturbation; RNA-seq.
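
The "mean of training examples" baseline mentioned in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: the function names, the synthetic data, and the definition of the Pearson delta score (Pearson correlation between predicted and observed expression changes relative to control) are assumptions made for illustration only.

```python
import numpy as np


def train_mean_baseline(train_profiles: np.ndarray) -> np.ndarray:
    """Gene-wise mean of the training profiles (rows = perturbations).

    The same vector is used as the prediction for every unseen perturbation.
    """
    return train_profiles.mean(axis=0)


def pearson_delta(pred: np.ndarray, truth: np.ndarray, control: np.ndarray) -> float:
    """Pearson correlation of predicted vs. observed change from control.

    Assumed definition for this sketch; the abstract does not spell it out.
    """
    pred_delta = pred - control
    true_delta = truth - control
    return float(np.corrcoef(pred_delta, true_delta)[0, 1])


# Synthetic data (hypothetical): every perturbation shares most of its response.
rng = np.random.default_rng(0)
n_genes = 200
control = rng.normal(5.0, 1.0, n_genes)   # mean control expression per gene
shared = rng.normal(0.0, 1.0, n_genes)    # response common to all perturbations


def simulate(n_perturbations: int) -> np.ndarray:
    """Post-perturbation profiles: control + shared response + small specific noise."""
    noise = rng.normal(0.0, 0.3, (n_perturbations, n_genes))
    return control + shared + noise


train = simulate(20)
test = simulate(5)

prediction = train_mean_baseline(train)
scores = [pearson_delta(prediction, profile, control) for profile in test]
print([round(s, 3) for s in scores])
```

On data like this, where perturbations share most of their response, the constant prediction scores highly without using any information about the individual perturbation, which is the kind of effect the abstract attributes to low perturbation-specific variance.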

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Competing interests: All authors are full-time employees of Turbine Ltd.; KSz is also a founder.

Figures

Fig. 1
Benchmarking foundation and baseline models. (A) Schematic representation of the benchmark pipeline. (B) Evaluation on the Adamson dataset: Pearson delta metrics (y axis) for scGPT, scFoundation, Train Mean and Random Forest Regression models with different features (x axis). Main groups of models are colour coded. (C) Evaluation on the Norman dataset. (D) Evaluation on the Replogle K562 dataset. (E) Evaluation on the Replogle RPE1 dataset.
Fig. 2
Composition of standard benchmark datasets. (A) Number of single cells (control/perturbed, colour coded) in the four benchmark datasets. The y axis is log10 scaled. (B) Number of distinct perturbations in the four benchmark datasets. (C) Correlation heatmaps for pseudo-bulk differential expression signatures for the Adamson (left) and Replogle K562 (right) datasets. The black lines indicate the separation between training and test sets. Some samples (perturbations) are labelled on the x and y axes. (D) Distribution of pairwise Pearson correlations (y axis) of differential expression profiles for the benchmark datasets (x axis). (E) Comparison between median intra-dataset correlations (x axis) and the difference in Pearson delta metrics between the best and Train Mean models (y axis), colour coded by benchmark dataset.
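
The pairwise correlation summary described for Fig. 2 (D) and (E) can be sketched as follows. This is an illustrative reconstruction on synthetic data, not the authors' pipeline: the function name, the data, and the choice to summarise each dataset by the median of the upper-triangle correlations are assumptions.

```python
import numpy as np


def pairwise_pearson(de_profiles: np.ndarray) -> np.ndarray:
    """Pearson correlations between all distinct pairs of rows.

    Each row is one perturbation's differential expression profile.
    """
    corr = np.corrcoef(de_profiles)
    upper = np.triu_indices_from(corr, k=1)  # skip the diagonal and mirrored pairs
    return corr[upper]


# Two hypothetical datasets with 10 perturbations and 300 genes each.
rng = np.random.default_rng(1)
shared_response = rng.normal(size=300)

# Low perturbation-specific variance: profiles are mostly the shared response.
homogeneous = shared_response + 0.3 * rng.normal(size=(10, 300))
# High perturbation-specific variance: profiles are independent of each other.
diverse = rng.normal(size=(10, 300))

median_homogeneous = float(np.median(pairwise_pearson(homogeneous)))
median_diverse = float(np.median(pairwise_pearson(diverse)))
print(round(median_homogeneous, 3), round(median_diverse, 3))
```

A high median intra-dataset correlation, as in the first synthetic dataset, indicates that perturbations resemble one another, so a model has little perturbation-specific signal to predict.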
