This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2025 May 8:arXiv:2505.05612v1.

scDrugMap: Benchmarking Large Foundation Models for Drug Response Prediction

Qing Wang¹, Yining Pan¹, Minghao Zhou¹, Zijia Tang², Yanfei Wang¹, Guangyu Wang³,⁴, Qianqian Song¹

Affiliations

¹ Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32611, USA.
² Trinity College, Duke University, Durham, NC, USA.
³ Center for Bioinformatics and Computational Biology, Houston Methodist Research Institute, Houston, TX, USA.
⁴ Department of Cardiothoracic Surgery, Weill Cornell Medicine, Cornell University, New York, NY, USA.

PMID: 40386575
PMCID: PMC12083700

Abstract

Drug resistance remains a significant barrier to improving the effectiveness of cancer therapies. To better understand the biological mechanisms driving resistance, single-cell profiling has emerged as a powerful tool for characterizing cellular heterogeneity. Recent advancements in large-scale foundation models have demonstrated potential in enhancing single-cell analysis, yet their performance in drug response prediction remains underexplored. In this study, we developed scDrugMap, an integrated framework for drug response prediction that features both a Python command-line tool and an interactive web server. scDrugMap supports the evaluation of a wide range of foundation models, including eight single-cell foundation models and two large language models (LLMs), using large-scale single-cell datasets across diverse tissue types, cancer types, and treatment regimens. The framework incorporates a curated data resource consisting of a primary collection of 326,751 cells from 36 datasets across 23 studies, and a validation collection of 18,856 cells from 17 datasets across 6 studies. Using scDrugMap, we conducted comprehensive benchmarking under two evaluation scenarios: pooled-data evaluation and cross-data evaluation. In both settings, we implemented two model training strategies: layer freezing, and fine-tuning of the foundation models with Low-Rank Adaptation (LoRA). In the pooled-data evaluation, scFoundation outperformed all other models, while most models achieved competitive performance. Specifically, scFoundation achieved the highest mean F1 scores of 0.971 and 0.947 with layer freezing and fine-tuning, outperforming the lowest-performing model by 54% and 57%, respectively. In the cross-data evaluation, UCE achieved the highest performance (mean F1 score: 0.774) after fine-tuning on tumor tissue, while scGPT demonstrated superior performance (mean F1 score: 0.858) in a zero-shot learning setting.
Together, this study presents the first comprehensive benchmarking of large-scale foundation models for drug response prediction in single-cell data and introduces a user-friendly, flexible platform to support drug discovery and translational research.
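
The LoRA fine-tuning strategy named in the abstract can be illustrated on a single linear layer. The following is a minimal sketch, not the scDrugMap implementation: it assumes the standard LoRA formulation, in which a frozen pretrained weight W is augmented by a trainable low-rank product scaled by alpha / r. The layer dimensions, the rank, the `alpha` value, and all function names are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the standard LoRA update.

    w: frozen pretrained weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def count_params(*mats):
    """Total number of entries across the given matrices."""
    return sum(len(m) * len(m[0]) for m in mats)


random.seed(0)
d_out, d_in, r = 8, 16, 2  # toy sizes, chosen only for illustration
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
b = [[0.0] * r for _ in range(d_out)]  # B starts at zero, so the update is initially a no-op

w_eff = lora_effective_weight(w, a, b, alpha=4.0)
trainable_lora = count_params(a, b)  # 2*16 + 8*2 = 48 trainable entries
trainable_full = count_params(w)     # 8*16 = 128 entries if W itself were trained
```

In this toy layer only `a` and `b` would receive gradient updates, so 48 values are trained instead of 128. Layer freezing, as the term is commonly used, would instead keep every backbone weight fixed and train only a prediction head on top of the embeddings; whether scDrugMap does exactly this is not stated in the abstract.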

Keywords: Computational Drug Discovery; Drug Resistance; Drug Response Prediction; Foundation Models; Low-Rank Adaptation; Single-cell Profiling; Zero-shot Learning; scDrugMap.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

**Fig. 1. Overview of the scDrugMap framework and curated datasets.**
**(a)** Schematic of the scDrugMap framework, which integrates a benchmarking platform, computational pipeline, interactive web server, and curated drug response datasets. Users can select from a range of foundation models (FMs), including single-cell-specific foundation models (scFMs) and general-purpose language models, and apply different training strategies (layer-freezing or fine-tuning) to predict drug response outcomes (sensitive or resistant). **(b)** Categories used for benchmarking model performance, including tissue types, drug types, and cancer types. **(c)** Summary of the primary dataset collection, showing the number of datasets (left) and number of cells (right) across tissue types, drug types, and cancer types. **(d)** Summary of the validation dataset collection, with the number of datasets (left) and number of cells (right) across tissue types, drug types, and cancer types.

**Fig. 2. Model performance in predicting drug response in pooled-data evaluation using primary single-cell data.**
F1 scores across tissue, drug, and cancer types using the (a) layer-freezing and (b) fine-tuning training methods. Error bars on the bar plots and dots on the vertical line charts represent the standard deviation of the mean F1 score of each method in each category.
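
The per-category summary in Fig. 2 (mean F1 with a standard deviation) can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: it assumes a binary F1 score with "resistant" as the positive class and the sample standard deviation, neither of which is specified in the caption. The labels, category names, and scores are toy values.

```python
from collections import defaultdict
from statistics import mean, stdev


def f1_score(y_true, y_pred, positive="resistant"):
    """Binary F1 score for the chosen positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarize_by_category(results):
    """Map each category to (mean F1, standard deviation of F1).

    results: iterable of (category, f1) pairs, e.g. one pair per dataset.
    """
    grouped = defaultdict(list)
    for category, score in results:
        grouped[category].append(score)
    return {
        category: (mean(scores), stdev(scores) if len(scores) > 1 else 0.0)
        for category, scores in grouped.items()
    }


# Toy predictions for four cells: one resistant cell is missed.
y_true = ["resistant", "resistant", "sensitive", "sensitive"]
y_pred = ["resistant", "sensitive", "sensitive", "sensitive"]
toy_f1 = f1_score(y_true, y_pred)  # precision 1.0, recall 0.5

# Toy per-dataset F1 scores grouped by a hypothetical tissue category.
summary = summarize_by_category(
    [("tumor tissue", 0.8), ("tumor tissue", 0.6), ("cell line", 0.9)]
)
```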

**Fig. 3. Model performance in predicting drug response in cross-data evaluation using primary single-cell data.**
F1 scores across tissue, drug, and cancer types using the (a) layer-freezing and (b) fine-tuning training methods. Radar plots show mean F1 scores for different tissues, drug types, and regimens (the radial axis is scaled from 0 to 1). Violin plots show the kernel density distribution across all cancer types, with box plots drawn inside. In the circular bar charts, each color segment shows the mean F1 score of the corresponding category across the tissue, drug type, and regimen categories. In the box plots, the center line is the median, the lower and upper hinges correspond to the first and third quartiles, the upper whisker extends from the hinge to the largest value no further than 1.5× the interquartile range (IQR) from the hinge, and the lower whisker extends from the hinge to the smallest value no further than 1.5× IQR from the hinge.
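
The box plot convention described in the Fig. 3 caption (hinges at the first and third quartiles, whiskers reaching the most extreme values within 1.5× IQR of the hinges) can be computed as in the sketch below. The quartile interpolation method is an assumption, since plotting libraries differ on it, and the input scores are toy values.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Return hinges, median, whisker ends, and outliers for a box plot."""
    data = sorted(values)
    # Quartiles with linear interpolation that includes both endpoints (assumed convention).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers stop at the most extreme observations inside the 1.5 * IQR fences.
    upper_whisker = max(v for v in data if v <= q3 + 1.5 * iqr)
    lower_whisker = min(v for v in data if v >= q1 - 1.5 * iqr)
    outliers = [v for v in data if v < lower_whisker or v > upper_whisker]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
        "outliers": outliers,
    }


# Toy F1 scores: 0.1 falls below the lower fence and is drawn as an outlier.
stats = boxplot_stats([0.1, 0.5, 0.6, 0.7, 0.8])
```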

**Fig. 4. UMAP projection of primary single-cell data by different methods.**
UMAP embeddings obtained with the layer-freezing training method are shown for (a) scFoundation, (b) scGPT, and (c) UCE, with cells colored by cancer type and cell response.

**Fig. 5. Performance of GPT4o-mini with few-shot learning in pooled-data evaluation using primary single-cell data.**
(a) Prompt used for GPT4o-mini. The prompt template first uses a chain-of-thought-style instruction to tell the model how to reason about the output, then gives the model the data source and sequence information, and finally repeats the instruction and supplies an output template to keep the output format consistent. A complete input example with the prompt is also shown. (b) Radar plots showing the mean F1 scores of GPT4o-mini in predicting drug response across different tissue, drug types, and cancer types (the radial axis is scaled from 0 to 1).
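
The three-part prompt structure described in the Fig. 5 caption (reasoning instruction, then data source and sequence, then a repeated instruction with an output template) could be assembled roughly as follows. Every string here is hypothetical placeholder wording, not the authors' prompt, and `build_prompt`, `TASK`, and the argument names are invented for illustration; the actual prompt is shown only in the figure.

```python
# Hypothetical task wording; the real instruction text appears only in Fig. 5a.
TASK = "Decide whether the cell is sensitive or resistant to the drug."


def build_prompt(data_source, gene_sequence, examples=()):
    """Assemble a few-shot prompt in the order described in the caption.

    data_source: free-text description of where the cell comes from
    gene_sequence: the cell's sequence information as a string
    examples: optional (sequence, label) pairs used as few-shot demonstrations
    """
    parts = [
        # 1. Reasoning instruction that tells the model how to approach the output.
        "Think step by step before answering. " + TASK,
        # 2. Data source, followed by any few-shot examples and the query sequence.
        f"Data source: {data_source}",
    ]
    for sequence, label in examples:
        parts.append(f"Example sequence: {sequence}\nExample answer: {label}")
    parts.append(f"Sequence: {gene_sequence}")
    # 3. Repeat the instruction and pin down the output format with a template.
    parts.append(TASK + " Reply using exactly this template:\nResponse: <sensitive|resistant>")
    return "\n\n".join(parts)


prompt = build_prompt(
    data_source="toy tumor sample",
    gene_sequence="GENE_A GENE_B GENE_C",
    examples=[("GENE_B GENE_A GENE_D", "resistant")],
)
```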

**Fig. 6. Model performance in predicting drug response in pooled-data evaluation using validation single-cell data.**
Radar plots illustrate the mean (a) F1 scores and (b) AUROC of each model in predicting single-cell drug response using the layer-freezing training method across different tissue, drug types, and cancer types (the radial axis is scaled from 0 to 1).

**Fig. 7. Summary of properties, computational efficiency, and scalability of each evaluated model.**
Rows correspond to algorithms ordered chronologically by year and month of publication. The first three columns display model characteristics: whether the model uses an encoder-decoder architecture, the type of input embeddings, and whether it is a single-cell foundation model. The next two columns present the number of parameters and the output dimensions of each model. The next set of columns shows the training and inference time and speed. For each model, the color in each cell is proportional to the corresponding value (scaled between the column's minimum and maximum values, ignoring values of the two natural language models, which are shown as dashes).
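
The color scaling described in the Fig. 7 caption amounts to a per-column min-max normalization that skips the missing entries. A minimal sketch, assuming missing values are represented as `None` (the dashes in the figure) and using toy numbers:

```python
def minmax_scale(values):
    """Scale values to [0, 1], leaving missing entries (None) untouched."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    lo, hi = min(present), max(present)
    if hi == lo:
        # A constant column has no spread; map every present value to 0.0.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - lo) / (hi - lo) for v in values]


# Toy column: the third model has no measurement and stays missing.
scaled = minmax_scale([10.0, 30.0, None, 20.0])
```

The minimum and maximum are taken over the present values only, so a missing entry neither stretches the color range nor receives a color.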
