[Preprint]. 2025 Mar 3:2025.02.27.640661.
doi: 10.1101/2025.02.27.640661.

SensitiveCancerGPT: Leveraging Generative Large Language Model on Structured Omics Data to Optimize Drug Sensitivity Prediction

Shaika Chowdhury et al. bioRxiv.

Abstract

Objective: The rapid accumulation of vast pharmacogenomics data from cancer cell lines provides unprecedented opportunities for drug sensitivity prediction (DSP), a crucial prerequisite for the advancement of precision oncology. Recently, generative Large Language Models (LLMs) have demonstrated strong performance and generalization across diverse tasks in natural language processing (NLP). However, the structured format of pharmacogenomics data poses a challenge for applying LLMs to DSP. The objective of this study is therefore threefold: to adapt prompt engineering for structured pharmacogenomics data toward optimizing an LLM's DSP performance, to evaluate the LLM's generalization in real-world DSP scenarios, and to compare the LLM's DSP performance against that of state-of-the-science baselines.

Methods: We systematically investigated the capability of the Generative Pre-trained Transformer (GPT) as a DSP model on four publicly available benchmark pharmacogenomics datasets, stratified by five cancer tissue types of cell lines and encompassing both oncology and non-oncology drugs. We assessed GPT's effectiveness on the DSP task via four learning paradigms: zero-shot learning, few-shot learning, fine-tuning, and clustering of pretrained embeddings. To enable GPT to seamlessly process the structured pharmacogenomics data, we employed domain-specific prompt engineering, implementing three prompt templates (i.e., Instruction, Instruction-Prefix, Cloze) and integrating pharmacogenomics-related features into the prompt. We validated GPT's performance in diverse real-world DSP scenarios: cross-tissue generalization, blind tests, and analyses of drug-pathway associations and top sensitive/resistant cell lines. Furthermore, we conducted a comparative evaluation of GPT against multiple Transformer-based pretrained models and existing DSP baselines.
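For concreteness, the three prompt templates might take forms like the following sketch. Only the template names come from the paper; the exact wording, field names and example drug/cell line below are our assumptions.

```python
# Illustrative forms of the three prompt templates (Instruction,
# Instruction-Prefix, Cloze). Only the template names come from the paper;
# the field names, phrasing and example drug/cell line are hypothetical.

record = {"drug": "Lapatinib", "cell_line": "BT-474", "tissue": "breast"}

templates = {
    # Instruction: a full task description followed by the serialized record.
    "Instruction": (
        "Predict whether the cell line is sensitive or resistant to the drug. "
        f"Drug: {record['drug']}. Cell line: {record['cell_line']}. "
        f"Tissue: {record['tissue']}. Response:"
    ),
    # Instruction-Prefix: a concise task prefix before the record.
    "Instruction-Prefix": (
        "Drug sensitivity prediction: "
        f"{record['drug']} | {record['cell_line']} | {record['tissue']} ->"
    ),
    # Cloze: the record embedded in a fill-in-the-blank statement.
    "Cloze": (
        f"The {record['tissue']} cell line {record['cell_line']} is ___ "
        f"to the drug {record['drug']}."
    ),
}

for name, prompt in templates.items():
    print(f"[{name}] {prompt}")
```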

Results: Extensive experiments on the pharmacogenomics datasets across the five tissue cohorts demonstrate that fine-tuning GPT yields the best DSP performance (28% F1 increase, p-value = 0.0003), followed by clustering of pretrained GPT embeddings (26% F1 increase, p-value = 0.0005), both outperforming GPT in-context learning (i.e., few-shot). GPT in the zero-shot setting performed worst, trailing by a large F1 margin. Within prompt engineering, directly instructing GPT about the DSP task in a concise context format (i.e., Instruction-Prefix) yielded a 22% F1 gain (p-value = 0.02), while incorporating drug-cell line context derived from genomic and/or molecular features boosted F1 by a further 2%. Compared to state-of-the-science DSP baselines, GPT achieved significantly superior mean F1 performance (16% gain, p-value < 0.05) on the GDSC dataset. In the cross-tissue analysis, GPT generalized comparably to its within-tissue performance on the GDSC and PRISM datasets, and showed statistically significant F1 improvements on the CCLE (8%, p-value = 0.001) and DrugComb (19%, p-value = 0.009) datasets. Evaluation on challenging blind tests suggests GPT remains competitive on the CCLE and DrugComb datasets relative to random splitting. Furthermore, analyses of drug-pathway associations and log probabilities provided insights that align with previous DSP findings.
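One way the clustering-of-pretrained-embeddings paradigm could be realized is the minimal sketch below: embed each drug-cell line prompt, cluster the embeddings, and label each cluster by the majority class of its training members. The KMeans/majority-vote scheme is an assumption for illustration, not necessarily the authors' exact procedure.

```python
# Minimal sketch of the clustering paradigm: cluster pretrained embeddings
# with KMeans and assign each cluster the majority label of its training
# members. This scheme is an assumption, not the paper's exact method.
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_predict(train_emb, train_labels, test_emb, n_clusters=10):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    train_clusters = km.fit_predict(train_emb)
    # Majority label per cluster, estimated from the training split.
    majority = {c: np.bincount(train_labels[train_clusters == c]).argmax()
                for c in range(n_clusters)}
    return np.array([majority[c] for c in km.predict(test_emb)])

# Toy usage with random vectors standing in for GPT embeddings.
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(200, 64))
train_labels = rng.integers(0, 2, size=200)   # 0 = resistant, 1 = sensitive
test_emb = rng.normal(size=(20, 64))
print(cluster_and_predict(train_emb, train_labels, test_emb))
```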

Conclusion: The diverse experimental setups and in-depth analyses underscore the potential of generative LLMs, such as GPT, as a viable in silico approach to guide precision oncology.

Availability: https://github.com/bioIKEA/SensitiveCancerGPT.

Figures

Figure 1.
An overview of our proposed SensitiveCancerGPT framework. (A) (i) Statistics of the pharmacogenomics datasets presented as a nested pie plot: the total distribution of cell line drug responses within each dataset (innermost), the tissue distributions within each dataset (middle) and the feature distributions within each tissue cohort (outermost). (ii) Workflow of prompt preparation from structured pharmacogenomics data. Using the column names and corresponding values in the tabular data, we first convert each row into natural-language text T; note that the last column is left blank for the model to predict. We then prepare a task-specific instruction I and concatenate it with T to obtain the final prompt P. (B) To optimize GPT's performance on the drug sensitivity prediction (DSP) task, we evaluate four methodological factors: (i) learning approach, (ii) prompt template, (iii) feature and (iv) temperature. (C) We assess GPT's generalization capability for DSP under diverse real-world experimental settings: (i) cross-tissue evaluation, (ii) blind tests, (iii) analyses of drug-pathway associations and top cell lines and (iv) baseline comparisons.
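The prompt-preparation workflow in panel (A)(ii) might look roughly like this sketch; the row-to-text conversion, the blank target column and the instruction prefix follow the caption, while the serialization wording and example row are assumptions.

```python
# Sketch of the workflow in panel (A)(ii): serialize a tabular row into
# natural-language text T, leaving the target column blank for the model to
# fill in, then prepend a task instruction I to form the prompt P.
# The serialization wording and the example row are assumptions.

def row_to_text(row: dict, target_col: str) -> str:
    parts = [f"{col} is {val}" for col, val in row.items() if col != target_col]
    return ". ".join(parts) + f". {target_col} is"   # target left blank

def build_prompt(row: dict, target_col: str, instruction: str) -> str:
    return instruction + "\n" + row_to_text(row, target_col)

row = {"drug": "Erlotinib", "cell_line": "HCC827", "tissue": "lung",
       "drug_response": None}
instruction = "Predict the drug response of the cell line as Sensitive or Resistant."
print(build_prompt(row, "drug_response", instruction))
```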
Figure 2:
(A) Performance of GPT under different settings for the following methodological factors: (i) learning approach, (ii) prompt template, (iii) context and (iv) temperature. (B) Few-shot performance comparisons across datasets as the number of demonstrations, k, is varied from 1 to 15 in increments of five. (C) Performance comparisons between two different demonstration-selection strategies under the few-shot setting.
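The few-shot setup in panels (B) and (C) can be sketched as follows: k labeled demonstrations are selected (randomly in this sketch, one of several possible strategies) and prepended to the query. The demonstration wording is an assumption.

```python
# Sketch of few-shot prompt assembly: select k labeled demonstrations
# (randomly here) and prepend them to the query. Wording is assumed.
import random

def few_shot_prompt(instruction, demos, query, k, seed=0):
    shots = random.Random(seed).sample(demos, k)
    body = "\n".join(f"{text} {label}" for text, label in shots)
    return f"{instruction}\n{body}\n{query}"

demos = [
    ("drug is Erlotinib. cell_line is HCC827. drug_response is", "Sensitive"),
    ("drug is Erlotinib. cell_line is A549. drug_response is", "Resistant"),
    ("drug is Lapatinib. cell_line is BT-474. drug_response is", "Sensitive"),
]
query = "drug is Gefitinib. cell_line is PC-9. drug_response is"
print(few_shot_prompt("Predict Sensitive or Resistant.", demos, query, k=2))
```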
Figure 3:
Performance of GPT with optimal settings on five tissue cohorts across different pharmacogenomics datasets evaluated with (i) F1 (ii) F1-Sensitive and (iii) F1-Resistant. F1 is the micro-averaged F1 score and F1-Sensitive and F1-Resistant are the F1 scores for the positive and negative classes, respectively. We used scikit-learn (Pedregosa et al. 2011) for the computation of evaluation metrics.
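Since the caption states the metrics were computed with scikit-learn, the three scores can be reproduced along these lines (the label vectors below are illustrative placeholders):

```python
# The three metrics of Figure 3 computed with scikit-learn: micro-averaged
# F1 plus per-class F1 for the Sensitive and Resistant classes.
# The label vectors below are illustrative placeholders.
from sklearn.metrics import f1_score

y_true = ["Sensitive", "Resistant", "Sensitive", "Resistant", "Sensitive"]
y_pred = ["Sensitive", "Sensitive", "Sensitive", "Resistant", "Resistant"]

f1_micro = f1_score(y_true, y_pred, average="micro")
f1_sensitive = f1_score(y_true, y_pred, pos_label="Sensitive")
f1_resistant = f1_score(y_true, y_pred, pos_label="Resistant")
print(f1_micro, f1_sensitive, f1_resistant)
```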
Figure 4:
Performance comparisons, shown as violin plots, for (A) the analysis between within-tissue and cross-tissue evaluation settings and (B) the blind-test analyses for drugs and cell lines. In each type of analysis, GPT is evaluated separately on the five tissue cohorts associated with each dataset; the annotated mean (also shown as a red diamond) is thus computed by averaging the results across all tissues within a dataset. The degree of statistical significance is indicated by the number of asterisks, and a non-significant difference by 'ns'.
Figure 5:
(A) Performance comparisons of GPT with two types of baseline models: previous drug response prediction models (PDS) and pretrained language models (PLM). GPT and the PLM baselines are fine-tuned. The reported F1 and per-class F1 are averaged over the five tissue cohorts and four datasets for each model; detailed results are available in Supplementary Figure 7 and Supplementary Figure 8. (B) F1-Sensitive performances of GPT and the baselines under varying positive class distributions.
Figure 6:
(A) Drug-pathway associations in the CCLE dataset. Negative (blue) and positive (red) correlations correspond to 'sensitive' and 'resistant' associations, respectively. The p-values, annotated in red (i.e., p =), are computed using the Kolmogorov-Smirnov test and indicate distributional similarity between the predicted (subplot (i)) and actual (subplot (ii)) drug-pathway associations; the null hypothesis of the test is that the two distributions are the same, so p > 0.05 indicates similarity. (B) (i) Heatmap visualization of Tanimoto similarities between drugs. (ii) Few-shot classification performance using Tanimoto-based selection (i.e., structurally similar drugs) of in-context examples in the prompt.
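The two statistical ingredients of this figure, the two-sample Kolmogorov-Smirnov test in panel (A) and the Tanimoto similarity underlying panel (B), can be sketched as follows; the correlation values and SMILES strings are toy placeholders.

```python
# Sketch of the two statistical ingredients of Figure 6: (A) a two-sample
# Kolmogorov-Smirnov test comparing predicted vs. actual drug-pathway
# correlation distributions, and (B) Tanimoto similarity between Morgan
# fingerprints of two drugs. Correlation values and SMILES are toy data.
from scipy.stats import ks_2samp
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# (A) A large p-value (> 0.05) fails to reject the null hypothesis that the
# two correlation samples come from the same distribution.
predicted_corr = [0.12, -0.30, 0.45, -0.10, 0.22]
actual_corr = [0.15, -0.28, 0.40, -0.05, 0.25]
stat, p = ks_2samp(predicted_corr, actual_corr)
print(f"KS statistic = {stat:.3f}, p = {p:.3f}")

# (B) Tanimoto similarity between 2048-bit Morgan fingerprints (radius 2).
def tanimoto(smiles_a: str, smiles_b: str) -> float:
    fp_a, fp_b = (
        AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=2048)
        for s in (smiles_a, smiles_b)
    )
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)

print(tanimoto("CCO", "CCN"))   # ethanol vs. ethylamine, toy molecules
```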
