AMIA Annu Symp Proc. 2025 May 22:2024:744-753. eCollection 2024.

Leveraging multi-source data to resolve inconsistency across pharmacogenomic datasets in drug sensitivity prediction

Xiaodi Li et al. AMIA Annu Symp Proc.

Abstract

Researchers have developed pharmacogenomic datasets for various purposes, such as biomarker identification, yet drug response prediction models often underperform because the datasets are inconsistent with one another. These inconsistencies arise from inter-tumoral heterogeneity, differing experimental conditions, and cell subtype complexity, and they limit model generalizability. To address this, we propose a computational model based on Aggregated Learning (AL) that improves drug response prediction by learning from the inconsistencies across multiple datasets. Our model minimizes discrepancies by training on the overlapping, inconsistent data points from three pharmacogenomic datasets: CCLE, GDSC2, and gCSI. Compared with four baseline methods, namely Selecting Better (SB), Result Average (RA), Combining Data (CD), and Model Average (MA), our approach achieved lower Mean Absolute Error (MAE) scores: 0.090 (CCLE-GDSC), 0.096 (CCLE-gCSI), and 0.081 (GDSC-gCSI). These results indicate that explicitly addressing inconsistencies improves prediction accuracy and generalizability, making our model a promising approach to robust drug response prediction.
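As a rough illustration of the quantities the abstract refers to, the sketch below finds the drug-cell line pairs shared by two datasets, flags the ones whose sensitivity scores disagree, and computes the MAE between the two sets of scores. The pair names, the score values, and the `tol` threshold are hypothetical; the abstract does not state how an "inconsistent" data point is defined, so the threshold rule here is only an assumption.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (drug, cell line); hypothetical key layout


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between two equally long score lists."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def overlapping_pairs(a: Dict[Pair, float], b: Dict[Pair, float]) -> List[Pair]:
    """Drug-cell line pairs measured in both datasets, in sorted order."""
    return sorted(a.keys() & b.keys())


def inconsistent_pairs(a: Dict[Pair, float], b: Dict[Pair, float], tol: float) -> List[Pair]:
    """Shared pairs whose scores differ by more than tol (assumed rule)."""
    return [p for p in overlapping_pairs(a, b) if abs(a[p] - b[p]) > tol]


# Toy scores, invented for illustration only.
dataset_1 = {("drugA", "cell1"): 0.50, ("drugA", "cell2"): 0.25, ("drugB", "cell1"): 0.75}
dataset_2 = {("drugA", "cell1"): 0.75, ("drugA", "cell2"): 0.25, ("drugC", "cell3"): 0.50}

shared = overlapping_pairs(dataset_1, dataset_2)
conflicts = inconsistent_pairs(dataset_1, dataset_2, tol=0.1)
mae_between = mean_absolute_error(
    [dataset_1[p] for p in shared],
    [dataset_2[p] for p in shared],
)

print(shared)       # the two pairs present in both datasets
print(conflicts)    # only ("drugA", "cell1") differs by more than tol
print(mae_between)  # (0.25 + 0.0) / 2 = 0.125
```

A lower MAE means the two score lists agree more closely, which is the sense in which the reported values (for example 0.090 on CCLE-GDSC) are compared against the baselines.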


Figures

Figure 1.
The framework for the drug sensitivity prediction model. (1) We collect the drug, cell line, and gene sensitivity scores from three core datasets: GDSC, CCLE, and gCSI. (2) We preprocess these datasets and use them as the embedding input to the models. (3) We train the basic learning models MBL1 and MBL2 using all the data from Dataset 1 and Dataset 2, respectively. We also train the inconsistency aggregation models MIA1 and MBL1 using the overlapping data and the sensitivity scores generated by MBL2 and MIA1. Testing follows a similar process, but because the labels of the testing data are unknown, we output the average scores and calculate the MAE.
Figure 2.
Cross-validation results on common drug-cell line pairs.
Figure 3.
Cross-validation results on CCLE-GDSC with different numbers of non-overlapping samples (left to right). Note that for gCSI-CCLE with varying gCSI, the 1.5:1 and 2:1 results cannot be calculated because there are too few inconsistent data points.
Figure 4.
MAE after feature selection for different datasets.
Figure 5.
MAE for different regression techniques.
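
The Figure 1 caption says that at test time the average scores are output and the MAE is calculated. A minimal sketch of that averaging step is given below, assuming two base models trained independently on two datasets. The one-feature line fit stands in for whatever learners the paper actually uses, and all data values are invented; the inconsistency aggregation models are left out because the caption does not describe them in enough detail to reproduce.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept) of a stand-in base model


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Line:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def predict(model: Line, xs: Sequence[float]) -> List[float]:
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def average_scores(models: Sequence[Line], xs: Sequence[float]) -> List[float]:
    """Average the per-sample predictions of several models."""
    per_model = [predict(m, xs) for m in models]
    return [mean(column) for column in zip(*per_model)]


# Toy training data: dataset 2 reports scores shifted by +0.5 (hypothetical).
x_train = [0.0, 1.0, 2.0]
model_1 = fit_line(x_train, [0.0, 1.0, 2.0])  # stand-in for one base model
model_2 = fit_line(x_train, [0.5, 1.5, 2.5])  # stand-in for the other

x_test = [3.0, 4.0]
averaged = average_scores([model_1, model_2], x_test)
print(averaged)  # midway between the two models' predictions

# MAE against reference scores, available only in a held-out evaluation.
y_reference = [3.0, 4.0]
mae = mean(abs(t - p) for t, p in zip(y_reference, averaged))
print(mae)
```

Averaging leaves the output between the two base models, so a systematic offset between datasets is halved rather than removed.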
