Leveraging multi-source data to resolve inconsistency across pharmacogenomic datasets in drug sensitivity prediction
- PMID: 41726490
- PMCID: PMC12919631
Abstract
Pharmacogenomic datasets have been developed for purposes such as biomarker identification, yet drug response prediction models often underperform because the datasets are inconsistent with one another. These inconsistencies arise from inter-tumoral heterogeneity, differing experimental conditions, and cell subtype complexity, and they limit model generalizability. To address this, we propose a computational model based on Aggregated Learning (AL) that improves drug response prediction by learning from inconsistencies across multiple datasets. Our model minimizes discrepancies by training on overlapping, inconsistent data points from three pharmacogenomic datasets: CCLE, GDSC2, and gCSI. Compared with four baseline methods, Selecting Better (SB), Result Average (RA), Combining Data (CD), and Model Average (MA), our approach achieved superior performance, with lower Mean Absolute Error (MAE) on each dataset pair: 0.090 (CCLE-GDSC), 0.096 (CCLE-gCSI), and 0.081 (GDSC-gCSI). These results demonstrate that addressing inconsistencies improves prediction accuracy and generalizability, making our model a promising solution for robust drug response prediction.
©2024 AMIA - All rights reserved.
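The abstract rests on two ideas: data points that overlap between datasets, and MAE as the evaluation metric. The following is a minimal sketch of both, not the authors' implementation. It assumes each dataset can be represented as a mapping from a (cell line, drug) pair to a normalized response value. The helper names and all numeric values are hypothetical and chosen only for illustration.

```python
def overlapping_pairs(a, b):
    """Return the (cell_line, drug) keys measured in both datasets, sorted."""
    return sorted(a.keys() & b.keys())


def mean_absolute_error(y_true, y_pred):
    """MAE: the average of |y_true_i - y_pred_i| over all paired values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy response values (hypothetical, not taken from CCLE or GDSC).
ccle = {
    ("A549", "Erlotinib"): 0.80,
    ("MCF7", "Lapatinib"): 0.40,
    ("HT29", "Paclitaxel"): 0.30,
}
gdsc = {
    ("A549", "Erlotinib"): 0.70,
    ("MCF7", "Lapatinib"): 0.60,
    ("K562", "Nilotinib"): 0.20,
}

# Only pairs present in both datasets can expose an inconsistency.
shared = overlapping_pairs(ccle, gdsc)
ccle_vals = [ccle[k] for k in shared]
gdsc_vals = [gdsc[k] for k in shared]

# MAE between the two datasets on their shared pairs quantifies disagreement.
disagreement = mean_absolute_error(ccle_vals, gdsc_vals)
print(shared, round(disagreement, 3))
```

The same `mean_absolute_error` function would apply when comparing a model's predictions with measured responses, which is how the MAE scores reported in the abstract are to be read.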