Big Data Approaches for Modeling Response and Resistance to Cancer Drugs

Peng Jiang¹, William R Sellers², X Shirley Liu¹

Affiliations

¹ Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02215, USA.
² Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.

PMID: 31342013
PMCID: PMC6655478
DOI: 10.1146/annurev-biodatasci-080917-013350

Peng Jiang et al. Annu Rev Biomed Data Sci. 2018 Jul.

Annu Rev Biomed Data Sci. 2018 Jul:1:1-27.

doi: 10.1146/annurev-biodatasci-080917-013350. Epub 2018 Apr 25.

Abstract

Despite significant progress in cancer research, current standard-of-care drugs fail to cure many types of cancers. Hence, there is an urgent need to identify better predictive biomarkers and treatment regimes. Conventionally, insights from hypothesis-driven studies are the primary force for cancer biology and therapeutic discoveries. Recently, the rapid growth of big data resources, catalyzed by breakthroughs in high-throughput technologies, has resulted in a paradigm shift in cancer therapeutic research. The combination of computational methods and genomics data has led to several successful clinical applications. In this review, we focus on recent advances in data-driven methods to model anticancer drug efficacy, and we present the challenges and opportunities for data science in cancer therapeutic research.

Keywords: big data; combination therapy; drug resistance; immunotherapy; precision medicine; response biomarker; toxicity.

Figures

**Figure 1**
Data-driven approaches for modeling cancer therapy efficacy. Most data-driven studies of anticancer drug efficacy involve four components: genomics technology, experimental model, computational method, and clinical application. The use of genomics technology in experimental models generates data that can be analyzed by computational methods to generate results for clinical applications. (a) Microarray and high-throughput sequencing are widely used to study the DNA alterations and RNA transcriptomes in cancer samples. Genetic screens through RNAi or CRISPR technologies can study the effect of perturbing a gene in a cell line model (174). Compound screens based on automation frameworks can test the efficacy of many drugs on a cell line panel (29, 35, 36). (b) The most clinically relevant system is human, where both tumor microenvironment (10, 12) and gut microbiota (17) can determine anticancer drug efficacy. However, genetic experiments cannot be directly applied to humans, so mouse models are used as alternatives to study in vivo factors of drug response (43, 175, 176). Cancer cell lines are the most widely used research models. Cell lines can be cultured alone or cocultured either between cancer and immune cells (–48) or between immune and bacterial cells (64, 69). (c) Most data analyses involve variable selection. Molecular alterations of genes across samples are input variables, and drug efficacy is the outcome (84). Variable selection methods can identify critical genes associated with anticancer drug efficacy. Clustering algorithms can be applied to identify patterns in a data set (115). Mathematical (97, 100) or network models (107) can be applied to explore the properties and mechanisms of a molecular circuit that mediates anticancer drug efficacy. (d) Many studies are designed to find biomarkers for therapy response prediction (177) or side effects (–136) in clinical applications using the molecular profiles of patient samples. Data-driven models can also be applied to identify synergistic drug combinations to treat specific cancers (84). Abbreviations: CRISPR, clustered regularly interspaced short palindromic repeats; NK, natural killer; MDSC, myeloid-derived suppressor cell; MΦ, macrophage; oligo, oligonucleotide; RNAi, RNA interference.

**Figure 2**
Compound screening in cancer cell lines. Automation frameworks can be utilized to test the growth inhibition effects of a library of compounds across many cancer cell lines with diverse genetic backgrounds. Most compound screen projects also profiled the molecular features (e.g., gene expression, copy number, mutation status) of cell lines. The final data output is the growth inhibition effects of compounds on cell lines, together with cell line molecular profiles.

**Figure 3**
Variable selection in high-dimensional data. (a) Three common relationships between variable matrices (X) and outcomes (Y). (b) The unified framework of linear models y ~ *g(Xβ)* for n samples and p variables (with p > n), where the variable matrix X has dimensions n × p and the coefficient vector β has length p. The number of samples n may range from 10 to 1,000 in most studies, representing the number of profiled patients. The number of variables p is about 20,000 in most studies, representing the number of human genes. (c) High-dimensional regression through regularization. The coefficients of most high-dimensional regressions can be solved under a unified framework of minimizing the objective function f(β) together with a combination of L1 (LASSO) and L2 (ridge) penalties (where λ₁, λ₂ ≥ 0). The objective function of linear regression is the sum of squared residuals across all samples. The objective functions of logistic and Cox-PH regressions are the negative log of the likelihood function L(β, y, X). (d) High-dimensional regression through stepwise forward selection. At each step, the best variable is selected from a candidate pool to minimize the model error, such as cross-validation error. The procedure terminates when any further variable selection would increase the model error. Some previously selected variables may become insignificant during the stepwise process and are then removed from the model. Abbreviations: Cox-PH, Cox proportional hazards; LASSO, least absolute shrinkage and selection operator.
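
The regularized regression described in panel (c) can be illustrated with a minimal sketch. It assumes the linear-regression case and one particular scaling of the objective, (1/(2n)) Σᵢ (yᵢ − xᵢ·β)² + λ₁‖β‖₁ + (λ₂/2)‖β‖₂², solved by cyclic coordinate descent. The scaling, the solver choice, the function names `soft_threshold` and `elastic_net`, and the toy data are all assumptions made for illustration; none of them is taken from the review.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the update induced by the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam1, lam2, n_iter=200):
    """Cyclic coordinate descent for the assumed objective
        (1 / (2n)) * sum_i (y_i - x_i . beta)^2
        + lam1 * ||beta||_1 + (lam2 / 2) * ||beta||_2^2.
    X is a list of n rows (samples), each a list of p floats (genes);
    y is a list of n outcomes. Returns the coefficient list beta of length p."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta because beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the residual that excludes j's own term.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam1) / (col_sq[j] + lam2)
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Hypothetical toy data: 4 samples, 5 genes (p > n, as in the caption).
X_toy = [
    [1.0, 0.2, -0.5, 0.0, 0.3],
    [-1.0, 0.1, 0.4, 0.2, -0.1],
    [0.5, -0.3, 0.1, -0.4, 0.0],
    [-0.5, 0.0, 0.0, 0.2, -0.2],
]
y_toy = [2.1, -1.9, 1.0, -1.1]
print(elastic_net(X_toy, y_toy, lam1=0.1, lam2=0.1))
```

Setting `lam2=0` reduces the sketch to LASSO and `lam1=0` to ridge regression. For logistic or Cox-PH outcomes the squared-error term would be replaced by the corresponding negative log-likelihood, which this sketch does not implement.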

**Figure 4**
Biomarker training using clinical and cell line data. (a) The training of a multigene biomarker to guide treatment decisions starts from a collection of tumor genomics profiles paired with the patients’ clinical outcomes. The association between gene profiles and patients’ clinical outcomes is tested by statistical models, and a subset of genes is selected through a cross-validation procedure to optimize prediction accuracy. The accuracy of the gene biomarker will be evaluated in clinical trials for Food and Drug Administration approval or commercialization. (b) Computational methods can identify response biomarkers from compound screen data. Statistical methods can identify genes whose molecular status is significantly associated with drug efficacy across screened cell lines. The identified biomarker could be a subset of genes or a genome-wide vector of scores with one value per gene. In the latter case, the therapy response of each patient could be predicted by correlating tumor gene expression values with biomarker scores.
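
The last step of panel (b), scoring a patient by correlating tumor expression with a genome-wide biomarker vector, might look like the sketch below. The choice of Pearson correlation, the function names `pearson` and `predict_response_scores`, the dictionary layout, and the gene and patient labels are assumptions for illustration; the caption does not specify which correlation measure is used.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (v - mean_b) for x, v in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_a * var_b)


def predict_response_scores(biomarker, tumors):
    """Score each patient by correlating tumor expression with biomarker scores.

    biomarker: dict mapping gene -> biomarker score (one value per gene).
    tumors: dict mapping patient id -> dict of gene -> expression value.
    Only genes present in both the biomarker and the tumor profile are used."""
    scores = {}
    for patient, expression in tumors.items():
        genes = sorted(set(biomarker) & set(expression))
        scores[patient] = pearson(
            [expression[g] for g in genes],
            [biomarker[g] for g in genes],
        )
    return scores


# Hypothetical biomarker vector and tumor profiles (labels are placeholders).
biomarker_scores = {"GENE_A": 1.5, "GENE_B": -0.7, "GENE_C": 0.2, "GENE_D": -1.1}
tumor_profiles = {
    "patient_1": {"GENE_A": 8.2, "GENE_B": 3.1, "GENE_C": 5.0, "GENE_D": 2.4},
    "patient_2": {"GENE_A": 2.0, "GENE_B": 6.5, "GENE_C": 4.8, "GENE_D": 7.9},
}
print(predict_response_scores(biomarker_scores, tumor_profiles))
```

Under this sketch a higher score means the tumor's expression pattern resembles the biomarker more closely; whether that indicates response or resistance depends on how the biomarker scores were signed during training.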

Cited by

Big data in basic and translational cancer research.
Jiang P, Sinha S, Aldape K, Hannenhalli S, Sahinalp C, Ruppin E. Nat Rev Cancer. 2022 Nov;22(11):625-639. doi: 10.1038/s41568-022-00502-0. Epub 2022 Sep 5. PMID: 36064595 Free PMC article. Review.
Integrative Bioinformatics Analysis: Unraveling Variant Signatures and Single-Nucleotide Polymorphism Markers Associated with 5-FU-Based Chemotherapy Resistance in Colorectal Cancer Patients.
Askari M, Mirzaei E, Navapour L, Karimpour M, Rejali L, Sarirchi S, Nazemalhosseini-Mojarad E, Nobili S, Cava C, Sadeghi A, Fatemi N. J Gastrointest Cancer. 2024 Dec;55(4):1607-1619. doi: 10.1007/s12029-024-01102-x. Epub 2024 Sep 6. PMID: 39240276
Systematic investigation of cytokine signaling activity at the tissue and single-cell levels.
Jiang P, Zhang Y, Ru B, Yang Y, Vu T, Paul R, Mirza A, Altan-Bonnet G, Liu L, Ruppin E, Wakefield L, Wucherpfennig KW. Nat Methods. 2021 Oct;18(10):1181-1191. doi: 10.1038/s41592-021-01274-5. Epub 2021 Sep 30. PMID: 34594031 Free PMC article.
Systematic prediction of drug resistance caused by transporter genes in cancer cells.
Shen Y, Yan Z. Sci Rep. 2021 Apr 1;11(1):7400. doi: 10.1038/s41598-021-86921-9. PMID: 33795761 Free PMC article.
Spike-in normalization for single-cell RNA-seq reveals dynamic global transcriptional activity mediating anticancer drug response.
Wang X, Frederick J, Wang H, Hui S, Backman V, Ji Z. NAR Genom Bioinform. 2021 Jun 17;3(2):lqab054. doi: 10.1093/nargab/lqab054. eCollection 2021 Jun. PMID: 34159316 Free PMC article.

References

1. Huang ME, Ye YC, Chen SR, Chai JR, Lu JX, et al. 1988. Use of all-trans retinoic acid in the treatment of acute promyelocytic leukemia. Blood 72:567–72
2. Deininger M, Buchdunger E, Druker BJ. 2005. The development of imatinib as a therapeutic agent for chronic myeloid leukemia. Blood 105:2640–53
3. Paez JG, Janne PA, Lee JC, Tracy S, Greulich H, et al. 2004. EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science 304:1497–500
4. Solomon BJ, Mok T, Kim DW, Wu YL, Nakagawa K, et al. 2014. First-line crizotinib versus chemotherapy in ALK-positive lung cancer. New Engl. J. Med. 371:2167–77
5. Holohan C, Van Schaeybroeck S, Longley DB, Johnston PG. 2013. Cancer drug resistance: an evolving paradigm. Nat. Rev. Cancer 13:714–26

Grants and funding

U24 CA224316/CA/NCI NIH HHS/United States
