Sci Rep. 2016 Nov 23;6:36812.

doi: 10.1038/srep36812.

Logic models to predict continuous outputs based on binary inputs with an application to personalized cancer therapy

Theo A Knijnenburg¹, Gunnar W Klau², Francesco Iorio³, Mathew J Garnett⁴, Ultan McDermott⁴, Ilya Shmulevich¹, Lodewyk F A Wessels⁵

Affiliations

¹ Institute for Systems Biology, Seattle, US.
² Centrum Wiskunde & Informatica, Amsterdam, The Netherlands.
³ European Molecular Biology Laboratory - European Bioinformatics Institute, UK.
⁴ Wellcome Trust Sanger Institute, UK.
⁵ Netherlands Cancer Institute, Amsterdam, and The Faculty of EEMCS, Delft University of Technology, Delft, The Netherlands.

PMID: 27876821
PMCID: PMC5120272
DOI: 10.1038/srep36812

Theo A Knijnenburg et al. Sci Rep. 2016.

Abstract

Mining large datasets using machine learning approaches often leads to models that are hard to interpret and not amenable to the generation of hypotheses that can be experimentally tested. We present 'Logic Optimization for Binary Input to Continuous Output' (LOBICO), a computational approach that infers small and easily interpretable logic models of binary input features that explain a continuous output variable. Applying LOBICO to a large cancer cell line panel, we find that logic combinations of multiple mutations are more predictive of drug response than single gene predictors. Importantly, we show that the use of the continuous information leads to robust and more accurate logic models. LOBICO implements the ability to uncover logic models around predefined operating points in terms of sensitivity and specificity. As such, it represents an important step towards practical application of interpretable logic models.

Figures

**Figure 1. Workflow of LOBICO.**
LOBICO has two main inputs. (1) A binary matrix of samples by features (depicted in the blue box); here, the binary matrix contains the mutation status of 60 cancer genes measured across 642 cancer cell lines. (2) A continuous vector with a value for each of the samples (depicted in the orange boxes); in this case, the vector contains the IC50 of each cell line in response to Afatinib, an EGFR/ERBB2 inhibitor. The continuous vector is transformed into a binary vector and a sample-specific weight vector using a binarization scheme. Specifically, the IC50s are binarized using a threshold, yielding a set of sensitive and a set of resistant cell lines. The distances of the original IC50s to the binarization threshold are represented in the weight vector, which is normalized per class. LOBICO then finds the optimal logic model of features (gene mutations) that minimizes the total weight of misclassified samples (cell lines). In this case, the optimal 2-input OR logic formula is ‘EGFR OR ERBB2’ (depicted in the white box).
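
The workflow in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: LOBICO solves an integer linear program, whereas this sketch swaps in an exhaustive search over feature pairs, which is only practical for tiny models. The function names, the toy data, the choice to label values below the threshold as sensitive, and the choice to normalize each class's weights to sum to 1 are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def binarize_with_weights(y, threshold):
    """Binarize a continuous vector and derive per-sample weights.

    Assumption: values below the threshold are labelled 1 (sensitive).
    Weights are distances to the threshold, normalized to sum to 1 per class.
    """
    y = np.asarray(y, dtype=float)
    labels = (y < threshold).astype(int)
    weights = np.abs(y - threshold)
    for cls in (0, 1):
        mask = labels == cls
        total = weights[mask].sum()
        if total > 0:
            weights[mask] /= total
    return labels, weights


def best_two_input_or(X, labels, weights):
    """Brute-force search for the feature pair whose OR minimizes
    the total weight of misclassified samples."""
    X = np.asarray(X, dtype=bool)
    truth = np.asarray(labels, dtype=bool)
    best_error, best_pair = None, None
    for i, j in combinations(range(X.shape[1]), 2):
        pred = X[:, i] | X[:, j]
        error = float(weights[pred != truth].sum())
        if best_error is None or error < best_error:
            best_error, best_pair = error, (i, j)
    return best_error, best_pair


# Hypothetical toy data: 6 samples, 3 binary features, made-up response values.
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 1],
              [0, 0, 1],
              [0, 0, 0],
              [0, 0, 1]])
ic50 = np.array([0.1, 0.2, 0.05, 0.9, 0.8, 0.7])

labels, weights = binarize_with_weights(ic50, threshold=0.5)
error, pair = best_two_input_or(X, labels, weights)
print(pair, error)
```

Because misclassifications are weighted by distance to the threshold, samples close to the threshold contribute little to the error, which is the intuition behind using the continuous information rather than the binary labels alone.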

**Figure 2. Multi-predictor models outperform single predictor models.**
Scatter plot with the 10-fold cross-validation (CV) error for single predictor models (x-axis) and the best (lowest CV error) multi-predictor model (y-axis). Each point represents one of the 142 drugs. Statistically significant models are highlighted in blue. Multi-predictor models that have a CV error lower than 0.35 and at least a 25% improvement upon the single predictor model are highlighted in magenta. The two examples discussed in the text are highlighted in bold typeface.

**Figure 3. LOBICO’s use of continuous output leads to robust and accurate models.**
(a) Heatmaps depicting the feature importance (FI) scores across the 60 gene mutations for the logic models inferred to explain the drug response to the PI3K/mTOR inhibitor BEZ235. The upper heatmap represents FI scores for the 2-input OR model (K = 2, M = 1) using three different binarization thresholds for logic models with binarized output, i.e. not using the sample-specific weights. The middle of the three heatmaps represents the same FI scores, but for logic models with continuous output, i.e. using the sample-specific weights. The bottom two heatmaps depict FI scores aggregated across all model complexities, using the standard binarization threshold (t = 0.05), for both the logic models with and without the sample-specific weights. The labels of the gene mutations with a large FI in any of these heatmaps are printed below. The ‘ground truth’ features, i.e. the expected or annotated targets of this drug, PTEN and PIK3CA, are printed in bold. (b) Scatter plot with the average Pearson correlation coefficients of the similarity of FI scores across the binarization thresholds for inferred logic models without (x-axis) and with (y-axis) the sample-specific weights. Each point represents one of the 142 drugs. The correlation scores are computed using the model-complexity-specific FI scores. The grey bars on top and to the right of the scatter plot represent histograms of these correlation scores for models without and with the sample-specific weights, respectively. (c) Scatter plot with the importance of the ground truth features for inferred logic models without (x-axis) and with (y-axis) the sample-specific weights. Each point represents one of the 49 drugs, for which ground truth features were available. The importance scores of the ground truth features were derived from aggregated FI scores.
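
The robustness measure in panel (b) can be read as an average of pairwise Pearson correlations between FI score vectors obtained at different binarization thresholds. The sketch below shows one way such a score might be computed; the function name and the FI values are hypothetical, and the exact procedure used in the paper may differ.

```python
from itertools import combinations

import numpy as np


def mean_pairwise_pearson(fi_by_threshold):
    """Average Pearson correlation over all pairs of FI vectors.

    fi_by_threshold: array of shape (T, P), one FI vector of length P
    per binarization threshold (assumed layout).
    """
    fi = np.asarray(fi_by_threshold, dtype=float)
    corr = np.corrcoef(fi)  # T x T correlation matrix between rows
    pairs = combinations(range(fi.shape[0]), 2)
    return float(np.mean([corr[i, j] for i, j in pairs]))


# Hypothetical FI scores for 3 thresholds and 3 features.
stable_fi = [[0.1, 0.8, 0.1],
             [0.2, 1.6, 0.2],
             [0.05, 0.4, 0.05]]
stability = mean_pairwise_pearson(stable_fi)
print(stability)
```

A score near 1 means the same features stay important regardless of where the threshold is placed.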

**Figure 4. LOBICO finds solutions at different operating points.**
(a) ROC space with LOBICO solutions to explain drug sensitivity to the MEK1/2 inhibitor AZD6244. Blue crosses indicate the TPR and FPR at which the solution was found. The logic formula of each solution is printed next to its blue cross. The color of the genes in a formula indicates their FI. Colors range from black (moderately important) to bright red (highly important). For comparison, the best single predictor solutions are visualized in green. Pink arrows point to solutions discussed in the text. The inset depicts the histogram of IC50s for AZD6244 together with the binarization threshold, which divides the cell lines into 91 cell lines that are sensitive to AZD6244 and 515 that are resistant. (b) Average FI scores for a group of 6 MEK/RAF inhibitors (including AZD6244), for high specificity solutions (orange) and high sensitivity solutions (magenta). High specificity solutions were defined as solutions with FPR < 10%. Conversely, high sensitivity solutions were defined as solutions with TPR > 90%. The FI scores of all solutions on the Pareto front (ROC curve) that met these respective criteria across the six drugs were averaged. We distinguished between positive terms, indicating mutations (Mut.), and negated terms, indicating wild-type (WT). The two genes with the highest average FI score as mutants were printed at the top of their FI bar. The two genes with the highest average FI score as wild-types were printed at the bottom of their FI bar. (c,d) Similar to (b), but for a group of two PI3K inhibitors and a group of two AURKA/B inhibitors, respectively.
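
The operating-point terminology in this caption can be made concrete with a short Python sketch: it computes the (FPR, TPR) coordinates of a binary prediction, keeps only the non-dominated points (the Pareto front in ROC space), and applies the FPR < 10% and TPR > 90% criteria from the caption. The function names and the example points are hypothetical, and the rates here are unweighted counts, which may differ from how the paper evaluates its solutions.

```python
import numpy as np


def roc_point(pred, labels):
    """Return (FPR, TPR) of a binary prediction. Assumes both classes occur."""
    pred = np.asarray(pred, dtype=bool)
    labels = np.asarray(labels, dtype=bool)
    tpr = (pred & labels).sum() / labels.sum()
    fpr = (pred & ~labels).sum() / (~labels).sum()
    return float(fpr), float(tpr)


def pareto_front(points):
    """Keep the points not dominated by another point, where dominating
    means no higher FPR and no lower TPR, with at least one strictly better."""
    front = []
    for fpr, tpr in points:
        dominated = any(
            (f <= fpr and t >= tpr) and (f < fpr or t > tpr)
            for f, t in points
        )
        if not dominated:
            front.append((fpr, tpr))
    return sorted(set(front))


# Hypothetical (FPR, TPR) coordinates of four candidate logic models.
points = [(0.1, 0.5), (0.2, 0.4), (0.3, 0.95), (0.05, 0.2)]
front = pareto_front(points)

# Criteria taken from the caption.
high_specificity = [p for p in front if p[0] < 0.10]
high_sensitivity = [p for p in front if p[1] > 0.90]

fpr, tpr = roc_point(pred=[1, 1, 0, 0, 1], labels=[1, 0, 0, 0, 1])
print(front, high_specificity, high_sensitivity, (fpr, tpr))
```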

**Figure 5. 3-layer Boolean circuit representing the structure of the LOBICO ILP formulation.**
In Layer 1, variables s₁₁, …, s_PK are used to select the inputs (x₁, x₂, …, x_P) that are combined using a conjunction (AND gate) to create the K disjunctive terms in Layer 2. These disjunctive terms (the outputs of the AND gates) are represented by variables t₁, …, t_K. In Layer 3, the disjunctive terms are combined using a disjunction (OR gate), resulting in the inferred binary output variable y′. This figure is adapted from Figure 2.1 in Kamath *et al*.
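
The three-layer structure can be sketched as a forward evaluation in Python: a P×K binary selection matrix picks the inputs of each AND gate, and the AND outputs are combined by a single OR gate. This only evaluates a given selection; it does not reproduce the ILP that LOBICO solves to find the selection. The function name, the matrix layout, and the convention that a term with no selected inputs is inactive are assumptions of this sketch, and negated inputs are omitted for brevity.

```python
import numpy as np


def evaluate_dnf(X, S):
    """Evaluate y' = OR_k ( AND_{p : S[p, k] = 1} x_p ) for every sample.

    X: (N, P) binary input matrix.
    S: (P, K) binary selection matrix (Layer 1), assumed layout.
    """
    X = np.asarray(X, dtype=bool)
    S = np.asarray(S, dtype=bool)
    # Layer 2: a term holds when every selected input is 1;
    # unselected inputs cannot falsify the term.
    terms = np.all(X[:, :, None] | ~S[None, :, :], axis=1)  # (N, K)
    # Assumption: a term with no selected inputs is treated as inactive.
    terms &= S.any(axis=0)[None, :]
    # Layer 3: OR over the K terms.
    return terms.any(axis=1)


# Hypothetical model with P = 3 inputs and K = 2 terms: (x1 AND x2) OR x3.
S = np.array([[1, 0],
              [1, 0],
              [0, 1]])
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1],
              [0, 0, 0]])
y_pred = evaluate_dnf(X, S)
print(y_pred)
```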

Cited by

Deep learning methods for drug response prediction in cancer: Predominant and emerging trends.
Partin A, Brettin TS, Zhu Y, Narykov O, Clyde A, Overbeek J, Stevens RL. Front Med (Lausanne). 2023 Feb 15;10:1086097. doi: 10.3389/fmed.2023.1086097. eCollection 2023. PMID: 36873878. Free PMC article. Review.
SuperDendrix algorithm integrates genetic dependencies and genomic alterations across pathways and cancer types.
Park TY, Leiserson MDM, Klau GW, Raphael BJ. Cell Genom. 2022 Feb 9;2(2):100099. doi: 10.1016/j.xgen.2022.100099. PMID: 35382456. Free PMC article.
Computational frameworks transform antagonism to synergy in optimizing combination therapies.
Chen J, Lin A, Jiang A, Qi C, Liu Z, Cheng Q, Yuan S, Luo P. NPJ Digit Med. 2025 Jan 19;8(1):44. doi: 10.1038/s41746-025-01435-2. PMID: 39828791. Free PMC article. Review.
Deep reinforcement learning for personalized treatment recommendation.
Liu M, Shen X, Pan W. Stat Med. 2022 Sep 10;41(20):4034-4056. doi: 10.1002/sim.9491. Epub 2022 Jun 18. PMID: 35716038. Free PMC article.
Personalized logical models to investigate cancer response to BRAF treatments in melanomas and colorectal cancers.
Béal J, Pantolini L, Noël V, Barillot E, Calzone L. PLoS Comput Biol. 2021 Jan 28;17(1):e1007900. doi: 10.1371/journal.pcbi.1007900. eCollection 2021 Jan. PMID: 33507915. Free PMC article.

Grants and funding

U24 CA143835/CA/NCI NIH HHS/United States
