Bioinformatics. 2022 Sep 15;38(18):4360-4368.

doi: 10.1093/bioinformatics/btac523.

Overcoming selection bias in synthetic lethality prediction

Colm Seale¹ ², Yasin Tepeli¹, Joana P Gonçalves¹

Affiliations

¹ Pattern Recognition & Bioinformatics, Department of Intelligent Systems, Faculty EEMCS, Delft University of Technology, Delft 2628 XE, The Netherlands.
² Holland Proton Therapy Center (HollandPTC), Delft 2600 AC, The Netherlands.

PMID: 35876858
PMCID: PMC9477536
DOI: 10.1093/bioinformatics/btac523

Colm Seale et al. Bioinformatics. 2022.

Abstract

Motivation: Synthetic lethality (SL) between two genes occurs when the simultaneous loss of function of both genes leads to cell death. This holds great promise for developing anti-cancer therapeutics that target synthetic lethal pairs of endogenously disrupted genes. Identifying novel SL relationships through exhaustive experimental screens is challenging, due to the vast number of candidate pairs. Computational SL prediction is therefore sought to identify promising SL gene pairs for further experimentation. However, current SL prediction methods do not account for generalizability in the presence of selection bias in SL data.

Results: We show that SL data exhibit considerable gene selection bias. Our experiments designed to assess the robustness of SL prediction reveal that models driven by the topology of known SL interactions (e.g. graph, matrix factorization) are especially sensitive to selection bias. We introduce selection bias-resilient synthetic lethality (SBSL) prediction using regularized logistic regression or random forests. Each gene pair is described by 27 molecular features derived from cancer cell line, cancer patient tissue and healthy donor tissue samples. SBSL models are built and tested using approximately 8000 experimentally derived SL pairs across breast, colon, lung and ovarian cancers. Compared to other SL prediction methods, SBSL showed higher predictive performance, better generalizability and robustness to selection bias. Gene dependency, quantifying the essentiality of a gene for cell survival, contributed most to SBSL predictions. Random forests were superior to linear models in the absence of dependency features, highlighting the relevance of mutual exclusivity of somatic mutations, co-expression in healthy tissue and differential expression in tumour samples.

Availability and implementation: https://github.com/joanagoncalveslab/sbsl.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
Structure of SL labels. Adjacency plot showing OV gene pairs. Elements along horizontal and vertical axes represent unique genes. Each coloured cell denotes a negative (red) or positive (blue) SL pair. White cells denote pairs with no label. Rows are ordered according to hierarchical clustering with complete linkage and Euclidean distance. Columns follow the ordering of rows. The barplot to the right shows the number of pairs each gene is involved in. The group of eight genes at the bottom of the plot (highlighted in red) consists mostly of tyrosine kinases. (A color version of this figure appears in the online version of this article.)
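
The per-gene pair counts shown in the barplot can be computed in a few lines. The sketch below is illustrative only and is not taken from the SBSL repository: the function names `gene_pair_counts` and `top_gene_coverage`, the toy pairs and the gene names are all assumptions. It counts how many labelled pairs each gene takes part in and reports the share of pairs that involve at least one of the `k` most frequent genes, a rough indicator of gene selection bias.

```python
from collections import Counter


def gene_pair_counts(pairs):
    """Count how many labelled pairs each gene takes part in."""
    counts = Counter()
    for gene_a, gene_b in pairs:
        counts[gene_a] += 1
        counts[gene_b] += 1
    return counts


def top_gene_coverage(pairs, k):
    """Fraction of pairs involving at least one of the k most frequent genes."""
    counts = gene_pair_counts(pairs)
    top = {gene for gene, _ in counts.most_common(k)}
    covered = sum(1 for a, b in pairs if a in top or b in top)
    return covered / len(pairs)


# Toy labelled pairs (hypothetical; gene names are placeholders).
toy_pairs = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G1", "G5"), ("G6", "G7")]
counts = gene_pair_counts(toy_pairs)
coverage = top_gene_coverage(toy_pairs, k=1)
```

In this toy set a single gene accounts for four of the five pairs, which is the kind of imbalance the adjacency plot makes visible.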

**Fig. 2.**
Cross-SL gold standard performances. AUROC values averaged over 10 runs for: (left) BRCA models trained on ISLE and tested on DiscoverSL; (right) LUAD models trained on DiscoverSL and tested on ISLE.
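
For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair, with ties counted as one half. The following is a minimal sketch of that definition and of averaging it over runs; the function names and toy scores are assumptions, and the authors' evaluation code may compute it differently (for example through a library routine).

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) score pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def mean_auroc(runs):
    """Average AUROC over several (labels, scores) runs."""
    values = [auroc(labels, scores) for labels, scores in runs]
    return sum(values) / len(values)


# Two toy runs (hypothetical labels and scores).
run_a = ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
run_b = ([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
average = mean_auroc([run_a, run_b])
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration but would be replaced by a rank-based computation on larger data.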

**Fig. 3.**
Performances of gene holdout experiments, where bias is controlled by ensuring that none, one or both genes of pairs in the test set are excluded from the train set. Shown are AUROC values for each gene-holdout experiment per cancer type (10 runs). For ‘None’, we only guarantee that train and test sets are disjoint in terms of gene pairs, not individual genes; for ‘Single’, only one gene from a gene pair in the test set can be present in the train set; for ‘Double’, neither gene of a pair in the test set appears in the train set. The results for ‘None’ correspond to those also reported in Table 2. *Note*: There was insufficient data to conduct the OV ‘Double’ experiment.
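
One way to realise the three holdout settings is sketched below. This is an assumption-laden illustration rather than the authors' procedure: the function `gene_holdout_split`, its parameters and the choice to hold out a random subset of genes are all hypothetical. For ‘Single’ and ‘Double’, a set of genes is withheld; training pairs contain no withheld gene, ‘Single’ test pairs contain at least one withheld gene, and ‘Double’ test pairs contain two (pairs with exactly one withheld gene are discarded in that mode).

```python
import random
from itertools import combinations


def gene_holdout_split(pairs, mode, test_fraction=0.3, seed=0):
    """Split unique gene pairs into (train, test) under a gene holdout mode."""
    if mode not in {"none", "single", "double"}:
        raise ValueError("mode must be 'none', 'single' or 'double'")
    rng = random.Random(seed)
    pairs = list(pairs)
    if mode == "none":
        # Only pairs are kept disjoint; genes may be shared across the sets.
        shuffled = pairs[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        return shuffled[n_test:], shuffled[:n_test]
    genes = sorted({gene for pair in pairs for gene in pair})
    rng.shuffle(genes)
    n_held = max(1, int(len(genes) * test_fraction))
    held_out = set(genes[:n_held])
    train, test = [], []
    for a, b in pairs:
        n_in = (a in held_out) + (b in held_out)
        if n_in == 0:
            train.append((a, b))
        elif mode == "single" or n_in == 2:
            test.append((a, b))
        # mode == "double" with exactly one withheld gene: pair is discarded
    return train, test


# Toy data: all pairs among eight placeholder genes (hypothetical).
toy_pairs = list(combinations(["G%d" % i for i in range(8)], 2))
train, test = gene_holdout_split(toy_pairs, "double", test_fraction=0.5)
```

The ‘Double’ setting discards many pairs, which is consistent with the caption's note that some cancer types may not have enough data for it.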

**Fig. 4.**
Cross-cancer and LOCO performances. Average AUROC for L0L2 and MUVR models over 10 runs. *Cross-cancer:* Vertical and horizontal axes denote the cancer types used to train and test, respectively. *LOCO:* Horizontal axis denotes the cancer type held out for testing. Models are trained on balanced data from all other cancers.
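
The LOCO setup, in which one cancer type is held out for testing and the rest are pooled for training, can be sketched as follows. The generator `loco_splits`, the toy records and the `COAD` label for colon cancer are assumptions made for illustration, and the balancing step mentioned in the caption is deliberately omitted because the source does not specify how it was done.

```python
def loco_splits(pairs_by_cancer):
    """Yield (held_out, train, test), holding out one cancer type at a time."""
    for held_out in sorted(pairs_by_cancer):
        test = list(pairs_by_cancer[held_out])
        train = [
            item
            for cancer, items in sorted(pairs_by_cancer.items())
            if cancer != held_out
            for item in items
        ]
        yield held_out, train, test


# Toy records of the form (gene_a, gene_b, label); all values are hypothetical.
toy = {
    "BRCA": [("G1", "G2", 1), ("G3", "G4", 0)],
    "COAD": [("G1", "G5", 1)],
    "LUAD": [("G2", "G6", 0), ("G7", "G8", 1)],
    "OV": [("G3", "G9", 0)],
}
splits = list(loco_splits(toy))
```

Each split trains on every record outside the held-out cancer type, so a gene pair labelled in several cancers can still appear on both sides; combining LOCO with a gene holdout would be needed to rule that out.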

**Fig. 5.**
Performance of SBSL models with and without gene dependency-based features, labelled ‘Full Feature Set’ and ‘No Dep Features’, respectively (AUROC over 10 runs).


Grants and funding

U54 EY032442/EY/NEI NIH HHS/United States
