A machine learning and directed network optimization approach to uncover TP53 regulatory patterns

Affiliations

¹ Department of Oncology, Medical Sciences Division, University of Oxford, Oxford, UK.
² Department of Epidemiology & Biostatistics, School of Public Health, Imperial College London, London, UK.
³ Nuffield Department of Surgical Sciences, University of Oxford, Oxford, UK.
⁴ Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany.
⁵ MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, UK.
⁶ Department of Computing Sciences, BIDSA, Bocconi University, Milan, Italy.

PMID: 38047081
PMCID: PMC10692668
DOI: 10.1016/j.isci.2023.108291


Charalampos P Triantafyllidis et al. iScience. 2023 Oct 26;26(12):108291. doi: 10.1016/j.isci.2023.108291. eCollection 2023 Dec 15.


Abstract

TP53, the Guardian of the Genome, is the most frequently mutated gene in human cancers, and the functional characterization of its regulation is fundamental. To address this, we employ two strategies: machine learning to predict the mutation status of TP53 from transcriptomic data, and directed regulatory networks to reconstruct the effect of mutations on the transcript levels of TP53 targets. Using data from established databases (Cancer Cell Line Encyclopedia, The Cancer Genome Atlas), machine learning could predict the mutation status but could not resolve different mutations. In contrast, directed network optimization allowed us to infer the TP53 regulatory profile across (1) mutations, (2) irradiation in lung cancer, and (3) hypoxia in breast cancer, and we observed differential regulatory profiles dictated by (1) mutation type, (2) deleterious consequences of the mutation, (3) known hotspots, (4) protein changes, and (5) stress condition (irradiation/hypoxia). This is an important first step toward using regulatory networks to characterize the functional consequences of mutations, and the approach could be extended to other perturbations, with implications for drug design and precision medicine.

Keywords: Regulatory networks; TP53; cancer systems biology; causal inference; directed networks; machine learning; mutations; regulon; transcriptomics.


Conflict of interest statement

J.S.R. reports funding from GSK, Pfizer, and Sanofi and fees/honoraria from Travere Therapeutics, Stadapharm, Astex, Pfizer, and Grunenthal.

Figures

**Figure 1**
Visual summary of the directed gene network approach. First, expression and mutation profiles for the transcription factor (in this case *TP53*) are collected from established databases for cell lines (CCLE) and tumor samples (TCGA). The regulon, the set of target genes of the transcription factor, is then extracted from DoRothEA, which draws on different database sources and cancer experiments and assigns each interaction a confidence level (A–E). In addition, the prior knowledge network (PKN), a collection of interactions, is extracted from OmniPath. These three components are then used as inputs to the CARNIVAL pipeline, where an optimization model reconstructs a network from the PKN based on the perturbation and the given expression profile. In this way, we optimize one network per mutation in each sample and can compare the resulting networks for topological features according to each annotation.
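The regulon-selection step can be sketched in Python as below, assuming a DoRothEA-like table with one record per TF-target pair holding a confidence level and a mode of regulation (`mor`). The record type, the helper names `select_targets` and `target_measurements`, the default A–C confidence cut-off, and the toy entries are hypothetical illustrations, not the authors' code or actual DoRothEA values; the CARNIVAL optimization itself is not reproduced.

```python
from typing import NamedTuple


class RegulonEdge(NamedTuple):
    tf: str          # transcription factor
    confidence: str  # DoRothEA-style confidence level, "A" (highest) to "E" (lowest)
    target: str      # target gene
    mor: int         # mode of regulation: +1 activation, -1 repression


def select_targets(regulon, tf, levels=("A", "B", "C")):
    """Return {target: mor} for one TF, keeping only the given confidence levels.

    The default A-C cut-off is a hypothetical choice for illustration.
    """
    return {e.target: e.mor for e in regulon if e.tf == tf and e.confidence in levels}


def target_measurements(targets, expression):
    """Keep expression values only for regulon targets measured in the sample."""
    return {gene: expression[gene] for gene in targets if gene in expression}


# Toy entries for illustration only (not actual DoRothEA records).
regulon = [
    RegulonEdge("TP53", "A", "CDKN1A", 1),
    RegulonEdge("TP53", "B", "MDM2", 1),
    RegulonEdge("TP53", "D", "FOS", -1),
    RegulonEdge("MYC", "A", "CDK4", 1),
]
expression = {"CDKN1A": 2.1, "FOS": -0.4, "CDK4": 0.7}

tp53_targets = select_targets(regulon, "TP53")        # {"CDKN1A": 1, "MDM2": 1}
print(target_measurements(tp53_targets, expression))  # {'CDKN1A': 2.1}
```

Raising or lowering the confidence cut-off trades regulon size against reliability; which levels the study actually used is described in its STAR methods, not here.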

**Figure 2**
Performance of models predicting *TP53* mutation in cell lines (CCLE) using RNAseq. The models were built using a minimal four-gene signature and a comprehensive regulon from DoRothEA, as described in the text. (A–F) Penalized regression was used with multiple settings: an Elastic Net model (see STAR methods) was built in cross-validation for different train-test combinations, and the misclassification error was assessed. The models were trained to predict three different p53 mutational features: (A and B) missense vs. any mutation, (C and D) WT (wild-type) vs. MT (mutated), and (E and F) hotspot p53 mutations. In each plot, the x axis represents the different training set sizes, while the y axis shows the accuracy measure (i.e., the misclassification error) used to assess the performance of the fitted models. The mean error and the associated confidence interval are also reported for each training set size. Each green dot corresponds to a trained model, and the red dot represents the best model selected in cross-validation (see STAR methods). Among the training set sizes, the one whose prediction error had the lowest upper confidence bound was chosen, and the best model was then selected as the one with the minimum misclassification error.
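The two-stage selection rule in this caption (first the training set size whose error has the lowest upper confidence bound, then the model with the minimum misclassification error at that size) can be sketched as follows. This is a minimal sketch assuming a normal-approximation 95% interval (z = 1.96) and a hypothetical `runs` structure of already-computed errors; the exact interval used in the STAR methods is not stated here, and the Elastic Net fits themselves are not shown.

```python
import math
import statistics


def upper_ci(errors, z=1.96):
    """Upper bound of a normal-approximation CI for the mean error (assumed CI method)."""
    mean = statistics.mean(errors)
    half_width = z * statistics.stdev(errors) / math.sqrt(len(errors))
    return mean + half_width


def select_best_model(runs):
    """Pick the training size with the lowest upper CI, then its lowest-error model.

    runs: {training_set_size: [(model_id, misclassification_error), ...]},
    with at least two runs per size so that a standard deviation exists.
    """
    best_size = min(runs, key=lambda size: upper_ci([err for _, err in runs[size]]))
    best_id, best_err = min(runs[best_size], key=lambda run: run[1])
    return best_size, best_id, best_err


# Hypothetical cross-validation results (errors invented for illustration).
runs = {
    100: [("m1", 0.30), ("m2", 0.20), ("m3", 0.25)],
    200: [("m4", 0.18), ("m5", 0.22), ("m6", 0.20)],
}
print(select_best_model(runs))  # (200, 'm4', 0.18)
```

Selecting on the upper bound rather than on the mean penalizes training sizes whose error fluctuates strongly across train-test splits.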

**Figure 3**
Performance of models predicting *TP53* mutation in cancer samples (TCGA) using RNAseq. The models were built using a four-gene signature and the regulon of *TP53* from DoRothEA, as described in the text and the methods (see STAR methods), for three tasks: (1) (A and B) missense versus all other types of mutations (no WT samples included), (2) (C and D) WT versus any mutation, and (3) (E and F) hotspot p53 mutations versus all other non-hotspot mutations (no WT samples included). In each plot, the x axis represents the different training set sizes, while the y axis shows the accuracy measure (i.e., the misclassification error) used to assess the performance of the fitted models. The mean error and the associated confidence interval are also reported for each training set size. Each green dot corresponds to a trained model, and the red dot represents the best model selected in cross-validation (see STAR methods). Among the training set sizes, the one whose prediction error had the lowest upper confidence bound was chosen, and the best model was then selected as the one with the minimum misclassification error.
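The three binary tasks differ mainly in which samples enter each comparison. A minimal sketch of the label construction follows, assuming hypothetical sample records in which `variant_class` is `None` for wild-type *TP53* and otherwise uses MAF-style names such as `Missense_Mutation`, and in which the `hotspot` flag has already been annotated; the record type and function name are illustrative only.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    sample_id: str
    variant_class: Optional[str]  # None for wild-type TP53, else e.g. "Missense_Mutation"
    hotspot: bool = False         # assumed pre-annotated hotspot flag


def build_labels(samples):
    """Return one {sample_id: 0/1} label set per classification task."""
    mutated = [s for s in samples if s.variant_class is not None]
    return {
        # (1) missense vs. all other mutation types; WT samples excluded
        "missense_vs_other": {s.sample_id: int(s.variant_class == "Missense_Mutation") for s in mutated},
        # (2) WT (0) vs. any mutation (1); all samples included
        "wt_vs_mt": {s.sample_id: int(s.variant_class is not None) for s in samples},
        # (3) hotspot vs. non-hotspot mutations; WT samples excluded
        "hotspot_vs_other": {s.sample_id: int(s.hotspot) for s in mutated},
    }


samples = [
    Sample("S1", None),
    Sample("S2", "Missense_Mutation", hotspot=True),
    Sample("S3", "Nonsense_Mutation"),
    Sample("S4", "Missense_Mutation"),
]
labels = build_labels(samples)
print(labels["missense_vs_other"])  # {'S2': 1, 'S3': 0, 'S4': 1}
```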

**Figure 4**
Illustration of our network comparison technique using two networks reconstructed for two distinct breast cancer samples. (A) Network optimized from the expression and mutation information of a breast cancer cell line carrying a missense p53 mutation (protein change: p.E224K, cell line: CAL148, deleterious: False) and (B) the equivalent network for a breast cancer cell line carrying a different missense p53 mutation (protein change: p.E285K, cell line: BT474, deleterious: False). At the top are the perturbation node, our transcription factor *TP53*, and its downstream DoRothEA target genes. To quantify the topological differences between these two networks (both activation/inactivation and mode of regulation), we treat them as graphs and calculate a percentage of similarity based on their edge intersection. For instance, in network (A), marked with a framed rectangle, an activating edge from *STAT1* to *FOS* exists, whereas this edge is completely missing from network (B). Such differences are taken into account in the similarity score (see STAR methods): the number of edges common to both networks divided by the number of edges in the larger of the two networks gives the percentage of similarity. A common edge must have the same start node, end node, and mode of regulation.
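With E_A and E_B the edge sets of the two networks, the score described above is 100 × |E_A ∩ E_B| / max(|E_A|, |E_B|), where an edge is a (start node, end node, mode of regulation) triple. A minimal Python sketch follows; the function name, the ±1 sign encoding, the handling of two empty networks and the toy edges are assumptions for illustration.

```python
def network_similarity(edges_a, edges_b):
    """Percentage similarity of two signed directed networks.

    Each network is a collection of (source, target, sign) triples with sign
    +1 (activation) or -1 (inhibition). An edge is shared only if source,
    target and sign all match; the shared count is divided by the edge count
    of the larger network.
    """
    a, b = set(edges_a), set(edges_b)
    largest = max(len(a), len(b))
    if largest == 0:
        return 100.0  # assumption: two empty networks are treated as identical
    return 100.0 * len(a & b) / largest


# Toy networks for illustration (edges are not taken from the figure).
net_a = {("TP53", "MDM2", 1), ("TP53", "STAT1", 1), ("STAT1", "FOS", 1)}
net_b = {("TP53", "MDM2", 1), ("TP53", "STAT1", -1)}
print(round(network_similarity(net_a, net_b), 1))  # 33.3
```

Here TP53 → STAT1 appears in both networks but with opposite signs, so it does not count as shared; only TP53 → MDM2 does, giving 1/3.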

**Figure 5**
Similarity of the directed networks reconstructed for different *TP53* mutations in CCLE and TCGA samples. (A and B) Percentage of networks (y axis) that are similar above the cut-off, per cancer type (x axis), in CCLE (A; cut-off of at least 50% similarity) and TCGA (B), across three settings: (1) all networks compared, (2) networks with the same *TP53* mutation type, and (3) networks with the same deleterious function of the *TP53* mutation. Network similarity is markedly higher in the last two settings than in the first, which does not take any feature into account when comparing the networks. Together, these plots indicate that when p53 mutation type or the deleterious function of the mutation is taken into account, the regulatory profiles of the transcription factor *TP53* are significantly more similar than under random grouping, in both cell lines and tumor samples. Of note, in (B), the subtypes STAD_EBV, SARC_DDLPS, and READ_GS have no pair with the same *TP53* mutation type in the data, so the percentage is 0%; SARC_DDLPS and READ_GS also contain no pair with the same deleterious *TP53* function. Finally, for CESC_AdenoCarcinoma, COAD_POLE, DLBC, GBM, READ_MSI, and READ_POLE, at least 50% of the compared network pairs had 100% similarity (identical graphs). The full data are shown in Tables S1 and S2 and in the radar plots in Figures S13–S20.
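The per-setting percentages can be sketched as the share of compared pairs that reach the cut-off, where the "same mutation type" and "same deleterious function" settings only compare pairs that share that annotation. This sketch assumes hypothetical annotation keys (`mutation_type`, `deleterious`) and returns 0% when no eligible pair exists, mirroring the subtypes noted in the caption; it is an illustration, not the authors' implementation.

```python
from itertools import combinations


def similarity(a, b):
    """Shared (source, target, sign) edges over the larger network's edge count, in %."""
    largest = max(len(a), len(b))
    return 100.0 * len(a & b) / largest if largest else 100.0


def percent_similar_pairs(networks, cutoff=50.0, key=None):
    """Percentage of network pairs whose similarity is at least `cutoff`.

    networks: list of (annotations, edge_set). If `key` is given (e.g.
    "mutation_type"), only pairs sharing that annotation are compared.
    Returns 0.0 when no eligible pair exists.
    """
    pairs = [
        (edges_a, edges_b)
        for (ann_a, edges_a), (ann_b, edges_b) in combinations(networks, 2)
        if key is None or ann_a[key] == ann_b[key]
    ]
    if not pairs:
        return 0.0
    hits = sum(similarity(a, b) >= cutoff for a, b in pairs)
    return 100.0 * hits / len(pairs)


# Toy annotated networks (hypothetical annotations and edges).
networks = [
    ({"mutation_type": "missense", "deleterious": False},
     {("TP53", "MDM2", 1), ("TP53", "CDKN1A", 1)}),
    ({"mutation_type": "missense", "deleterious": True},
     {("TP53", "MDM2", 1), ("TP53", "CDKN1A", 1), ("TP53", "BAX", 1)}),
    ({"mutation_type": "nonsense", "deleterious": True},
     {("TP53", "FOS", -1)}),
]
for setting in (None, "mutation_type", "deleterious"):
    # prints 33.3, 100.0 and 0.0 for the three settings
    print(setting, round(percent_similar_pairs(networks, key=setting), 1))
```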

**Figure 6**
Common genes across signatures extracted from the directed networks reconstructed for the different *TP53* mutations. (A and B) *TP53* mutational meta-signatures (across all cancer types) for TCGA (A) and CCLE (B), derived using Louvain community detection (see text). Plots were generated with an R package and stratified by mutation. The signatures range from approximately 40 to 60 genes per mutation in both cell lines and tumor samples. The first column (all signatures) shows the number of genes shared across signatures in CCLE and TCGA: 17 common genes in CCLE and 40 in TCGA. Notably, missense mutations (the most prevalent across cancers) share seven genes with the non-deleterious signature in CCLE and three in TCGA, highlighting the specificity of the missense signature pan-cancer.
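Once per-mutation signatures are available (the Louvain community detection step that produces them is not reproduced here), the shared-gene counts reduce to set intersections. A minimal sketch with hypothetical signature names and toy gene lists, not the published signatures:

```python
from functools import reduce


def common_genes(signatures, names=None):
    """Genes shared by every signature in `names` (all signatures if None)."""
    selected = list(signatures) if names is None else list(names)
    return reduce(set.intersection, (set(signatures[n]) for n in selected))


# Toy signatures (hypothetical names and genes).
signatures = {
    "missense": ["CDKN1A", "MDM2", "BAX", "FOS"],
    "nonsense": ["CDKN1A", "MDM2", "SESN1"],
    "non_deleterious": ["CDKN1A", "BAX", "FOS", "GADD45A"],
}
print(sorted(common_genes(signatures)))                                # ['CDKN1A']
print(len(common_genes(signatures, ["missense", "non_deleterious"])))  # 3
```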

**Figure 7**
Changes in the regulatory network signature of *TP53* under different mutational backgrounds. (A and B) Heatmaps showing predicted changes in CCLE (A) and TCGA (B). Empty cells indicate a predicted loss of interaction between the gene and p53.
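A heatmap of this kind can be fed by a gene-by-background table in which each cell holds the sign of the *TP53* → gene edge and missing edges are left empty. The sketch below assumes that only direct edges from the transcription factor count as interactions and that there is at least one such edge per gene and background; the background names and edges are hypothetical, and rendering is left to any plotting library.

```python
def interaction_matrix(networks, tf="TP53"):
    """Gene x background table of tf -> gene edge signs; None marks a lost edge.

    networks: {background_name: set of (source, target, sign)}. Only direct
    edges from `tf` are considered (an assumption), and at most one sign per
    tf -> gene pair is expected. A None cell is an empty heatmap cell.
    """
    backgrounds = sorted(networks)
    genes = sorted({t for edges in networks.values() for s, t, _ in edges if s == tf})
    matrix = {}
    for gene in genes:
        matrix[gene] = {}
        for bg in backgrounds:
            signs = [sign for s, t, sign in networks[bg] if s == tf and t == gene]
            matrix[gene][bg] = signs[0] if signs else None
    return matrix


# Toy networks for two hypothetical mutational backgrounds.
networks = {
    "missense": {("TP53", "MDM2", -1)},
    "wild_type": {("TP53", "MDM2", 1), ("TP53", "CDKN1A", 1)},
}
for gene, row in interaction_matrix(networks).items():
    print(gene, row)
```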



Grants and funding

23969/CRUK_/Cancer Research UK/United Kingdom
