iScience. 2023 Oct 26;26(12):108291.
doi: 10.1016/j.isci.2023.108291. eCollection 2023 Dec 15.

A machine learning and directed network optimization approach to uncover TP53 regulatory patterns

Charalampos P Triantafyllidis et al. iScience. 2023.

Abstract

TP53, the Guardian of the Genome, is the most frequently mutated gene in human cancers, and the functional characterization of its regulation is fundamental. To address this, we employ two strategies: machine learning to predict the mutation status of TP53 from transcriptomic data, and directed regulatory networks to reconstruct the effect of mutations on the transcript levels of TP53 targets. Using data from established databases (Cancer Cell Line Encyclopedia, The Cancer Genome Atlas), machine learning could predict the mutation status but could not resolve different mutations. In contrast, directed network optimization allowed us to infer the TP53 regulatory profile across (1) mutations, (2) irradiation in lung cancer, and (3) hypoxia in breast cancer, and we observed differential regulatory profiles dictated by (1) mutation type, (2) deleterious consequences of the mutation, (3) known hotspots, (4) protein changes, and (5) stress condition (irradiation/hypoxia). This is an important first step toward using regulatory networks to characterize the functional consequences of mutations, and it could be extended to other perturbations, with implications for drug design and precision medicine.

Keywords: Regulatory networks; TP53; cancer systems biology; causal inference; directed networks; machine learning; mutations; regulon; transcriptomics.

Conflict of interest statement

J.S.R. reports funding from GSK, Pfizer, and Sanofi and fees/honoraria from Travere Therapeutics, Stadapharm, Astex, Pfizer, and Grunenthal.

Figures

Graphical abstract
Figure 1
Visual summary of the directed gene network approach. First, expression and mutation profiles for the transcription factor (in this case TP53) are collected from established databases for cell lines (CCLE) and tumor samples (TCGA). The regulon, as a set of target genes, is then extracted from DoRothEA, which aggregates different database sources from experiments in cancer, with different levels of confidence (A–E). In addition, the prior knowledge network (PKN), a collection of interactions, is extracted from OmniPath. These three components are then used as input to the CARNIVAL pipeline, where an optimization model reconstructs the PKN based on the perturbation and the given expression profile. In this way, we optimize one network per mutation for each sample and can compare the networks for topological features based on their annotation.
Figure 2
Performance of models predicting TP53 mutation in cell lines (CCLE) using RNAseq. The models were built using a minimal four-gene signature and a comprehensive regulon from DoRothEA as described in the text. (A–F) Penalized regression was used with multiple settings. An Elastic Net model (see STAR methods) is built in cross-validation for different train-test combinations and the misclassification error is assessed. The models were trained to predict three different p53 mutational features: (A and B) missense vs. any mutation, (C and D) WT (wild-type) vs. MT (mutated), and (E and F) hotspot p53 mutations. In each plot, the x axis represents the different training set sizes while the y axis shows the performance measure (i.e., the misclassification error) used to assess the fitted models. The mean error and the associated confidence interval are also reported for each training set size. Each green dot in the plots corresponds to a trained model. The red dot represents the best model selected in cross-validation (see STAR methods). Different training set sizes are used, and the one giving the prediction error with the lowest upper confidence limit was chosen. The best model is then selected as the one with the minimum misclassification error.
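The two-stage selection described in this caption can be illustrated with a minimal sketch. It assumes a normal-approximation 95% confidence interval (the caption does not state how the interval is computed), and the function names `upper_ci` and `select_best` as well as the error values are hypothetical, not taken from the study.

```python
import math
import statistics

def upper_ci(errors, z=1.96):
    """Upper limit of a normal-approximation confidence interval for the mean error.

    Assumption: the study may compute its interval differently.
    """
    mean = statistics.mean(errors)
    if len(errors) < 2:
        return mean
    sem = statistics.stdev(errors) / math.sqrt(len(errors))
    return mean + z * sem

def select_best(errors_by_size):
    """Pick the training set size with the lowest upper confidence limit,
    then the model with the minimum misclassification error at that size."""
    best_size = min(errors_by_size, key=lambda s: upper_ci(errors_by_size[s]))
    errs = errors_by_size[best_size]
    best_index = min(range(len(errs)), key=errs.__getitem__)
    return best_size, best_index, errs[best_index]

# Hypothetical misclassification errors, one per trained model,
# keyed by the fraction of samples used for training.
errors_by_size = {
    0.5: [0.30, 0.34, 0.32],
    0.7: [0.25, 0.27, 0.26],
    0.9: [0.20, 0.40, 0.30],
}
best_size, best_index, best_error = select_best(errors_by_size)
```

Note that the size 0.9 contains the single lowest error (0.20) but is not chosen, because its errors vary widely and its upper confidence limit is the highest of the three.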
Figure 3
Performance of models predicting TP53 mutation in cancer samples (TCGA) using RNAseq. The models were built using a four-gene signature and the regulon of TP53 from DoRothEA as described in the text and the methods (see STAR methods): (1) (A and B) missense versus all other types of mutations (no WT samples included), (2) (C and D) WT versus any mutation, and (3) (E and F) hotspot p53 mutations versus all other non-hotspot mutations (no WT samples included). In each plot, the x axis represents the different training set sizes while the y axis shows the performance measure (i.e., the misclassification error) used to assess the fitted models. The mean error and the associated confidence interval are also reported for each training set size. Each green dot in the plots corresponds to a trained model. The red dot represents the best model selected in cross-validation (see STAR methods). Different training set sizes are used, and the one giving the prediction error with the lowest upper confidence limit was chosen. The best model is then selected as the one with the minimum misclassification error.
Figure 4
Illustration of our network comparison technique using two reconstructed networks in two distinct samples from breast cancer. (A) A network optimized from expression and mutation information for a breast cancer cell line sample carrying a missense p53 mutation (protein change: p.E224K, cell line: CAL148, deleterious: False) and (B) the equivalent network for another breast cancer cell line sample carrying a missense p53 mutation (protein change: p.E285K, cell line: BT474, deleterious: False). At the top we see the perturbation node, our transcription factor TP53, and its downstream DoRothEA target genes. We seek to identify the topological differences (both activation/inactivation and mode of regulation) between these two networks and to calculate a percentage of similarity based on their edge intersection, treating them as graphs. For instance, in the left network (A), marked with a framed rectangle, an activating arrow from STAT1 to FOS exists, whereas this edge is missing completely from network (B). These kinds of differences are taken into account to compute the similarity score (see STAR methods). The fraction of common edges found in both networks over the maximum number of edges (in the larger of the two networks) gives the percentage of similarity. Common edges must share the same starting node, end node, and mode of regulation.
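The similarity score described in this caption can be written as a short sketch. It assumes each network is stored as a set of (source, target, sign) tuples, with +1 for activation and -1 for inhibition; that encoding, the function name `network_similarity`, and all toy edges other than STAT1 to FOS are illustrative assumptions, not the study's data structures or results.

```python
from typing import Set, Tuple

# An edge is (starting node, end node, mode of regulation);
# +1 = activation, -1 = inhibition (assumed encoding).
Edge = Tuple[str, str, int]

def network_similarity(edges_a: Set[Edge], edges_b: Set[Edge]) -> float:
    """Percentage of edges common to both networks, relative to the larger network."""
    largest = max(len(edges_a), len(edges_b))
    if largest == 0:
        return 100.0  # assumption: two empty networks count as identical
    common = edges_a & edges_b  # same start node, end node, and sign
    return 100.0 * len(common) / largest

# Toy networks: the STAT1 -> FOS edge mirrors the caption's example,
# the remaining edges are hypothetical.
net_a = {("TP53", "STAT1", 1), ("STAT1", "FOS", 1), ("TP53", "MDM2", 1)}
net_b = {("TP53", "STAT1", 1), ("TP53", "MDM2", -1)}
score = network_similarity(net_a, net_b)
```

In this toy case only one edge matches in start node, end node, and sign, and the larger network has three edges, so the score is one third. The TP53 to MDM2 edge is present in both networks but with opposite signs, so it does not count as common.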
Figure 5
Similarity of the directed networks reconstructed for different TP53 mutations in CCLE and TCGA samples. (A and B) The different cancer types (x axis) in CCLE (A) (for a cut-off of at least 50% similarity) and TCGA (B) and the percentage of networks (y axis) that are similar at this cut-off, in three different settings: (1) all networks compared, (2) same mutation type, and (3) same deleterious function of mutation for TP53. The similarity of the networks improves drastically in the last two settings compared with the general first setting, which does not take any feature into account when comparing the networks. Together, these two plots support the conclusion that, when p53 mutation type or deleterious function of the mutation is taken into account, the regulatory profile of the transcription factor TP53 is significantly more similar than when grouping randomly, in both cell lines and tumor samples. Of note, in (B), sub-types STAD_EBV, SARC_DDLPS, and READ_GS have no pair with the same TP53 mutation type in the data, so the percentage is 0%. Additionally, SARC_DDLPS and READ_GS do not contain a pair with the same deleterious TP53 function. Finally, for CESC_AdenoCarcinoma, COAD_POLE, DLBC, GBM, READ_MSI, and READ_POLE, at least 50% of the compared network pairs had 100% similarity (identical graphs). The full data are shown in Tables S1 and S2 and in the radar plots of Figures S13–S20.
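A minimal sketch of the pairwise comparison summarized in this caption follows. It assumes each network is a dict holding an edge set plus annotations; the field names (`edges`, `mutation_type`), the function names, and the toy data are hypothetical. Returning 0% when no comparable pair exists mirrors the caption's note about sub-types without a same-type pair.

```python
from itertools import combinations

def similarity(edges_a, edges_b):
    """Percentage of common edges relative to the larger of the two networks."""
    largest = max(len(edges_a), len(edges_b))
    return 100.0 * len(edges_a & edges_b) / largest if largest else 100.0

def percent_similar_pairs(networks, cutoff=50.0, group_key=None):
    """Percentage of network pairs whose similarity is at least `cutoff`.

    If `group_key` is given, only pairs sharing that annotation are compared.
    Returns 0.0 when no pair qualifies for comparison.
    """
    pairs = [
        (a, b)
        for a, b in combinations(networks, 2)
        if group_key is None or a[group_key] == b[group_key]
    ]
    if not pairs:
        return 0.0
    hits = sum(similarity(a["edges"], b["edges"]) >= cutoff for a, b in pairs)
    return 100.0 * hits / len(pairs)

# Hypothetical networks for one cancer type (placeholder gene names).
networks = [
    {"edges": {("G1", "G2", 1), ("G2", "G3", 1)}, "mutation_type": "missense"},
    {"edges": {("G1", "G2", 1), ("G2", "G3", 1)}, "mutation_type": "missense"},
    {"edges": {("G1", "G2", -1)}, "mutation_type": "nonsense"},
]
all_pairs = percent_similar_pairs(networks)
same_type = percent_similar_pairs(networks, group_key="mutation_type")
```

In this toy example, comparing all pairs gives one similar pair out of three, while restricting to the same mutation type leaves a single pair that is identical, which is the kind of increase the caption describes.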
Figure 6
Common genes across signatures extracted from the directed networks reconstructed for the different TP53 mutations. (A and B) TP53 mutational meta-signatures (across all cancer types) for TCGA 5(A) and CCLE 5(B) derived using Louvain community detection (see text). Plots are done using R package and stratified by mutation. The signatures vary in size from approximately 40 to 60 genes per mutation, in both cell lines and tumor samples. The first column (all signatures) shows the number of genes shared across signatures in both CCLE and TCGA, with 17 common genes in CCLE and 40 in TCGA. Notably, missense mutations (the most prevalent across cancers) share seven genes with the non-deleterious signature in CCLE and three genes with the non-deleterious signature in TCGA, highlighting the specificity of the missense signature pan-cancer.
Figure 7
Changes in the regulatory network signature of TP53 under different mutational backgrounds. (A and B) Heatmaps showing predicted changes in CCLE (A) and TCGA (B). Empty cells indicate a predicted loss of interaction between the gene and p53.
