Mol Cell Proteomics. 2025 Mar;24(3):100905.
doi: 10.1016/j.mcpro.2025.100905. Epub 2025 Jan 9.

Causal Inference and Annotation of Phosphoproteomics Data in Multiomics Cancer Studies

Qun Dong et al. Mol Cell Proteomics. 2025 Mar.

Abstract

Protein phosphorylation plays a crucial role in regulating diverse biological processes. Perturbations in protein phosphorylation are closely associated with downstream pathway dysfunctions, whereas alterations in protein expression could serve as sensitive indicators of pathological status. However, there are currently few methods that can accurately identify the regulatory links between protein phosphorylation and expression, given issues like reverse causation and confounders. Here, we present Phoslink, a causal inference model to infer causal effects between protein phosphorylation and expression, integrating prior evidence and multiomics data. We demonstrated the feasibility and advantages of our method under various simulation scenarios. Phoslink exhibited more robust estimates and lower false discovery rate than commonly used Pearson and Spearman correlations, with better performance than canonical instrumental variable selection methods for Mendelian randomization. Applying this approach, we identified 345 causal links involving 109 phosphosites and 310 proteins in 79 lung adenocarcinoma (LUAD) samples. Based on these links, we constructed a causal regulatory network and identified 26 key regulatory phosphosites as regulators strongly associated with LUAD. Notably, 16 of these regulators were exclusively identified through phosphosite-protein causal regulatory relationships, highlighting the significance of causal inference. We explored potentially druggable phosphoproteins and provided critical clues for drug repurposing in LUAD. We also identified significant mediation between protein phosphorylation and LUAD through protein expression. In summary, our study introduces a new approach for causal inference in phosphoproteomics studies. Phoslink demonstrates its utility in potential drug target identification, thereby accelerating the clinical translation of cancer proteomics and phosphoproteomic data.
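The contrast the abstract draws between correlation-based estimates and instrumental-variable (IV) estimates can be illustrated with a small simulation. The sketch below is not Phoslink; it is a single-instrument Wald ratio estimator of the kind used in basic Mendelian randomization, run on synthetic data. The function names (`simulate`, `ols_slope`, `wald_ratio`) and every coefficient are assumptions chosen for illustration, not values from the study.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def simulate(n, theta, seed=0):
    """Simulate an instrument g, exposure x, and outcome y that share a confounder.

    All coefficients are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = rng.choice([0, 1, 2])             # genotype-like instrument
        u = rng.gauss(0, 1)                    # unmeasured confounder
        xi = 0.8 * gi + u + rng.gauss(0, 1)    # exposure (phosphorylation level)
        yi = theta * xi + u + rng.gauss(0, 1)  # outcome (protein expression)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


def ols_slope(x, y):
    """Naive regression slope of y on x; biased when a confounder is present."""
    return cov(x, y) / cov(x, x)


def wald_ratio(g, x, y):
    """Single-instrument IV estimate: cov(g, y) / cov(g, x)."""
    return cov(g, y) / cov(g, x)


# True causal effect is zero, yet the confounder induces an association.
g, x, y = simulate(n=20000, theta=0.0)
naive = ols_slope(x, y)
iv = wald_ratio(g, x, y)
print(f"naive slope: {naive:.3f}, IV estimate: {iv:.3f}")
```

With a true effect of zero, the naive slope stays clearly positive because the confounder drives both variables, while the IV estimate stays close to zero as long as the instrument is valid. Invalid instruments, which the paper's simulations also consider, would break this guarantee.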

Keywords: cancer proteomics; causal inference; multiomics; network; phosphoproteomics.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
The analytic framework of the study. To characterize the functionality of identified phosphosites in multiomics cancer research, we introduce a causal inference model termed “Phoslink,” which integrates prior evidence and a multiomics cancer dataset to infer causal regulatory links between protein phosphorylation and expression. Applied to the CNHPP-LUAD dataset, Phoslink uncovered regulatory links and identified key phosphosites, including MAP4 pS941, located within a therapeutic target domain in lung carcinoma. A detailed mediation analysis further assesses the impact of phospho-based regulation on survival, distinguishing between effects mediated by proteins and those that are not. CNHPP, Chinese Human Proteome Project; LUAD, lung adenocarcinoma.
Fig. 2
Performance evaluation of Phoslink. FDR (true causal effect θ = 0) and Power (true causal effect θ = 0.6) for Phoslink and other methods in simulations at different sample sizes. The x-axis shows the sample size, whereas the y-axis displays the FDR (purple) or power (blue) for each method. FDR, false discovery rate.
Fig. 3
Comparison of accuracy and consistency between Phoslink and correlation analyses. Density plots of effect estimates from Phoslink, Pearson, and Spearman analyses in the scenario with 0% invalid IVs. The x-axis represents the estimates from the three methods. IV, instrumental variable.
Fig. 4
Overview of detected germline SNPs in LUAD. A, distribution of germline SNPs on autosomal chromosomes within a 1 Mb window size. The light color represents a low content, and the dark color represents a high content of germline SNPs. B, profiling of germline SNPs in the well-defined driver genes of LUAD. Rows correspond to germline SNPs in LUAD driver genes, and columns represent the 79 samples, showing the mutation status: 0 (homozygous wildtype), 1 (heterozygous genotype), and 2 (homozygous mutant). C, Manhattan plot for all phosphosites showing p values from univariable linear regression (adjusted for age, sex, and smoking status) against germline SNP positions across the 22 autosomal chromosomes. The horizontal lines indicate the genome-wide cutoff of 5 × 10⁻⁸ (gray) and the 0.05 cutoff (red). The y-axis shows the −log10 of the p values for the associations of genetic variants with phosphorylation levels. LUAD, lung adenocarcinoma.
Fig. 5
Phosphoregulatory network in LUAD. The network indicates the regulatory relationship between phosphoregulators and proteins. Node colors correspond to the most closely related cancer hallmarks, determined by the similarity of the protein sets regulated by phosphoregulators (limited to those regulating six or more proteins), and node size is proportional to node degree. LUAD, lung adenocarcinoma.
Fig. 6
Exploring the significance of regulators in LUAD. A, distribution of the regulators associated with different clinical features in LUAD (FDR < 0.05). B, stacked histogram showing the distribution of proteins associated with different clinical features regulated by 26 key regulators. The red-labeled key regulators are significantly associated with clinical features in two independent LUAD datasets. C, dot plot of the KEGG pathway enrichment for all key regulators, with dot size scaled by the GeneRatio and color denoting the significance of association. D, Kaplan–Meier curves of overall survival, categorized by low versus high AKAP12 phosphorylation levels at S696 and S598. FDR, false discovery rate; KEGG, Kyoto Encyclopedia of Genes and Genomes; LUAD, lung adenocarcinoma.
Fig. 7
Evaluating the potential of regulators for drug development. A, the 3D structure of MAP4 with pS941 marked in red, rendered with the UCSF ChimeraX visualization tool; blue indicates the tubulin-binding domain. B, network diagrams depict the upstream kinases of phosphoregulators, and the outermost layer presents the group information of kinases. C, protein expression of RANGAP1 in tumors versus NATs, with the Wilcoxon signed-rank test p value indicated on top. D, Kaplan–Meier plots illustrating overall and disease-free survival in samples from the LUAD cohort, categorized by low versus high RANGAP1 protein expression. High expression of RANGAP1 is significantly associated with worse outcomes. E, mediation analysis quantifying the effect sizes of the RANGAP1 mediator model, with GORASP2 pS451 as the exposure and LUAD survival as the outcome. LUAD, lung adenocarcinoma; NAT, noncancerous adjacent tissue.
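The exposure, mediator, and outcome structure described for Fig. 7E can be sketched with a product-of-coefficients mediation decomposition. This is a simplified linear illustration on synthetic data, not the analysis used in the study: the paper's outcome is survival, which would normally call for a survival model, whereas the sketch assumes a continuous outcome. The function names (`simulate_mediation`, `mediation_effects`) and all effect sizes are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def simulate_mediation(n, a, b, direct, seed=0):
    """Simulate exposure x -> mediator m -> outcome y plus a direct x -> y path.

    Effect sizes are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    x, m, y = [], [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)                            # exposure (phosphosite level)
        mi = a * xi + rng.gauss(0, 1)                   # mediator (protein expression)
        yi = direct * xi + b * mi + rng.gauss(0, 1)     # continuous outcome (simplification)
        x.append(xi)
        m.append(mi)
        y.append(yi)
    return x, m, y


def mediation_effects(x, m, y):
    """Estimate indirect (a*b), direct, and total effects by least squares."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    det = sxx * smm - sxm ** 2
    a_hat = sxm / sxx                            # slope of m on x
    b_hat = (smy * sxx - sxy * sxm) / det        # slope of y on m, adjusting for x
    direct_hat = (sxy * smm - smy * sxm) / det   # slope of y on x, adjusting for m
    total_hat = sxy / sxx                        # slope of y on x alone
    return {"indirect": a_hat * b_hat, "direct": direct_hat, "total": total_hat}


x, m, y = simulate_mediation(n=5000, a=0.5, b=0.4, direct=0.3)
effects = mediation_effects(x, m, y)
print(effects)
```

In this linear setting the total effect equals the direct effect plus the indirect effect, which is what lets the analysis separate effects mediated by the protein from those that are not.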
