PLoS Comput Biol. 2022 Nov 10;18(11):e1010702.

doi: 10.1371/journal.pcbi.1010702. eCollection 2022 Nov.

Protein prediction models support widespread post-transcriptional regulation of protein abundance by interacting partners

Himangi Srivastava¹,², Michael J Lippincott¹, Jordan Currie¹, Robert Canfield¹, Maggie P Y Lam¹,²,³, Edward Lau¹,²

Affiliations

¹ Department of Medicine/Cardiology, University of Colorado School of Medicine, Aurora, Colorado, United States of America.
² Consortium for Fibrosis Research and Translation, University of Colorado School of Medicine, Aurora, Colorado, United States of America.
³ Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America.

PMID: 36356032
PMCID: PMC9681107
DOI: 10.1371/journal.pcbi.1010702

Himangi Srivastava et al. PLoS Comput Biol. 2022.


Abstract

Protein and mRNA levels correlate only moderately. The availability of proteogenomics data sets with protein and transcript measurements from matching samples provides new opportunities to assess how well protein levels in a system can be predicted from mRNA information. Here we examined the contributions of input features in protein abundance prediction models. Using large proteogenomics data from 8 cancer types within the Clinical Proteomic Tumor Analysis Consortium (CPTAC) data set, we trained models to predict the abundance of over 13,000 proteins using matching transcriptome data from up to 958 tumor or normal adjacent tissue samples each, and compared predictive performance across algorithms, data set sizes, and input features. Over one-third of proteins (4,648) showed relatively poor predictability (elastic net r ≤ 0.3) from their cognate transcripts. Moreover, we found widespread cases where the abundance of a protein is considerably less well explained by its own cognate transcript level than by that of one or more trans-locus transcripts. Incorporating additional trans-locus transcript abundance data as input features progressively improved the ability to predict sample protein abundance. Transcripts that contribute to non-cognate protein abundance primarily encode known or predicted interaction partners of the protein of interest, including not only members of large multi-protein complexes as previously shown, but also members of small stable complexes with only one or a few stable interacting partners. Network analysis further shows a complex proteome-wide interdependency of protein abundance on the transcript levels of multiple interacting partners. The predictive model analysis therefore supports the view that protein-protein interactions, including those in small protein complexes, exert post-transcriptional influence on proteome composition more broadly than previously recognized. Moreover, the results suggest that mRNA and protein co-expression analysis may be useful for finding gene interactions and predicting expression changes in biological systems.
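The comparison described in the abstract (predicting a protein from its own transcript versus from its own transcript plus a trans-locus partner transcript, scored by the test-set correlation r) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic rather than CPTAC measurements, the variable names (`self_mrna`, `partner_mrna`, `protein`) are hypothetical, the elastic net is a small hand-written coordinate descent rather than the authors' implementation, and the hyperparameters are arbitrary.

```python
import random


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def soft_threshold(value, threshold):
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=100):
    """Coordinate descent minimizing
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    Assumes the columns of X and y are already centered (no intercept)."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w starting at zero
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            z = sum(c * c for c in col) / n
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, resid)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                w[j] = new_w
    return w


def predict(X, w):
    return [sum(wj * xj for wj, xj in zip(w, row)) for row in X]


# Synthetic data (an assumption): the protein tracks a partner transcript,
# not its own transcript.
rng = random.Random(0)
n_train, n_test = 300, 200
n = n_train + n_test
self_mrna = [rng.gauss(0, 1) for _ in range(n)]
partner_mrna = [rng.gauss(0, 1) for _ in range(n)]
protein = [0.9 * p + rng.gauss(0, 0.3) for p in partner_mrna]


def evaluate(features):
    """Fit on the training samples; return (weights, test-set Pearson r)."""
    means = [mean(f[:n_train]) for f in features]
    rows = [[f[i] - m for f, m in zip(features, means)] for i in range(n)]
    y_mean = mean(protein[:n_train])
    y_train = [v - y_mean for v in protein[:n_train]]
    w = fit_elastic_net(rows[:n_train], y_train)
    return w, pearson_r(predict(rows[n_train:], w), protein[n_train:])


w_self, r_self = evaluate([self_mrna])
w_both, r_both = evaluate([self_mrna, partner_mrna])
print("self transcript only: r =", round(r_self, 3))
print("self + partner:       r =", round(r_both, 3))
```

In this toy setup the self-transcript model has little to work with, so regularization shrinks its single weight toward zero, while adding the partner transcript recovers most of the signal. The real analysis differs in scale, feature selection, and cross-validation, none of which this sketch attempts to reproduce.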

Copyright: © 2022 Srivastava et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Genewise dispersion of protein predictability from transcriptome data.**
Box plots of test set correlation coefficients between the transcript-predicted and actual protein level for each protein are shown across five feature sets (columns: single/self transcript; CORUM interactors; STRING 800 high-confidence associated proteins; STRING 200 low-confidence associated proteins; and all transcripts) and three algorithms (multiple linear regression, elastic net, and random forest). In each plot, the x axis denotes the number of additive CPTAC data sets used to train the models as described in Methods; box: interquartile range; whiskers: ±1.5 IQR; notch: SEM.

**Fig 2. Pathway enrichment of proteins with good and poor predictability.**
A. Tree plots showing the clustering and relationships of gene ontology terms that are significantly enriched among proteins whose abundances are well predicted by their own transcripts (r ≥ 0.6). B. Tree plots of terms enriched among proteins whose abundances are poorly predicted by their own transcripts (r ≤ 0.3).

**Fig 3. Proteins with improved predicted levels after inclusion of additional transcript features.**
Four proteins that become substantially predictable from transcriptome data upon the inclusion of additional features are shown: A. PCCB, B. CMC1, C. PSMG2, D. SMCR8. For each protein, the transcript-trained prediction of protein level is plotted on the x axis and the actual protein level is plotted on the y axis. The lack of variance in predicted protein levels from the self-transcript model is due to the regularization of the elastic net model, and corresponds to a lack of correlation between PCCB mRNA and protein (see Fig 4). Blue: train set; brown: test set. Columns denote the transcript feature set used to train the model. The number of features used to train the model in each feature set is shown inside each plot. r: correlation coefficient.

**Fig 4. mRNA-Protein correlations of PCCB and CMC1 with functionally associated proteins.**
Two examples of proteins whose abundance is better explained by another transcript are shown. A. PCCB protein level is predicted by PCCA transcript but not its own transcript. B. CMC1 protein level is explained by MT-CO1 transcript level but not its own transcript. Substantial correlations across transcripts and proteins (≥ 0.4) are bolded.

**Fig 5. mRNA-Protein correlations of PSMG2 and SMCR8 with functionally associated proteins.**
Two examples of proteins whose abundance is better explained by another transcript are shown. A. PSMG2 protein level is predicted by PSMG1 transcript but not its own transcript. B. SMCR8 protein level is explained by C9orf72 transcript level but not its own transcript. Substantial correlations across transcripts and proteins (≥ 0.4) are bolded.

**Fig 6. Directed graphs of protein and transcript interrelationships identify candidate regulatory genes.**
**A-C.** Examples of directed graphs constructed from genome-wide relationships of transcript-predicted proteins, containing members of A. the propionyl-CoA carboxylase complex; B. the cytochrome c oxidase, mitochondrial complex; C. the PI4K2A-WASH complex, the RICH1/AMOT polarity complex, and others. In each subgraph, orange nodes have outflow edges only (i.e., they are contributing transcripts in the prediction models). Blue nodes have at least one inflow edge (i.e., they represent proteins, and also transcripts if they have outflow edges as well). Orange edges represent positive coefficients of the transcripts to the target proteins in the elastic net models; gray edges represent negative coefficients. All edges are directed from transcript to protein, and the widths of the edges are scaled by the weight. D. A highly connected subgraph of mitochondrial ribosome subunits containing 73 nodes and 834 edges. E. Persistent community detection and network representation of preferential node connections, showing a hierarchical relationship between the 28S and 39S subcomplexes and the assembled 55S mitochondrial ribosome. F. Network representation of hub nodes defined as 15% of nodes ranked by betweenness centrality, which predicts a potential role of LACTB as a critical hub that lies upstream of multiple large and small mitochondrial ribosomal protein subunits. Node colors represent the pie chart diagram of the corresponding GO biological process described in the table. SHAP values of three proteins (MRPL20, MRPL19, MRPS34) are highlighted showing top model contributors.
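The graph construction in this caption can be sketched in a few lines, under stated assumptions: the gene names (`G1` to `G6`) and coefficients are invented for illustration, "15% of nodes ranked by betweenness centrality" is read here as the top 15%, and betweenness is computed with a plain Brandes-style routine for unweighted directed graphs rather than whatever software the authors used.

```python
import math
from collections import deque

# Hypothetical non-zero model coefficients: (transcript gene, protein gene, weight).
edges = [
    ("G1", "G2", 0.8),
    ("G5", "G2", -0.3),
    ("G2", "G3", 0.5),
    ("G2", "G4", 0.4),
    ("G3", "G6", 0.2),
]

# Directed adjacency list: every edge points from transcript to protein.
adj = {}
for src, dst, _ in edges:
    adj.setdefault(src, []).append(dst)
    adj.setdefault(dst, [])

has_inflow = {dst for _, dst, _ in edges}
orange = {v for v in adj if v not in has_inflow}  # outflow edges only
blue = set(adj) & has_inflow                      # at least one inflow edge
edge_color = {(s, d): "orange" if w > 0 else "gray" for s, d, w in edges}


def betweenness_centrality(adj):
    """Unnormalized betweenness for an unweighted directed graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in adj}     # shortest-path predecessors
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate dependencies back to s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


bc = betweenness_centrality(adj)
n_hubs = max(1, math.ceil(0.15 * len(adj)))  # assumed reading: top 15% of nodes
hubs = sorted(bc, key=bc.get, reverse=True)[:n_hubs]
print("orange nodes:", sorted(orange))
print("hub nodes:", hubs)
```

In this toy graph the node that relays the most shortest transcript-to-protein paths is selected as the hub, which is the role the caption assigns to LACTB in the real mitochondrial ribosome subgraph.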

Cited by

Proteome-wide copy-number estimation from transcriptomics.
Sweatt AJ, Griffiths CD, Groves SM, Paudel BB, Wang L, Kashatus DF, Janes KA. Mol Syst Biol. 2024 Nov;20(11):1230-1256. doi: 10.1038/s44320-024-00064-3. Epub 2024 Sep 27. PMID: 39333715.
Proteomics applications in next generation induced pluripotent stem cell models.
Manda V, Pavelka J, Lau E. Expert Rev Proteomics. 2024 Apr;21(4):217-228. doi: 10.1080/14789450.2024.2334033. Epub 2024 Mar 27. PMID: 38511670. Review.
An Extensive Atlas of Proteome and Phosphoproteome Turnover Across Mouse Tissues and Brain Regions.
Li W, Dasgupta A, Yang K, Wang S, Hemandhar-Kumar N, Yarbro JM, Hu Z, Salovska B, Fornasiero EF, Peng J, Liu Y. bioRxiv [Preprint]. 2024 Oct 17:2024.10.15.618303. doi: 10.1101/2024.10.15.618303. Update in: Cell. 2025 Apr 17;188(8):2267-2287.e21. doi: 10.1016/j.cell.2025.02.021. PMID: 39464138.
Solid stress compression enhances breast cancer cell migration through the upregulation of Interleukin-6.
Azizan F, Sheriff RS, Goh CJH, Chiam KH, Koh CG. Front Cell Dev Biol. 2025 Apr 30;13:1541953. doi: 10.3389/fcell.2025.1541953. eCollection 2025. PMID: 40371393.
Turnover atlas of proteome and phosphoproteome across mouse tissues and brain regions.
Li W, Dasgupta A, Yang K, Wang S, Hemandhar-Kumar N, Chepyala SR, Yarbro JM, Hu Z, Salovska B, Fornasiero EF, Peng J, Liu Y. Cell. 2025 Apr 17;188(8):2267-2287.e21. doi: 10.1016/j.cell.2025.02.021. Epub 2025 Mar 20. PMID: 40118046.
