Pac Symp Biocomput. 2023;28:145-156.
doi: 10.1142/9789811270611_0014.

Graph algorithms for predicting subcellular localization at the pathway level


Chris S Magnano et al. Pac Symp Biocomput. 2023.

Abstract

Protein subcellular localization is an important factor in normal cellular processes and disease. While many protein localization resources treat it as static, protein localization is dynamic and heavily influenced by biological context. Biological pathways are graphs that represent a specific biological context and can be inferred from large-scale data. We develop graph algorithms to predict the localization of all interactions in a biological pathway as an edge-labeling task. We compare a variety of models including graph neural networks, probabilistic graphical models, and discriminative classifiers for predicting localization annotations from curated pathway databases. We also perform a case study where we construct biological pathways and predict localizations of human fibroblasts undergoing viral infection. Pathway localization prediction is a promising approach for integrating publicly available localization data into the analysis of large-scale biological data.
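To make the edge-labeling framing concrete, the following is a minimal sketch of how a pathway with partially known interaction localizations might be represented and completed. It is not one of the models compared in the paper: the neighbor-vote heuristic, the function name `predict_edge_labels`, the node names, and the compartment labels are all assumptions made up for illustration.

```python
from collections import Counter

# Toy pathway: each interaction (edge) joins two proteins (nodes).
# Node names and compartment labels are invented for illustration only.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "F"), ("D", "E")]
known = {
    ("A", "B"): "cytosol",
    ("B", "C"): "cytosol",
    ("C", "F"): "cytosol",
    ("D", "E"): "nucleus",
}

def predict_edge_labels(edges, known):
    """Label each unlabeled edge by a vote over labeled edges that share a node.

    Falls back to the most common known label when no labeled edge is adjacent.
    This is a hypothetical heuristic, not a method from the paper.
    """
    global_majority = Counter(known.values()).most_common(1)[0][0]
    predictions = {}
    for edge in edges:
        if edge in known:
            continue  # already annotated, nothing to predict
        # Collect labels of annotated edges that touch either endpoint.
        adjacent = [label for other, label in known.items() if set(other) & set(edge)]
        if adjacent:
            predictions[edge] = Counter(adjacent).most_common(1)[0][0]
        else:
            predictions[edge] = global_majority
    return predictions

predictions = predict_edge_labels(edges, known)
print(predictions)
```

Here the only unlabeled interaction, C-D, touches two cytosol edges and one nucleus edge, so the vote assigns it to the cytosol. The point of the sketch is the shape of the task: the output is one label per interaction, not one label per protein.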

Keywords: Probabilistic graphical model; graph neural network; spatial proteomics.


Figures

Fig. 1. Overview of the pathway localization prediction experimental workflow.
Fig. 2. Overview of neural network architecture for graph neural networks. The number of graph layers (convolutional depth) and number of fully connected layers (linear depth) are hyperparameters. |N| is the number of nodes in the input pathway. |F| is the number of input features for each node.
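The caption describes an |N| x |F| node-feature input passed through a configurable number of graph layers and then a configurable number of fully connected layers. A rough forward pass with that shape is sketched below in NumPy. Several choices are assumptions, since the caption does not state them: the graph layer is a plain normalized-adjacency convolution (not the attention layer of a GAT), each edge is represented by concatenating its two endpoint embeddings, and the weights are random rather than trained. The names `normalized_adjacency`, `forward`, `conv_depth`, and `linear_depth` are hypothetical.

```python
import numpy as np

def normalized_adjacency(edges, n_nodes):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of an undirected graph."""
    A = np.eye(n_nodes)  # self-loops so each node keeps its own features
    for u, v in edges:
        A[u, v] = 1.0
        A[v, u] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def forward(X, edges, conv_weights, linear_weights):
    """Return per-edge class probabilities for node features X of shape (|N|, |F|)."""
    A_hat = normalized_adjacency(edges, X.shape[0])
    H = X
    for W in conv_weights:            # one iteration per graph layer
        H = np.maximum(A_hat @ H @ W, 0.0)
    idx = np.asarray(edges)
    # Assumed edge representation: concatenated endpoint embeddings.
    Z = np.concatenate([H[idx[:, 0]], H[idx[:, 1]]], axis=1)
    for W in linear_weights[:-1]:     # hidden fully connected layers
        Z = np.maximum(Z @ W, 0.0)
    logits = Z @ linear_weights[-1]
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_nodes, n_features, hidden, n_classes = 5, 4, 8, 3
conv_depth, linear_depth = 2, 2       # the two depth hyperparameters
X = rng.normal(size=(n_nodes, n_features))
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

conv_dims = [n_features] + [hidden] * conv_depth
conv_weights = [rng.normal(size=(a, b)) * 0.1
                for a, b in zip(conv_dims[:-1], conv_dims[1:])]
lin_dims = [2 * hidden] + [hidden] * (linear_depth - 1) + [n_classes]
linear_weights = [rng.normal(size=(a, b)) * 0.1
                  for a, b in zip(lin_dims[:-1], lin_dims[1:])]

probs = forward(X, edges, conv_weights, linear_weights)
print(probs.shape)
```

Changing `conv_depth` or `linear_depth` only changes the lengths of the two weight lists, which is what makes them convenient hyperparameters to sweep.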
Fig. 3. Multiclass F1 score of predictive performance on PathBank localizations across all 427 considered PathBank pathways. Scores are calculated per pathway; the distribution of scores is shown for each model.
Fig. 4. Multiclass F1 score of predictive performance on Reactome localizations across all 918 considered Reactome pathways. Scores are calculated per pathway; the distribution of scores is shown for each model.
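Because the scores in Figs. 3 and 4 are computed per pathway and then shown as a distribution, it may help to see what one such per-pathway computation could look like. The sketch below uses macro-averaged F1, which is an assumption: the captions say "multiclass F1" without naming the averaging scheme. The pathway names and labels are invented toy data, not results from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in truth or prediction."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy data: (true edge labels, predicted edge labels) for each pathway.
pathways = {
    "pathway_1": (["cytosol", "cytosol", "nucleus"],
                  ["cytosol", "cytosol", "nucleus"]),
    "pathway_2": (["cytosol", "nucleus", "nucleus", "membrane"],
                  ["cytosol", "cytosol", "nucleus", "nucleus"]),
}

# One score per pathway; the figures summarize the distribution of such scores.
per_pathway = {name: macro_f1(t, p) for name, (t, p) in pathways.items()}
print(per_pathway)
```

Scoring each pathway separately keeps a few very large pathways from dominating the summary, at the cost of giving small pathways noisier individual scores.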
Fig. 5. Multiclass F1 score of the GAT model on spatial MS data of viral infection at 120 hpi. Performance is shown in each scenario for the 50 top pathways created from a parameter sweep. The baseline model always predicts the most common localization in the training dataset.
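The baseline described in the caption is simple enough to write down directly. A minimal sketch follows, with the class name `MajorityBaseline` and the toy training labels being illustrative assumptions and not taken from the paper.

```python
from collections import Counter

class MajorityBaseline:
    """Always predicts the most frequent localization seen during training."""

    def fit(self, train_labels):
        # Remember the single most common training label.
        self.label_ = Counter(train_labels).most_common(1)[0][0]
        return self

    def predict(self, n_edges):
        # Every edge receives the same label regardless of its features.
        return [self.label_] * n_edges

# Invented training labels for illustration.
train_labels = ["cytosol", "cytosol", "nucleus", "membrane"]
baseline = MajorityBaseline().fit(train_labels)
preds = baseline.predict(3)
print(preds)
```

A baseline like this ignores graph structure entirely, so any gap between it and a graph model on the same pathways gives a rough sense of how much the structure and features contribute.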
