Predicting gene regulatory links from single-cell RNA-seq data using graph neural networks

Brief Bioinform. 2023 Sep 22;24(6):bbad414. doi: 10.1093/bib/bbad414.

Predicting gene regulatory links from single-cell RNA-seq data using graph neural networks

Guo Mao¹, Zhengbin Pang¹, Ke Zuo¹, Qinglin Wang¹, Xiangdong Pei¹, Xinhai Chen¹, Jie Liu¹,²

Affiliations

¹ Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, deya, 410073 Changsha, China.
² Laboratory of Software Engineering for Complex System, National University of Defense Technology, deya, 410073 Changsha, China.

PMID: 37985457
PMCID: PMC10661972
DOI: 10.1093/bib/bbad414


Guo Mao et al. Brief Bioinform. 2023.


Abstract

Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful technique for studying gene expression patterns at the single-cell level. Inferring gene regulatory networks (GRNs) from scRNA-seq data provides insight into cellular phenotypes at the genomic level. However, the high sparsity, noise and dropout events inherent in scRNA-seq data present challenges for GRN inference. In recent years, the dramatic increase in data on experimentally validated transcription factors binding to DNA has made it possible to infer GRNs by supervised methods. In this study, we frame GRN inference as a graph link prediction task and propose a novel framework called GNNLink, which leverages known GRNs to deduce the potential regulatory interdependencies between genes. First, we preprocess the raw scRNA-seq data. Then, we introduce a graph convolutional network-based interaction graph encoder that refines gene features by capturing interdependencies between nodes in the network. Finally, the GRN is inferred by performing a matrix completion operation on the node features. The features obtained from model training can be applied to downstream tasks such as measuring similarity and inferring causality between gene pairs. To evaluate the performance of GNNLink, we compare it with six existing GRN reconstruction methods on seven scRNA-seq datasets. These datasets encompass diverse ground-truth networks, including functional interaction networks, Loss of Function/Gain of Function data, non-specific ChIP-seq data and cell-type-specific ChIP-seq data. Our experimental results demonstrate that GNNLink achieves comparable or superior performance across these datasets, showcasing its robustness and accuracy. Furthermore, we observe consistent performance across datasets of varying scales. For reproducibility, we provide the data and source code of GNNLink in our GitHub repository: https://github.com/sdesignates/GNNLink.

Keywords: gene regulatory networks (GRNs); graph convolutional network; graph neural network; link prediction.
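
The encode-then-complete idea described in the abstract can be illustrated with a small sketch. This is not the GNNLink implementation (see the authors' repository for that); it is a minimal, standard-library-only example that assumes a symmetrically normalized adjacency with self-loops, a single ReLU graph-convolution layer, and a sigmoid over dot products as the decoder. The function names, toy expression values, prior edges and weight matrix are all hypothetical, and treating the known links as undirected is a simplification.

```python
import math

def normalized_adjacency(n, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected view of the known links."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]

def link_scores(h):
    """Dot-product decoder: sigmoid(h_i . h_j) for every pair of genes."""
    n = len(h)
    return [[1.0 / (1.0 + math.exp(-sum(p * q for p, q in zip(h[i], h[j]))))
             for j in range(n)] for i in range(n)]

# Toy input (assumption): 4 genes x 3 cells of already preprocessed expression.
expr = [
    [1.0, 0.0, 2.0],
    [0.5, 1.5, 0.0],
    [0.0, 2.0, 1.0],
    [1.0, 1.0, 1.0],
]
known_links = [(0, 1), (1, 2)]  # hypothetical prior regulatory edges
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]  # fixed here; learned in practice

a_hat = normalized_adjacency(len(expr), known_links)
embeddings = gcn_layer(a_hat, expr, weights)
scores = link_scores(embeddings)  # scores[i][j]: predicted link strength
```

In a trained model the weight matrix would be fitted so that scores for known links are high and scores for sampled non-links are low; the unobserved entries of the score matrix are then read off as predictions.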


Figures

**Figure 1**
Overview of the GNNLink framework. (A) We consider the inference of GRNs by supervised methods as a link prediction problem, where the objective is to identify potential edges based on existing ones. (B) scRNA-seq expression data imputation. (C) Demonstration of node feature learning, where AGG() denotes an aggregation operation that accumulates the features of the other nodes connected to a specific node, such as node 3. (D) The general structure of the GNNLink model, which involves three main steps. First, the raw data are preprocessed to prepare them for further analysis. Second, the node features are learned, capturing important information about the genes. Finally, the interaction graph is reconstructed for link prediction, with the regulatory interdependencies between genes represented by the dot product of their node features.


**Figure 2**
Parameter sensitivity analysis for the GNNLink model in terms of (A) number of layers of encoder, (B) dimension of representation and (C) weight factor. Performance is reported as the average AUROC and AUPRC across the seven datasets. Abbreviations: AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve.

**Figure 3**
Summary of the GRN prediction performance in the AUROC metric (A) and the AUPRC metric (B). Our evaluation is conducted on seven single-cell RNA sequencing (scRNA-seq) datasets, each comprising four ground-truth networks. The scRNA-seq datasets consist of significantly varying transcription factors (TFs) and the 500 (left) or 1000 (right) most-varying genes. (A) The AUROC values in the heatmap represent the average performance across 50 independent calculations for each dataset. The black squares indicate instances where the performance is poorer than that of a random predictor, as denoted by an AUROC value below 0.5. (B) The AUPRC values in the heatmap are also averaged over 50 calculations for each dataset.

**Figure 4**
The average performance measured by AUROC and AUPRC for different numbers of training samples. Performance of GENELink with training set sizes ranging from 10% to 50% on seven benchmark datasets with cell-type-specific networks. Each method is run 10 times on each dataset. Abbreviations: AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve.

**Figure 5**
The performance of GNNLink on imputed and unimputed benchmark datasets with cell-type-specific networks. Average GRN inference performance on seven scRNA-seq datasets, i.e. AUROC (left panel) and AUPRC (right panel). We use DeepImpute [46] to impute the zero entries in the seven raw benchmark datasets. We report the mean variance over 10 repeats. Abbreviations: AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve.
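
The two metrics used throughout the figure captions can be computed from a list of true edge labels and predicted scores. The sketch below is illustrative only and is not taken from the paper's code: `auroc` uses the pairwise (Mann-Whitney) formulation, in which 0.5 corresponds to a random predictor, and `average_precision` uses the step-wise average-precision estimator, which is one common way to approximate AUPRC. The function names and toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each true edge."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos

# Toy example (assumption): 1 marks a true regulatory edge, 0 a non-edge.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
roc = auroc(labels, scores)
ap = average_precision(labels, scores)
```

Because true edges are typically rare among all candidate gene pairs, the precision-based metric is often the more demanding of the two, which is one reason to report both.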


Cited by

Constructing the dynamic transcriptional regulatory networks to identify phenotype-specific transcription regulators.
Guo Y, Xiao Z. Brief Bioinform. 2024 Sep 23;25(6):bbae542. doi: 10.1093/bib/bbae542. PMID: 39451156. Free PMC article.
DeepGRNCS: deep learning-based framework for jointly inferring gene regulatory networks across cell subpopulations.
Lei Y, Huang XT, Guo X, Hang Katie Chan K, Gao L. Brief Bioinform. 2024 May 23;25(4):bbae334. doi: 10.1093/bib/bbae334. PMID: 38980373. Free PMC article.
HGATLink: single-cell gene regulatory network inference via the fusion of heterogeneous graph attention networks and transformer.
Sun Y, Gao J. BMC Bioinformatics. 2025 Feb 11;26(1):49. doi: 10.1186/s12859-025-06071-x. PMID: 39934680. Free PMC article.
GRLGRN: graph representation-based learning to infer gene regulatory networks from single-cell RNA-seq data.
Wang K, Li Y, Liu F, Luan X, Wang X, Zhou J. BMC Bioinformatics. 2025 Apr 18;26(1):108. doi: 10.1186/s12859-025-06116-1. PMID: 40251476. Free PMC article.
Inferring gene regulatory networks with graph convolutional network based on causal feature reconstruction.
Ji R, Geng Y, Quan X. Sci Rep. 2024 Sep 12;14(1):21342. doi: 10.1038/s41598-024-71864-8. PMID: 39266676. Free PMC article.


References

1. Wu X, Zhou Y. GE-impute: graph embedding-based imputation for single-cell RNA-seq data. Brief Bioinform 2022; 23(5): bbac313.
2. Zhao M, He W, Tang J, et al. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Brief Bioinform 2021; 22(5): bbab009.
3. Mao G, Zeng R, Peng J, et al. Reconstructing gene regulatory networks of biological function using differential equations of multilayer perceptrons. BMC Bioinform 2022; 23.
4. Zhang S, Stumpf M. Learning cell-specific networks from dynamical single cell data. bioRxiv 2023. https://api.semanticscholar.org/CorpusID:257718292.
5. Jing X, Zhang A, Liu F, Zhang X. STGRNS: an interpretable transformer-based method for inferring gene regulatory networks from single-cell transcriptomic data. Bioinformatics 2023; 39(4): btad165.


