Brief Bioinform. 2023 Sep 22;24(6):bbad414.
doi: 10.1093/bib/bbad414.

Predicting gene regulatory links from single-cell RNA-seq data using graph neural networks


Guo Mao et al. Brief Bioinform.

Abstract

Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful technique for studying gene expression patterns at the single-cell level. Inferring gene regulatory networks (GRNs) from scRNA-seq data provides insight into cellular phenotypes at the genomic level. However, the high sparsity, noise and dropout events inherent in scRNA-seq data make GRN inference challenging. In recent years, the dramatic increase in experimentally validated data on transcription factors binding to DNA has made it possible to infer GRNs with supervised methods. In this study, we frame GRN inference as a graph link prediction task and propose a novel framework called GNNLink, which leverages known GRNs to deduce potential regulatory interdependencies between genes. First, we preprocess the raw scRNA-seq data. Then, we introduce a graph convolutional network-based interaction graph encoder that refines gene features by capturing interdependencies between nodes in the network. Finally, the GRN is inferred by performing a matrix completion operation on the node features. The features obtained from model training can be applied to downstream tasks such as measuring similarity and inferring causality between gene pairs. To evaluate the performance of GNNLink, we compare it with six existing GRN reconstruction methods using seven scRNA-seq datasets. These datasets encompass diverse ground truth networks, including functional interaction networks, Loss of Function/Gain of Function data, non-specific ChIP-seq data and cell-type-specific ChIP-seq data. Our experimental results demonstrate that GNNLink achieves comparable or superior performance across these datasets, showcasing its robustness and accuracy. Furthermore, we observe consistent performance across datasets of varying scales. For reproducibility, we provide the data and source code of GNNLink on our GitHub repository: https://github.com/sdesignates/GNNLink.
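
The encoder-then-score idea described in the abstract can be illustrated with a minimal sketch. This is not the GNNLink implementation: it assumes a single graph-convolution layer with symmetric normalization, untrained random weights, an undirected toy prior network and one shared embedding per gene, and every function and variable name below is hypothetical.

```python
import math
import random

def normalized_adjacency(n, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list of lists (undirected toy graph)."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # add self-loops
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(x, y):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(xi * yj for xi, yj in zip(row, col)) for col in zip(*y)] for row in x]

def gcn_layer(a_hat, h, w):
    """One graph-convolution step: aggregate neighbour features, project, apply ReLU."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]

def link_scores(h):
    """Score every gene pair by the sigmoid of the dot product of their embeddings."""
    h_t = [list(col) for col in zip(*h)]
    raw = matmul(h, h_t)
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in raw]

# Toy data (assumed): 4 genes measured in 6 cells, 2 known regulatory links.
random.seed(0)
n_genes, n_cells, dim = 4, 6, 3
expr = [[random.random() for _ in range(n_cells)] for _ in range(n_genes)]
w = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_cells)]  # untrained weights
a_hat = normalized_adjacency(n_genes, [(0, 1), (1, 2)])
emb = gcn_layer(a_hat, expr, w)   # refined gene features
scores = link_scores(emb)         # dense matrix of candidate link scores
```

In a trained model the weights would be fitted so that known links score high and non-links score low. The dense score matrix is what makes the final step resemble matrix completion, since it assigns a value to every gene pair, including pairs absent from the prior network.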

Keywords: gene regulatory networks (GRNs); graph convolutional network; graph neural network; link prediction.


Figures

Figure 1
Overview of the GNNLink framework. (A) We consider the inference of GRNs by supervised methods as a link prediction problem, where the objective is to identify potential edges based on existing ones. (B) ScRNA-seq expression data imputation. (C) Demonstration of node feature learning, where AGG denotes an aggregation operation that accumulates the features of the other nodes connected to a specific node, such as node 3. (D) The general structure of the GNNLink model, which involves three main steps. First, the raw data are preprocessed for further analysis. Second, node features are learned, capturing important information about the genes. Finally, the interaction graph is reconstructed for link prediction, with the regulatory interdependency between two genes represented by the dot product of their feature vectors.
Figure 2
Parameter sensitivity analysis. Parameter sensitivity analysis for the GNNLink model in terms of (A) the number of encoder layers, (B) the dimension of the representation and (C) the weight factor. Performance is averaged across the seven datasets and measured by AUROC and AUPRC. Abbreviations: AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve.
Figure 3
Summary of the GRN prediction performance in the AUROC metric (A) and the AUPRC metric (B). Our evaluation is conducted on seven single-cell RNA sequencing (scRNA-seq) datasets, each comprising four ground-truth networks. The scRNA-seq datasets consist of significantly varying transcription factors (TFs) and the 500 (left) or 1000 (right) most-varying genes. (A) The AUROC values in the heatmap represent the average performance across 50 independent calculations for each dataset. The black squares indicate instances where the performance is poorer than that of a random predictor, as denoted by an AUROC value below 0.5. (B) The AUPRC values in the heatmap are likewise averaged over 50 calculations for each dataset.
Figure 4
The average performance measured by AUROC and AUPRC for different numbers of training samples. Performance of GENELink with a wide range of training set sizes from 10 to 50% on seven benchmark datasets with cell-type-specific networks. Each method is run 10 times on each dataset. Abbreviations: AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve.
Figure 5
The performance of GNNLink on imputed and unimputed benchmark datasets with cell-type-specific networks. Average GRN inference performance on seven scRNA-seq datasets, i.e. AUROC (left panel) and AUPRC (right panel). We use DeepImpute [46] to impute the zero entries in the seven raw benchmark datasets. We report the mean ± variance over 10 repeats. Abbreviations: AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve.
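
The two metrics named throughout the figure captions can be sketched in a few lines. This is an illustrative calculation, not the evaluation code used in the study: AUROC is computed here as the fraction of positive/negative pairs ranked correctly, AUPRC is approximated by average precision (an assumption, with score ties not handled specially), and the labels and scores are made-up toy values.

```python
def auroc(labels, scores):
    """Probability that a random true link outscores a random non-link (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Average precision: mean of the precision values at the rank of each true link."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy example (assumed values): 1 marks a true regulatory link, 0 a non-link.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
roc = auroc(labels, scores)
ap = average_precision(labels, scores)
```

An AUROC of 0.5 corresponds to random ranking, which is why values below 0.5 are flagged as worse than a random predictor in Figure 3.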


References

    1. Wu X, Zhou Y. GE-impute: graph embedding-based imputation for single-cell RNA-seq data. Brief Bioinform 2022; 23(5): bbac313.
    2. Zhao M, He W, Tang J, et al. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Brief Bioinform 2021; 22(5): bbab009.
    3. Mao G, Zeng R, Peng J, et al. Reconstructing gene regulatory networks of biological function using differential equations of multilayer perceptrons. BMC Bioinform 2022; 23.
    4. Zhang S, Stumpf M. Learning cell-specific networks from dynamical single cell data. bioRxiv 2023. https://api.semanticscholar.org/CorpusID:257718292.
    5. Jing X, Zhang A, Liu F, Zhang X. STGRNS: an interpretable transformer-based method for inferring gene regulatory networks from single-cell transcriptomic data. Bioinformatics 2023; 39(4): btad165.
