Nucleic Acids Res. 2021 Oct 11;49(18):e104.
doi: 10.1093/nar/gkab601.

Codependency and mutual exclusivity for gene community detection from sparse single-cell transcriptome data

Natsu Nakajima et al. Nucleic Acids Res.

Abstract

Single-cell RNA-seq (scRNA-seq) can be used to characterize cellular heterogeneity in thousands of cells. The reconstruction of a gene network based on coexpression patterns is a fundamental task in scRNA-seq analyses, and the mutual exclusivity of gene expression can be critical for understanding such heterogeneity. Here, we propose an approach for detecting communities from a genetic network constructed on the basis of coexpression properties. The community-based comparison of multiple coexpression networks enables the identification of functionally related gene clusters that cannot be fully captured through differential gene expression-based analysis. We also developed a novel metric referred to as the exclusively expressed index (EEI) that identifies mutually exclusive gene pairs from sparse scRNA-seq data. EEI quantifies and ranks the exclusive expression levels of all gene pairs from binary expression patterns while maintaining robustness against a low sequencing depth. We applied our methods to glioblastoma scRNA-seq data and found that gene communities were partially conserved after serum stimulation despite a considerable number of differentially expressed genes. We also demonstrate that the identification of mutually exclusive gene sets with EEI can improve the sensitivity of capturing cellular heterogeneity. Our methods complement existing approaches and provide new biological insights, even for a large, sparse dataset, in the single-cell analysis field.

Figures

Figure 1.
(A) Overview of the generation of a feature matrix from scRNA-seq data. (i, ii) Given a gene-cell expression matrix, EEI is calculated and highly mutually exclusive gene pairs are extracted. (iii) The feature matrix is generated by merging the expression ratio matrix for EEI pairs with the gene-cell expression matrix. (iv) Dimension reduction is performed using SVD and UMAP with the feature matrix as input. (B) Summary of the six scRNA-seq datasets, listing the number of genes expressed in at least one cell, the number of cells, the total number of reads, the units of transcript counts and the reference.
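To make "mutually exclusive gene pairs from binary expression patterns" concrete, the following is a minimal sketch that binarizes a toy gene-cell matrix and counts, for each gene pair, the cells in which only one of the two genes is detected. It is not the EEI formula, which this page does not give; the function name `exclusive_cell_counts`, the gene names and the counts are all hypothetical.

```python
from itertools import combinations


def exclusive_cell_counts(expr):
    """Count exclusive and shared cells for every gene pair.

    expr maps a gene name to its per-cell counts (all lists the same length).
    Returns {(g1, g2): (only_g1, only_g2, both)} for g1 < g2.
    NOTE: an illustrative tally only, not the EEI metric itself.
    """
    # Binarize: a gene is "expressed" in a cell if its count is above zero.
    binary = {g: [c > 0 for c in counts] for g, counts in expr.items()}
    result = {}
    for g1, g2 in combinations(sorted(binary), 2):
        pairs = list(zip(binary[g1], binary[g2]))
        only_1 = sum(a and not b for a, b in pairs)
        only_2 = sum(b and not a for a, b in pairs)
        both = sum(a and b for a, b in pairs)
        result[(g1, g2)] = (only_1, only_2, both)
    return result


# Toy data: three hypothetical genes measured in six cells.
toy = {
    "geneA": [5, 0, 3, 0, 2, 0],
    "geneB": [0, 4, 0, 1, 0, 0],
    "geneC": [1, 0, 2, 0, 0, 0],
}
counts = exclusive_cell_counts(toy)
for pair, (only_1, only_2, both) in counts.items():
    print(pair, "only first:", only_1, "only second:", only_2, "both:", both)
```

In this toy matrix, geneA and geneB are never detected in the same cell, whereas geneA and geneC co-occur in two cells. A ranking metric such as EEI would additionally have to account for how often such patterns arise by chance in sparse data, which this tally does not attempt.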
Figure 2.
Comparison of the performances of the four methods with common gene sets from the glioblastoma scRNA-seq dataset. r0.1 represents the synthetic dataset in which 90% of read counts present zero expression compared to the original data. The average AUROC (A), AUPR (B) and AP (C) were calculated by repeating each simulation 10 times.
Figure 3.
Comparison of the performances of the four methods with gold standard gene pairs. The AUPR (A) and AP (B) were calculated using the NTZ dataset and the AUPR (C) and AP (D) were calculated using the HVG dataset.
Figure 4.
Comparison of the EEI (i), Pearson correlation (ii), minet (iii), GENIE3 (iv) and PIDC (v) UMAP results using human ES cell (A) and PBMC_CELseq2 (B) scRNA-seq datasets.
Figure 5.
Comparative analysis of coexpression networks at different time points. The coexpression network constructed at each time point (A) and degree distribution (B). The blue and yellow nodes represent the high- and low-degree nodes, respectively. After decomposition of the coexpression networks, four communities were extracted for each sample, and the values between the communities at 0 and 12 hours represent the Szymkiewicz-Simpson coefficients (C).
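The Szymkiewicz-Simpson (overlap) coefficient used to compare communities is |A ∩ B| / min(|A|, |B|) for two sets A and B. A minimal sketch of the comparison between time points might look like the following; the community labels and gene names are hypothetical and are not taken from the paper's data.

```python
def overlap_coefficient(a, b):
    """Szymkiewicz-Simpson coefficient: |A ∩ B| / min(|A|, |B|).

    Returns 0.0 if either set is empty (a convention assumed here).
    """
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical gene communities at 0 and 12 hours (illustrative names only).
communities_0h = {
    "c1": {"G1", "G2", "G3", "G4"},
    "c2": {"G5", "G6"},
}
communities_12h = {
    "d1": {"G1", "G2", "G7"},
    "d2": {"G5", "G6", "G8", "G9"},
}

# Score every pair of communities across the two time points.
scores = {
    (p, q): overlap_coefficient(genes_p, genes_q)
    for p, genes_p in communities_0h.items()
    for q, genes_q in communities_12h.items()
}
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Because the denominator is the size of the smaller set, the coefficient reaches 1 whenever one community is fully contained in the other, as with `c2` and `d2` here. This makes it suited to asking whether a community is conserved even when its counterpart has grown or shrunk.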
