Bioinformatics. 2023 Oct 3;39(10):btad592.
doi: 10.1093/bioinformatics/btad592.

MuDCoD: multi-subject community detection in personalized dynamic gene networks from single-cell RNA sequencing

Ali Osman Berk Şapcı et al. Bioinformatics. 2023.

Abstract

Motivation: With the wide availability of single-cell RNA-seq (scRNA-seq) technology, population-scale scRNA-seq datasets across multiple individuals and time points are emerging. While the initial investigations of these datasets tend to focus on standard analysis of clustering and differential expression, leveraging the power of scRNA-seq data at the personalized dynamic gene co-expression network level has the potential to unlock subject and/or time-specific network-level variation, which is critical for understanding phenotypic differences. Community detection from co-expression networks of multiple time points or conditions has been well-studied; however, none of the existing settings included networks from multiple subjects and multiple time points simultaneously. To address this, we develop Multi-subject Dynamic Community Detection (MuDCoD) for multi-subject community detection in personalized dynamic gene networks from scRNA-seq. MuDCoD builds on the spectral clustering framework and promotes information sharing among the networks of the subjects as well as networks at different time points. It clusters genes in the personalized dynamic gene networks and reveals gene communities that are variable or shared not only across time but also among subjects.

Results: Evaluation and benchmarking of MuDCoD against existing approaches reveal that MuDCoD effectively leverages apparent shared signals among networks of the subjects at individual time points, and performs robustly when there is no or little information sharing among the networks. Applications to population-scale scRNA-seq datasets of human-induced pluripotent stem cells during dopaminergic neuron differentiation and CD4+ T cell activation indicate that MuDCoD enables robust inference for identifying time-varying personalized gene modules. Our results illustrate how personalized dynamic community detection can aid in the exploration of subject-specific biological processes that vary across time.

Availability and implementation: MuDCoD is publicly available at https://github.com/bo1929/MuDCoD as a Python package. Implementation includes simulation and real-data experiments together with extensive documentation.

Conflict of interest statement

None declared.

Figures

Figure 1.
A schematic for multi-subject dynamic gene networks. A gene network is observed for each subject at each time step among a common set of nodes. The sets of edges vary among both the subjects and the time steps. These networks are estimated from scRNA-seq data and are expected to harbor communities that are conserved at varying levels among the subject and time dimensions. Different colors mark distinct communities, where the nodes (genes) within the same communities are depicted with the same color. MuDCoD assumes that communities change smoothly across both the subject and the time dimensions.
Figure 2.
Multi-subject dynamic degree corrected block models (MuS-Dynamic-DCBM) for the two proposed settings. (a) SSoS setting: subjects evolve from a common ancestor at each time step t; and only the ancestor’s evolution over time is parameterized. (b) SSoT setting: subjects evolve from a common ancestor at t=0; and then they evolve independently over time.
Figure 3.
Evaluation of the identified communities under two different MuS-Dynamic-DCBM settings: (a) SSoS and (b) SSoT. The simulation parameters were set as follows: the network size is G = 500, the number of class labels is K = 10, the in-cluster and out-cluster density parameters are p_in = (0.2, 0.4) and p_out = (0.1, 0.1), the number of subjects is S = 8, and the number of time points is T ∈ {2, 4, 8}. The x-axis is the number of time points T, and the y-axis is the mean adjusted Rand index (ARI) of the inferred modules for all subjects and time steps across 100 simulation replicates.
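As a minimal sketch of the metric on the y-axis, the ARI between a ground-truth community assignment and an inferred one can be computed from the contingency counts of the two labelings. The function name and the toy labels below are illustrative assumptions and are not taken from the MuDCoD package.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree

    # Contingency counts: how many items fall in each (true, predicted) cell.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs grouped together in both, in truth, in prediction.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings form a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy example: four genes, two communities (hypothetical labels).
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The ARI depends only on which genes are grouped together, so relabeling the communities leaves the score unchanged, and values near zero indicate agreement no better than chance.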
Figure 4.
NMI scores between inferred gene modules of donor and time point pairs aggregated across all cell types. (a) and (b) quantify NMI scores between modules of every pair of donors on day-30 and day-52, respectively. (c) displays NMI scores between inferred gene modules of each donor on day-30 and on day-52. The y-axis denotes the percentage of donor pairs.
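For illustration, the normalized mutual information (NMI) between two module assignments over the same genes can be sketched as follows. The caption does not state which normalization is used, so dividing by the arithmetic mean of the two entropies is an assumption here, as are the function names and the toy labels.

```python
from collections import Counter
from math import log


def label_entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two labelings, normalized by the mean of their entropies."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)

    # Mutual information from the joint and marginal label frequencies.
    mutual_info = sum(
        (c / n) * log(c * n / (counts_a[a] * counts_b[b]))
        for (a, b), c in joint.items()
    )
    mean_entropy = (label_entropy(labels_a) + label_entropy(labels_b)) / 2
    if mean_entropy == 0:
        return 1.0  # both labelings put every gene in a single module
    # Clamp to [0, 1] to absorb floating-point rounding.
    return min(1.0, max(0.0, mutual_info / mean_entropy))


# Toy example: modules of one hypothetical donor at two time points.
nmi_same = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
nmi_unrelated = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

A score near 1 means the two sets of modules carry almost the same grouping of genes, while a score near 0 means that knowing one grouping says little about the other.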
Figure 5.
(a) NMI scores of each donor between the MuDCoD inferred modules on day-30 and on day-52 against the differentiation efficiency. For each cell type (DA, Sert, Epen1), each donor’s modules from day-30 and day-52 were compared with NMI and plotted against the donor’s differentiation efficiency. (b) Comparison of the mean NMI scores within and between the low and high differentiation efficiency groups of Epen1 cells. Donor labels for differentiation efficiency were obtained from HipSci Consortium et al. (2021). NMI scores between pairs of donors are calculated based on their MuDCoD inferred modules. Differentiation efficiency groups were generated based on the percentiles of differentiation efficiency values across the donors, i.e. donors at or below the 50th percentile form the low differentiation efficiency group.
Figure 6.
(a) Normalized mutual information scores between pairs of donors based on their gene modules inferred by MuDCoD at each time point. Each data point stands for a donor, and the y-axis denotes the mean of NMI scores between that donor and other donors at the corresponding time point. All the statistically significant comparisons with time point 16-h from the one-sided Wilcoxon rank-sum test are marked (P-value≤0.05); *** and **** stand for P-value≤0.001 and P-value≤0.0001, respectively. (b) Set of the top fifteen most frequent significantly enriched biological processes (with adjusted P-values≤0.05) of the most prioritized communities contributed by each donor at different time points. Overall, 79×4=316 gene sets were contributed by 79 donors at 4 time points, and correspondingly, 316 separate enrichment analyses were performed. Displayed are significant biological processes, their corresponding number of appearances in the most prioritized communities of donors, and the minimum P-values of enrichment among those communities.
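The caption does not name the multiple-testing correction behind the adjusted P-values. As a sketch only, and assuming a Benjamini-Hochberg adjustment applied within a single enrichment analysis, adjusted P-values could be computed as below. The function name and the input P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values for four biological processes in one analysis.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adjusted]
```

Under this assumption, a process would count as significantly enriched in a given analysis when its adjusted P-value is at most 0.05.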
