Bioinformatics. 2023 Oct 3;39(10):btad592.

doi: 10.1093/bioinformatics/btad592.

MuDCoD: multi-subject community detection in personalized dynamic gene networks from single-cell RNA sequencing

Ali Osman Berk Şapcı¹,², Shan Lu³, Shuchen Yan³, Ferhat Ay⁴,⁵, Oznur Tastan², Sündüz Keleş³,⁶

Affiliations

¹ Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA 92093, United States.
² Faculty of Engineering and Natural Sciences, Sabancı University, Istanbul 34956, Turkey.
³ Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, United States.
⁴ Department of Pediatrics, University of California San Diego, La Jolla, CA 92093, United States.
⁵ Centers for Autoimmunity, Inflammation and Cancer Immunotherapy, La Jolla Institute for Immunology, La Jolla, CA 92037, United States.
⁶ Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, United States.

PMID: 37740957
PMCID: PMC10564618
DOI: 10.1093/bioinformatics/btad592

Ali Osman Berk Şapcı et al. Bioinformatics. 2023.

Abstract

Motivation: With the wide availability of single-cell RNA-seq (scRNA-seq) technology, population-scale scRNA-seq datasets across multiple individuals and time points are emerging. While the initial investigations of these datasets tend to focus on standard analysis of clustering and differential expression, leveraging the power of scRNA-seq data at the personalized dynamic gene co-expression network level has the potential to unlock subject- and/or time-specific network-level variation, which is critical for understanding phenotypic differences. Community detection from co-expression networks of multiple time points or conditions has been well studied; however, none of the existing settings included networks from multiple subjects and multiple time points simultaneously. To address this, we develop Multi-subject Dynamic Community Detection (MuDCoD) for multi-subject community detection in personalized dynamic gene networks from scRNA-seq. MuDCoD builds on the spectral clustering framework and promotes information sharing among the networks of the subjects as well as networks at different time points. It clusters genes in the personalized dynamic gene networks and reveals gene communities that are variable or shared not only across time but also among subjects.

Results: Evaluation and benchmarking of MuDCoD against existing approaches reveal that MuDCoD effectively leverages apparent shared signals among networks of the subjects at individual time points, and performs robustly when there is little or no information sharing among the networks. Applications to population-scale scRNA-seq datasets of human induced pluripotent stem cells during dopaminergic neuron differentiation and CD4+ T cell activation indicate that MuDCoD enables robust inference for identifying time-varying personalized gene modules. Our results illustrate how personalized dynamic community detection can aid in the exploration of subject-specific biological processes that vary across time.

Availability and implementation: MuDCoD is publicly available at https://github.com/bo1929/MuDCoD as a Python package. Implementation includes simulation and real-data experiments together with extensive documentation.
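
The abstract describes MuDCoD as spectral clustering with information sharing across subjects and time points. As a rough illustration of that idea only, and not the MuDCoD algorithm itself, the sketch below adds the degree-normalized networks of neighbouring time points and of the other subjects to a subject's own network before taking eigenvectors. The weights `alpha` and `beta`, every function name, and the restriction to two communities are assumptions made for this example; the package linked above is the reference for the actual method.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric degree normalization D^{-1/2} A D^{-1/2}."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]


def smoothed_matrix(networks, s, t, alpha=0.5, beta=0.5):
    """Blend network (s, t) with its time neighbours and the other subjects.

    networks[s][t] is a symmetric G x G adjacency matrix for subject s at
    time t. alpha (time) and beta (subject) are hypothetical sharing weights.
    """
    n_subjects, n_times = len(networks), len(networks[s])
    out = normalized_adjacency(networks[s][t])
    time_nb = [normalized_adjacency(networks[s][u])
               for u in (t - 1, t + 1) if 0 <= u < n_times]
    subj_nb = [normalized_adjacency(networks[r][t])
               for r in range(n_subjects) if r != s]
    if time_nb:
        out = out + alpha * np.mean(time_nb, axis=0)
    if subj_nb:
        out = out + beta * np.mean(subj_nb, axis=0)
    return out


def two_way_split(matrix):
    """Assign nodes to two groups by the sign of the second leading eigenvector."""
    _, vecs = np.linalg.eigh(matrix)  # eigenvalues are returned in ascending order
    second = vecs[:, -2]
    return (second > 0).astype(int)
```

Setting `alpha = beta = 0` reduces this to clustering each network on its own, which gives a simple baseline for checking how much the sharing terms change the result.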

Conflict of interest statement

None declared.

Figures

**Figure 1.**
A schematic for multi-subject dynamic gene networks. A gene network is observed for each subject at each time step among a common set of nodes. The sets of edges vary among both the subjects and the time steps. These networks are estimated from scRNA-seq data and are expected to harbor communities that are conserved at varying levels among the subject and time dimensions. Different colors mark distinct communities, where the nodes (genes) within the same communities are depicted with the same color. MuDCoD assumes that communities change smoothly across both the subject and the time dimensions.

**Figure 2.**
Multi-subject dynamic degree corrected block models (MuS-Dynamic-DCBM) for the two proposed settings. (a) *SSoS setting:* subjects evolve from a common ancestor at each time step $t$, and only the ancestor’s evolution over time is parameterized. (b) *SSoT setting:* subjects evolve from a common ancestor at $t = 0$, and then they evolve independently over time.
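
The difference between the two settings can be made concrete at the level of community labels. The sketch below is one possible reading of the caption, not the model specification from the paper: labels are perturbed by reassigning each gene to a uniformly random class with some probability. The rates `r_time` and `r_subject`, the perturbation rule, and all function names are assumptions, and the degree and edge-density parameters of the full block model are left out.

```python
import random


def perturb(labels, rate, n_classes, rng):
    """Reassign each label to a uniformly random class with probability `rate`."""
    return [rng.randrange(n_classes) if rng.random() < rate else z
            for z in labels]


def simulate_ssos(n_genes, n_classes, n_subjects, n_times,
                  r_time, r_subject, seed=0):
    """SSoS-like: the ancestor evolves over time; subjects branch off at each step."""
    rng = random.Random(seed)
    ancestor = [rng.randrange(n_classes) for _ in range(n_genes)]
    labels = []  # labels[t][s] is the label list of subject s at time t
    for t in range(n_times):
        if t > 0:
            ancestor = perturb(ancestor, r_time, n_classes, rng)
        labels.append([perturb(ancestor, r_subject, n_classes, rng)
                       for _ in range(n_subjects)])
    return labels


def simulate_ssot(n_genes, n_classes, n_subjects, n_times,
                  r_time, r_subject, seed=0):
    """SSoT-like: subjects branch off at t = 0, then evolve independently."""
    rng = random.Random(seed)
    ancestor = [rng.randrange(n_classes) for _ in range(n_genes)]
    current = [perturb(ancestor, r_subject, n_classes, rng)
               for _ in range(n_subjects)]
    labels = [current]
    for _ in range(1, n_times):
        current = [perturb(z, r_time, n_classes, rng) for z in current]
        labels.append(current)
    return labels
```

Under this reading, subjects in the SSoS-like setting stay tied to a shared ancestor at every time step, whereas in the SSoT-like setting differences between subjects accumulate over time.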

**Figure 3.**
Evaluation of the identified communities under two different MuS-Dynamic-DCBM settings: (a) SSoS and (b) SSoT. The simulation parameters were set as follows: network size $G = 500$, number of class labels $K = 10$, in-cluster and out-cluster density parameters $p_{in} = (0.2, 0.4)$ and $p_{out} = (0.1, 0.1)$, number of subjects $S = 8$, and number of time points $T \in \{2, 4, 8\}$. The x-axis shows the number of time points $T$, and the y-axis shows the mean ARI of the inferred modules for all subjects and time steps across 100 simulation replicates.
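
For readers unfamiliar with the metric on the y-axis, the adjusted Rand index (ARI) compares two labelings of the same nodes and corrects for chance agreement. The following is a minimal standard-library sketch of the usual pair-counting formula; the function name is chosen for this example and it is not taken from the MuDCoD code base.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of items that share a cluster in both, in A only, and in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every item in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

The score is 1 for labelings that agree up to a renaming of the clusters and is close to 0 for unrelated labelings, so averaging it over subjects, time steps and replicates gives a single summary per setting.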

**Figure 4.**
NMI scores between inferred gene modules of donor and time point pairs aggregated across all cell types. (a) and (b) quantify NMI scores between modules of every pair of donors on day-30 and day-52, respectively. (c) displays NMI scores between inferred gene modules of each donor on day-30 and on day-52. The y-axis denotes the percentage of donor pairs.
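
Normalized mutual information (NMI) between two module assignments can be computed as in the sketch below. The excerpt does not state which normalization the authors used, so dividing by the arithmetic mean of the two entropies is an assumption, as are the function names and the `modules_by_donor` dictionary layout.

```python
from collections import Counter
from itertools import combinations
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(labels_a, labels_b):
    """Mutual information divided by the mean of the two entropies."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum((c / n) * log(c * n / (count_a[a] * count_b[b]))
             for (a, b), c in joint.items())
    mean_entropy = 0.5 * (entropy(labels_a) + entropy(labels_b))
    if mean_entropy == 0:
        # Both labelings place every gene in a single module.
        return 1.0
    return mi / mean_entropy


def pairwise_nmi(modules_by_donor):
    """NMI for every unordered pair of donors (keys of the input dict)."""
    return {
        (d1, d2): normalized_mutual_information(modules_by_donor[d1],
                                                modules_by_donor[d2])
        for d1, d2 in combinations(sorted(modules_by_donor), 2)
    }
```

Calling `pairwise_nmi` on the module labels of all donors at one time point yields the kind of donor-pair scores summarized in panels (a) and (b); comparing one donor's labels across two time points with `normalized_mutual_information` corresponds to panel (c).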

**Figure 5.**
(a) NMI scores of each donor between the MuDCoD inferred modules on day-30 and on day-52 against the differentiation efficiency. For each cell type (DA, Sert, Epen1), each donor’s modules from day-30 and day-52 were compared with NMI and plotted against the donor’s differentiation efficiency. (b) Comparison of the mean NMI scores within and between the low and high differentiation efficiency groups of Epen1 cells. Donor labels for differentiation efficiency were obtained from HipSci Consortium *et al.* (2021). NMI scores between pairs of donors are calculated based on their MuDCoD inferred modules. Differentiation efficiency groups were defined by percentiles of the differentiation efficiency values across the donors, i.e. donors up to and including the 50th percentile form the low differentiation efficiency group.

**Figure 6.**
(a) Normalized mutual information scores between pairs of donors based on their gene modules inferred by MuDCoD at each time point. Each data point stands for a donor, and the y-axis denotes the mean of NMI scores between that donor and other donors at the corresponding time point. All statistically significant comparisons with time point 16-h from the one-sided Wilcoxon rank-sum test (P-value $\leq 0.05$) are marked; \*\*\* and \*\*\*\* stand for P-value $\leq 0.001$ and P-value $\leq 0.0001$, respectively. (b) The fifteen most frequent significantly enriched biological processes (adjusted P-values $\leq 0.05$) among the most prioritized communities contributed by each donor at different time points. Overall, $79 \times 4 = 316$ gene sets were contributed by 79 donors at 4 time points, and correspondingly, 316 separate enrichment analyses were performed. Displayed are the significant biological processes, their corresponding number of appearances in the most prioritized communities of donors, and the minimum P-values of enrichment among those communities.
