Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 Jun 12;3(6):100498.
doi: 10.1016/j.crmeth.2023.100498. eCollection 2023 Jun 26.

hdWGCNA identifies co-expression networks in high-dimensional transcriptomics data

Affiliations

hdWGCNA identifies co-expression networks in high-dimensional transcriptomics data

Samuel Morabito et al. Cell Rep Methods. .

Abstract

Biological systems are immensely complex, organized into a multi-scale hierarchy of functional units based on tightly regulated interactions between distinct molecules, cells, organs, and organisms. While experimental methods enable transcriptome-wide measurements across millions of cells, popular bioinformatic tools do not support systems-level analysis. Here we present hdWGCNA, a comprehensive framework for analyzing co-expression networks in high-dimensional transcriptomics data such as single-cell and spatial RNA sequencing (RNA-seq). hdWGCNA provides functions for network inference, gene module identification, gene enrichment analysis, statistical tests, and data visualization. Beyond conventional single-cell RNA-seq, hdWGCNA is capable of performing isoform-level network analysis using long-read single-cell data. We showcase hdWGCNA using data from autism spectrum disorder and Alzheimer's disease brain samples, identifying disease-relevant co-expression network modules. hdWGCNA is directly compatible with Seurat, a widely used R package for single-cell and spatial transcriptomics analysis, and we demonstrate the scalability of hdWGCNA by analyzing a dataset containing nearly 1 million cells.

Keywords: Alzheimer's disease; Autism spectrum disorder; co-expression network; gene network; long-read RNA-seq; microglia; single-cell RNA-seq; single-cell genomics; spatial transcriptomics.

PubMed Disclaimer

Conflict of interest statement

The authors declare no competing interests.

Figures

None
Graphical abstract
Figure 1
Figure 1
Overview of the hdWGCNA workflow and application in the human prefrontal cortex (A) Schematic overview of the standard hdWGCNA workflow on a scRNA-seq dataset. UMAP plot shows 36,671 cells from 11 cognitively normal donors in the Zhou et al. human prefrontal cortex (PFC) dataset. ASC, astrocytes; EX, excitatory neurons; INH, inhibitory neurons; MG, microglia; ODC, oligodendrocytes; OPC, oligodendrocyte progenitor cells. (B) Density plot showing the distribution of pairwise Pearson correlations between genes from the single-cell (sc) expression matrix and metacell expression matrices with varying values of the K-nearest neighbors parameter K. (C) Expression matrix density (1, sparsity) for the sc, pseudo-bulk (pb), and metacell matrices with varying values of K in each cell type. (D) Heatmap of scaled gene expression for the top five hub genes by kME in INH-M6, EX-M2, ODC-M3, OPC-M2, ASC-M18, and MG-M14. (E) snRNA-seq UMAP colored by module eigengene (ME) for selected modules as in (D). (F) UMAP plot of the ODC co-expression network. Each node represents a single gene, and edges represent co-expression links between genes and module hub genes. Point size is scaled by kME. Nodes are colored by co-expression module assignment. The top two hub genes per module are labeled. Network edges were downsampled for visual clarity. (G) snRNA-seq UMAP as in (A) colored by MEs for the 10 ODC co-expression modules as in (F). (H) Module preservation analysis of the ODC modules in the Morabito et al. human PFC dataset. The module’s size versus the preservation statistic (Z preservation) is shown for each module. Z<5, not preserved; 10>Z5, moderately preserved; Z10, highly preserved.
Figure 2
Figure 2
Runtime, memory usage, and performance of hdWGCNA (A and B) We ran the main co-expression network analysis functions of the hdWGCNA R package on 65,415 neuronal cells in a human brain dataset from 54 samples, and tracked the runtime (A) and memory usage upper bound (B) for different-sized subsets of the data ranging from 1,000 through 50,000 cells. (C) Violin plots showing distributions of EGAD neighbor-voting area under the receiver operating characteristic curve (AUC) scores in each of the cell-type-specific co-expression networks from the human PFC dataset. (D) Violin plots showing distributions of multifunctionality AUC scores in each of the cell-type-specific co-expression networks from the human PFC dataset. ASC, astrocytes; EX, excitatory neurons; INH, inhibitory neurons; MG, microglia; ODC, oligodendrocytes; OPC, oligodendrocyte progenitor cells. (E) Performance of the XGBoost regularized regression models used to predict gene expression based on the expression of the top 10 module hub genes for all 96 co-expression modules from the Zhou et al. human PFC dataset. Violin plots showing the test set root-mean-square error (RMSE) comparing the predicted expression with observed for each gene, split by each co-expression module. Modules are ordered within each cell type from lowest mean RMSE to highest.
Figure 3
Figure 3
Spatial co-expression networks represent regional expression patterns in the mouse brain (A) Visium spatial transcriptomics (ST) in anterior (left, 2,696 spots) and posterior (right, 3,353 spots) mouse brain sections, colored by Louvain clusters annotated by anatomical regions. (B) UMAP plot of the mouse brain ST co-expression network. Each node represents a single gene, and edges represent co-expression links between genes and module hub genes. Point size is scaled by kME. Nodes are colored by co-expression module assignment. The top five hub genes per module are labeled. Network edges were downsampled for visual clarity. (C) ST samples colored by MEs for the 12 spatial co-expression modules. Gray color indicates an ME value less than zero.
Figure 4
Figure 4
Isoform co-expression network analysis reveals fate-specific expression programs in the hippocampal radial glia lineage (A) UMAP plot of cells from the mouse hippocampus ScISOrSeq dataset. Major cell types are labeled and the cells used for co-expression network analysis are colored. This dataset contains expression information for 96,093 isoforms and 31,053 genes in 6,832 cells from one mouse brain sample. ASC, astrocytes; CPX, choroid plexus epithelial cells; EPD, ependymal cells; EX, excitatory neurons; GRN, granule neurons; INH, inhibitory neurons; MAC, macrophages; NPC, neuronal intermediate progenitor cells; MG, microglia; OPC, oligodendrocyte progenitor cells; RGL, radial glia; VASC, vasculature cells. (B) UMAP plot of the radial glia lineage, colored by Monocle 3 pseudotime assignment. Top left, ependymal (EPD) trajectory; top right, astrocyte (ASC) trajectory; bottom left, neuronal intermediate progenitor cell (NPC) trajectory. (C) UMAP plot of the radial glia lineage isoform co-expression network. Each node represents a single isoform, and edges represent co-expression links between isoforms and module hub isoforms. Point size is scaled by kMEiso. Nodes are colored by co-expression module assignment. Network edges were downsampled for visual clarity. (D) Donut chart showing the percentage of genes with one isoform, with multiple isoforms that are all assigned to the same module, and with multiple isoforms that are spread across more than one module. (E) Module eigenisoforms (MEiso) as a function of pseudotime for each co-expression module. For each module, a separate locally estimated scatterplot smoothing (LOESS) regression line is shown for each developmental trajectory. (F) Dot plot showing selected GO term enrichment results for each co-expression module. (G) Gene models for selected isoforms of Gfap, colored by co-expression module assignment. (H) Gene models for selected isoforms of H3f3b, colored by co-expression module assignment. (I) Top: gene models for selected isoforms of Cd9, colored by co-expression module assignment. Bottom: Swan graphical representation of Cd9 alternative splicing isoforms. Splice sites and transcript start/end sites are represented as nodes; introns and exons are represented as connections between nodes. These two isoforms are distinguished by alternative TSS usage. Gene models from the GENCODE VM23 comprehensive transcript set are shown below transcripts in panels (G)–(I).
Figure 5
Figure 5
Co-expression network analysis of inhibitory neurons in Autism spectrum disorder (A) UMAP plot of 121,451 nuclei from the cortex of 22 ASD donors, 24 controls, and eight epilepsy donors profiled with snRNA-seq. Inhibitory neuron subtypes are highlighted. ASC, astrocytes; EX, excitatory neurons; INH, inhibitory neurons; MG, microglia; ODC, oligodendrocytes; OPC, oligodendrocyte progenitor cells. (B) Gene co-expression network derived from inhibitory neurons, represented as a two-dimensional UMAP embedding of the TOM. Nodes represent genes, colored by module assignment. Module hub genes with prior evidence of ASD association from SFARI are labeled. Edges represent co-expression relationships between genes and module hub genes. Network edges were downsampled for visual clarity. (C) Gene overlap analysis comparing ASD-associated genes from SFARI and INH co-expression modules, using Fisher’s exact test. × indicates that the overlap was not significant (FDR >0.05). (D) snRNA-seq UMAP plots as in (A) colored by MEs for INH co-expression modules. (E) Violin plots showing MEs in each INH cluster. Two-sided Wilcoxon test was used to compare ASD versus control samples. Nuclei from epilepsy donors were excluded in this comparison. Not significant (ns), p > 0.05; p0.05; ∗∗p0.01; ∗∗∗p0.001; ∗∗∗∗p0.0001. (F) Selected GO enrichment results for each co-expression module.
Figure 6
Figure 6
Consensus network analysis of microglia in AD (A) Left: table showing the number of samples and the number of microglia nuclei from published AD snRNA-seq datasets used for co-expression network analysis. Right: integrated UMAP plot of nuclei from three snRNA-seq datasets. (B) UMAP plot of microglia, colored by Monocle 3 pseudotime assignment. (C) UMAP plot of the microglia co-expression network. Each node represents a single gene, and edges represent co-expression links between genes and module hub genes. Point size is scaled by kME. Nodes are colored by co-expression module assignment. The top 10 hub genes per module are labeled, as well as additional genes of interest. Network edges were downsampled for visual clarity. (D) Selected Gene Ontology (GO) terms enriched in co-expression modules. Bar plots show the log-scaled enrichment of each term. (E) MEs as a function of pseudotime; points are averaged MEs in 50 pseudotime bins of equal size. Line represents LOESS regression with a 95% confidence interval. (F) Microglia UMAP colored by ME. (G) Differential module eigengene (DME) results in 10 pseudotime bins of equal size. For each pseudotime bin, we performed DME analysis between cells from AD (positive fold change) and control samples. × symbol indicates that the test did not reach significance (Wilcoxon rank-sum test Bonferroni-adjusted p value > 0.05). (H) Top: microglia UMAP colored by AD single-cell disease relevance score (scDRS)Z score. Bottom: scDRS Z score as a function of pseudotime, points are averaged scDRS Z scores in 50 pseudotime bins of equal size. Line represents linear regression with a 95% confidence interval. (I) Heatmap of Pearson correlations of MEs and scDRS Z scores, split by cells from AD and control samples. (J) Abbreviations denote the following brain regions: SFG, superior frontal gyrus; EC, entorhinal cortex; OC, occipital cortex; OTC, occipitotemporal cortex. ∗∗Highly preserved (Z10); moderately preserved (10>Z5); x, not preserved (Z<5).
Figure 7
Figure 7
Projecting bulk RNA-seq co-expression modules into a single-cell dataset (A) UMAP plot of 57,950 nuclei from an snRNA-seq dataset of the human PFC from AD (N = 11) and control (N = 7) PFC samples. Cells are colored by major cell type assignment. (B) Multi-region consensus co-expression modules from Morabito et al. bulk RNA-seq analysis projected into the snRNA-seq dataset as in (A). (C) Co-expression modules from the AMP-AD bulk RNA-seq dataset projected into the snRNA-seq dataset as in (A). CBE, cerebellum; DLPFC, dorsolateral PFC; FP, frontal pole; IFG, inferior frontal gyrus; PHG, parahippocampal gyrus; STG, superior temporal gyrus; TCX, temporal cortex; ASC, astrocytes; EX, excitatory neurons; INH, inhibitory neurons; MG, microglia; ODC, oligodendrocytes; OPC, oligodendrocyte progenitor cells; VASC, vascular cells.

References

    1. Langfelder P., Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 2008;9:559. doi: 10.1186/1471-2105-9-559. - DOI - PMC - PubMed
    1. Yip A.M., Horvath S. Gene network interconnectedness and the generalized topological overlap measure. BMC Bioinf. 2007;8:22. doi: 10.1186/1471-2105-8-22. - DOI - PMC - PubMed
    1. Langfelder P., Zhang B., Horvath S. Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics. 2008;24:719–720. doi: 10.1093/bioinformatics/btm563. - DOI - PubMed
    1. Dong J., Horvath S. Understanding network concepts in modules. BMC Syst. Biol. 2007;1:24. doi: 10.1186/1752-0509-1-24. - DOI - PMC - PubMed
    1. Butler A., Hoffman P., Smibert P., Papalexi E., Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 2018;36:411–420. doi: 10.1038/nbt.4096. - DOI - PMC - PubMed

Publication types

MeSH terms