Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2024 Jun 17;4(6):100794.
doi: 10.1016/j.crmeth.2024.100794. Epub 2024 Jun 10.

GENIX enables comparative network analysis of single-cell RNA sequencing to reveal signatures of therapeutic interventions

Affiliations

GENIX enables comparative network analysis of single-cell RNA sequencing to reveal signatures of therapeutic interventions

Nima Nouri et al. Cell Rep Methods. .

Abstract

Single-cell RNA sequencing (scRNA-seq) has transformed our understanding of cellular responses to perturbations such as therapeutic interventions and vaccines. Gene relevance to such perturbations is often assessed through differential expression analysis (DEA), which offers a one-dimensional view of the transcriptomic landscape. This method potentially overlooks genes with modest expression changes but profound downstream effects and is susceptible to false positives. We present GENIX (gene expression network importance examination), a computational framework that transcends DEA by constructing gene association networks and employing a network-based comparative model to identify topological signature genes. We benchmark GENIX using both synthetic and experimental datasets, including analysis of influenza vaccine-induced immune responses in peripheral blood mononuclear cells (PBMCs) from recovered COVID-19 patients. GENIX successfully emulates key characteristics of biological networks and reveals signature genes that are missed by classical DEA, thereby broadening the scope of target gene discovery in precision medicine.

Keywords: CP: systems biology; gene network association; gene program discovery; scRNA-seq; system biology; target identification.

PubMed Disclaimer

Conflict of interest statement

Declaration of interests The authors are employees of Sanofi US.

Figures

None
Graphical abstract
Figure 1
Figure 1
Unveiling overlooked genes beyond DEA (A) Schematic illustration depicting three possible alterations in the expression level of genes upon a stimulus, including upregulation (red circles), downregulation (blue circles), and no variation (white circles). (B) Schematic illustration representing a one-dimensional conventional DEA aiming to distinguish genes with varying expression levels and identify condition-relevant signature genes. (C) Schematic illustration showcasing the potential of devised measurement techniques to complement traditional DEA. (D) Schematic illustration of the GENIX algorithm starting from scRNA-seq data. The gene expression matrix of a given cell type is then utilized to estimate the ICM using the glasso algorithm. This process lays the groundwork to build a gene association network and infer associated bio-topological features, hub genes, and modules. (E) Schematic illustration of the gene-removal impact (GRI) metric calculation. The Jaccard index is calculated to quantify the similarity, considering both shared genes and edges, comparing the inferred networks at states 1 and 2 (panel 1). Genes are iteratively removed from both networks, and the similarity index is recalculated. The impact of individual genes is then determined through the relative comparison of these pre- and post-removal measurements. The GRI is positive if the elimination of a specific gene leads to a more similar state between the two networks (panel 2). The GRI is negative if the elimination of a specific gene causes the two networks to become less similar (panel 3). (F) Five unique pseudo-gene configurations representing alternative transcriptional relationships from synthetic-1 dataset.
Figure 2
Figure 2
ICM identifies genes that are conditionally independent given all other genes in the network (A) Simulated dataset (synthetic-1) resembling the sequential regulatory effect of gene G1 on G2, G2 on G3, and G3 on G4. The expression levels are sampled from a Gaussian distribution. The heatmap displays the expression levels of G1–G4 across 10,000 cells. (B) Pearson correlation coefficients calculated from the simulated dataset from (A), laying the framework to build a gene co-expression network composed of genes G1–G4. The red elements of the Pearson correlation matrix represent the transitive red edges in the network (i.e., false positives). The networks at the bottom of the panel illustrate the correlation thresholding effort to remove the transitive edges. The red legend at the bottom illustrates the range of thresholds used to remove the transitive edges. (C) ICM is estimated for the same simulated dataset from (A), laying the groundwork for the inference of a gene association network without any transitive edges, represented by green zeros in the matrix.
Figure 3
Figure 3
ICM deciphers direct associations from indirect effects in synthetic data (A) First row: unique pseudo-gene configurations representing alternative transcriptional relationships. Second row: set of equations used to simulate pseudo-gene expressions based on pre-defined relationships and characteristics. Third and fourth rows: Pearson correlation matrix used for constructing gene co-expression networks. The red values in the matrix and red edges in the network highlight indirect effects, representing false positives. Fifth and sixth rows: inverse covariance matrix used for constructing gene association networks. The green zeros in the matrix highlight removed indirect effects. (B–D) are similar to (A).
Figure 4
Figure 4
Inferring cell-type-specific gene modules from query data (A) Uniform manifold approximation and projection (UMAP) representation of GSE206265 PBMC single-cell data obtained from healthy females (n = 8), healthy males (n = 8), COVID-19-recovered females (n = 12), and COVID-19-recovered males (n = 12). The groups were vaccinated with the seasonal influenza vaccine, and samples were collected longitudinally at days 0, 1, and 28 post vaccination. Clusters and cell types were color coded according to the published manuscript. (B) Schematic illustration of the module inference process for each group of patients (n = 12; top). The panel showcases only the classical monocyte population. Bottom: heatmap representing the pairwise similarity of in-group inferred modules. First, the Jaccard index was used to evaluate the similarity between in-group modules, producing a square matrix with each module corresponding to a row and column. Next, hierarchical clustering was applied to detect clusters of modules with similar gene compositions. Modules within each cluster were subsequently unified (merged) into single modules, labeled M1–M15, as denoted by the color coding at the top of the heatmap. Additional color keys indicate sampling time points, disease statuses, and gender distinctions, reflecting the demographics from which these modules were inferred. Dendrograms displayed along the rows and columns have been created by employing hierarchical clustering with an average linkage method. (C) Schematic representation for annotating modules by comparing them to the reference functional modules, (left). Modules that have a similarity larger than 25% (red dashed line) with the reference modules are annotated accordingly (right). The facet titles on the top represent the modules annotation: AP, antigen presenting; CTX, cytotoxic; IFN, interferon; INFLM, inflammation; MT, mitochondrial; RP, ribosomal protein; TBD, unannotated modules. The color-coded legend on the bottom represents the cell types from which the modules have been inferred. (D) Number of annotated modules (y axis) in each cell type (x axis) from which they have been inferred. The legend on the right represents the modules annotation similar to (C). (E) Heatmap representing row-scaled module scores (rows) averaged over cells from each cell-type population (columns). The color keys on the left represent the annotated module types and the cell types from which the modules have been inferred. The color key at the top represents the cell type associations with hierarchical clusters.
Figure 5
Figure 5
Discovery of vaccine-induced gene programs in classical monocytes and CD8 central memory cells (A and B) Visualization of classical monocytes (dark orange) and CD8 central memory cells (dark blue) in the context of the inferred interferon (cMon.M15) and cytotoxic (CD8-CM.M7) modules (A) and HLA class-II (cMon.M2) and HLA class-I (CD8-CM.M9) modules (B). (C) Module scores of interferon (cMon.M15) and HLA class-II (cMon.M2) modules in classical monocytes in the HC-Female (n = 8), HC-Male (n = 8), COVR-Female (n = 12), and COVR-Male (n = 12) participants at D0, D1, and D28. Each point represents a participant. (D) Expression levels of selected genes from the interferon (cMon.M15) and HLA class-II (cMon.M2) modules in classical monocytes across different groups and time points. Each circle’s color represents the average expression level of the gene, while its size indicates the percentage of cells expressing that particular gene. (E and F) Similar to (C) and (D), but calculated from CD8 central memory cells using cytotoxic (CD8-CM.M7) and HLA class-I (CD8-CM.M9) modules. In (C) and (E), the upper and lower bounds of the boxplots represent the 75th and 25th percentiles, respectively. The center bars indicate the medians, and the whiskers denote values up to 1.5 interquartile ranges above the 75th or below the 25th percentiles. Data beyond the end of the whiskers are considered outliers. Statistical significance was defined by t test, as ∗p < 0.05.
Figure 6
Figure 6
Identification of vaccine-induced TVGs in classical monocytes (A) Change in gene normalized centrality (y axis; D1 versus D0) is compared against GRI (x axis; D1 versus D0) in inferred networks from classical monocytes in female (top) and male (bottom) COVRs. Each point indicates a TVG labeled by the gene name. The label color indicates the hubness status of the given gene, whether being a hub at D0 (orange), D1 (red), in both D0 and D1 (green), or not a hub gene (a common gene) at any time point (black). The points color shade indicates the log-2-fold change (log2 FC) in the expression level of the given gene, comparing D1 post vaccination versus D0. The point shape indicates the differential expression status of each individual gene, whether being invariant (circle), significantly downregulated (inverted triangle), or significantly upregulated (triangle). The shaded area indicates the region where the removal of the given gene has a low influence on the overall architecture of the inferred networks, quantified with a change less than 3-sigma in either the GRI (x axis) or GCC (y axis) distributions. The x axis values are normalized to their extremum positive and negative values. (B) Expression levels of selected genes in classical monocytes of COVID-19-recovered female (n = 12) and male (n = 12) participants at D0, D1, and D28. Each point represents a participant. The upper and lower bounds of the boxplots represent the 75th and 25th percentiles, respectively. The center bars indicate the medians, and the whiskers denote values up to 1.5 interquartile ranges above the 75th or below the 25th percentiles. Data beyond the end of the whiskers are considered outliers. (C) Volcano plot illustrating DEA comparing D1 post vaccination versus D0 in classical monocytes from female participants (top) and male (bottom) COVRs. Red points represent genes with statistically significant up- or downregulation (adjusted p less than 0.05 and absolute log2 fold change larger than 1), while blue points correspond to genes satisfying only the statistically significant criteria (adjusted p less than 0.05). Black points denote genes that do not meet either of the two criteria. (D) Vaccine-induced gene signatures identified using both the DEA and the network-based comparative model in female and male COVRs. The dashed red line indicates signatures identified by both methodologies as differentially expressed genes (DEGs) or TVGs. Note: the GBP1, 2, 4, and 5 genes have not been removed from the (A) to highlight the complementary nature of DEA and network-based comparative model in identifying perturbation-induced signatures. (E) Scatterplot illustrating the relationship between topological variation scores (TVSs; y axis) and log2 fold expression changes (log2 FC; x axis) for classical monocytes comparing post-vaccination D1 to baseline (D0) in female participants (left) and male (right) COVRs. TVGs are labeled and marked with red diamonds.

Similar articles

Cited by

References

    1. Mohs R.C., Greig N.H. Drug discovery and development: Role of basic biological research. Alzheimers Dement. 2017;3:651–657. - PMC - PubMed
    1. Jenwitheesuk E., Horst J.A., Rivas K.L., Van Voorhis W.C., Samudrala R. Novel paradigms for drug discovery: computational multitarget screening. Trends Pharmacol. Sci. 2008;29:62–71. - PMC - PubMed
    1. Yofe I., Dahan R., Amit I. Single-cell genomic approaches for developing the next generation of immunotherapies. Nat. Med. 2020;26:171–177. - PubMed
    1. Trombetta J.J., Gennert D., Lu D., Satija R., Shalek A.K., Regev A. Preparation of single-cell RNA-seq libraries for next generation sequencing. Curr. Protoc. Mol. Biol. 2014;107:22.21–24.22.17. - PMC - PubMed
    1. Svensson V., Vento-Tormo R., Teichmann S.A. Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 2018;13:599–604. - PubMed

Substances

LinkOut - more resources