Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 May 23;16(1):4785.
doi: 10.1038/s41467-025-60154-0.

Worm Perturb-Seq: massively parallel whole-animal RNAi and RNA-seq

Affiliations

Worm Perturb-Seq: massively parallel whole-animal RNAi and RNA-seq

Hefei Zhang et al. Nat Commun. .

Abstract

Transcriptomes provide highly informative molecular phenotypes that, combined with gene perturbation, can connect genotype to phenotype. An ultimate goal is to perturb every gene and measure transcriptome changes, however, this is challenging, especially in whole animals. Here, we present 'Worm Perturb-Seq (WPS)', a method that provides high-resolution RNA-sequencing profiles for hundreds of replicate perturbations at a time in living animals. WPS introduces multiple experimental advances combining strengths of Caenhorhabditis elegans genetics and multiplexed RNA-sequencing with a novel analytical framework, EmpirDE. EmpirDE leverages the unique power of large transcriptomic datasets and improves statistical rigor by using gene-specific empirical null distributions to identify DEGs. We apply WPS to 103 nuclear hormone receptors (NHRs) and find a striking 'pairwise modularity' in which pairs of NHRs regulate shared target genes. We envision the advances of WPS to be useful not only for C. elegans, but broadly for other models, including human cells.

PubMed Disclaimer

Conflict of interest statement

Competing interests: M.G. and A.K. are co-founders of Via Scientific, Inc., a UMass Chan Medical School spin-off. A.K. is a board member of the company. M.G., A.K., and O.Y. have equity in the company. They ensure that steps have been taken to prevent these affiliations from affecting analysis integrity and are dedicated to upholding research transparency and integrity. The remaining authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Worm Perturb-Seq (WPS) overview.
a Overview of the WPS pipeline. This figure was created in BioRender. lee, y. (2025) https://BioRender.com/u29b568. b WPS optimization highlights.
Fig. 2
Fig. 2. Development of WPS and data quality control.
a Comparison of the C. elegans transcriptome across developmental stages. The Pearson correlation coefficient (PCC) was calculated by the WPS profiles of animals fed vector control bacteria and collected at different time points post L1. b Subsampling analysis of WPS profiles, combining data from 62 to 65 h for each replicate shown in Fig. 2a. The plot shows the fraction of genes quantified versus sequencing depth. Genes whose expression levels in subsampling fall within roughly ±30% interval of the reference value were considered as quantified (for detailed definition, see Supplementary Methods). Error bar shows the mean values (± s.d.) from three replicates of the subsampling profile. c Representative Principal Component Analysis (PCA) results for gene expression profiles of perturbations without (gna-1 on top panel) and with (iars-2 in bottom panel) a low-quality outlier replicate. Red and gray dots indicate RNAi and control samples, respectively. d An example showing reads mapped to the reverse strand of the RNAi target gene (nhr-7) and a decrease in mRNA reads at the 3’ end. The reads mapped to nhr-7 gene locus were visualized by Integrative Genomics Viewer (IGV). nhr-7 RNAi was compared to vector control RNAi and another RNAi condition (nhr-14). Quantification of anti-sense RNA reads in a Sanger-sequenced (e) and an unvalidated (f) WPS sequencing library. Row names represent the intended RNAi gene (three replicates each) and column names represent the actual knocked down genes. Row names in red indicate wrong RNAi clones. Values are the log2(Count-Per-Million (CPM) + 0.01) of the reads mapped to the reverse strand of each gene in the columns. g Log2(Fold Change (FC)) of the RNAi targeted gene expression for the WPS sequencing library shown in (f). Pound key (#) indicates not detected. Each bar represents the mean (±s.d.) and each dot represents one biological replicate (n = 3). Source data are provided as a Source Data file.
Fig. 3
Fig. 3. EmpirDE analysis framework reveals systematic anti-conservative P values caused by a deviation from the expected distribution of Wald test statistics.
a Distribution of the number of DEGs in non-targeting perturbations (NTPs) identified by DESeq2 analysis (FC > 2, adjusted P value (Padj) < 0.01). Examples of a control-outlier (b) and noisy (c) gene. Each bar plot shows the expression levels of the gene of interest in a WPS sequencing library that includes RNAi perturbations and vector control conditions. d Schematic illustrating the EmpirDE framework. The zoom-in windows show two example genes shown in (b, c). For (b–d), each dot in the bar plot represents one biological replicate (n = 2 or 3). e The number of control-outlier genes per WPS sequencing library. Distribution of fitted means (f) and standard deviations (g) for all genes in the empirical null modeling. The blue dashed line shows the values for theoretical null. h Examples showing the random fluctuation of the mean for two genes. i Distribution of fitted standard deviation using simulated WPS data with (right) and without (left) adding a random fluctuation of the mean in the simulation. j Scatter plot showing the fitted standard deviation of each gene against expression levels in wild-type condition. Genes exhibiting a standard deviation greater than 2 are colored based on their WormCat categories. k Gene Set Enrichment Analysis (GSEA) result using the fitted standard deviation as the ranking metric (Supplementary Methods). WormCat Level 3 was used for the analysis. NES Normalized Enrichment Score.
Fig. 4
Fig. 4. EmpirDE rigorously controls FDR.
a, b Benchmarking the performance of EmpirDE analysis framework. The observed False Discovery Proportion (FDP) is compared to target FDR (a) and power (b). The full metabolic-gene WPS dataset (3691 samples) was simulated 10 times with random mean fluctuation (Δμ) to produce the error bars of each metric. FDP and power were measured based on a pooled set of 117,096 simulated DE changes in all conditions (Supplementary Methods). c Benchmarking FDR control of different multiple testing adjustment strategies. The data points and error bars in (ac) indicate mean ± s.d. from 10 simulations. d Evaluating false discoveries using NTP experiments. We estimated false discoveries using the 90% quantile of the numbers of DEG across 71 NTP conditions (shown in the heatmap color and numbers). The 90% quantile represents the value below which 90% of the data points fall, effectively capturing the upper range of typical DEG counts while excluding the most extreme outliers. The red lines show the threshold boundary for five false positive DEG calls. e Number of DEGs (defined by FDR < 0.1 and FC > 1.5 using EmpirDE) for 36 perturbations that were repeated by a second WPS experiment. f Fraction of unreproducible DEGs for EmpirDE versus DESeq2 analysis. Unreproducible DEGs were defined by genes that are called as DEG in one experiment (FDR < 0.1, FC > 1.5) but confidently non-DEG in the other (FC < 1.1 or show a different FC direction). The red dashed line shows the theoretical FDR (FDR = 0.1). Comparison of log2(FC) measured in two independent experiments for representative RNAi with either high (g) or moderate (h) number of DEGs. The green dashed line indicates the diagonal (y = x).
Fig. 5
Fig. 5. A NHR gene regulatory network.
a Expression levels of nhr genes in adult animals. The TPM was quantified by the reference profile used in Supplementary Fig. 1c. b Tissue expression of nhr genes based on an adult stage single-cell RNA-seq data. The maximal TPM of hypodermis and intestine expression is shown in the plot. c Summary of NHR WPS experiments. d Distribution of the number of DEGs in NHR perturbations (i.e., k-out) and the number of NHRs regulating the same gene (i.e., k-in). The red dashed line indicates the responsiveness threshold ( ≥ 5 DEGs). e Functional enrichment analysis of NHR RNAi conditions. Only the 54 NHRs (55 perturbations because nhr-25 was perturbed at both L1 and L2 stages) with more than 10 DEGs were analyzed to ensure power. Conditions without significant (Padj > 0.05) enrichment are not shown in the plot. The relative enrichment is defined as the −log10(Padj) normalized by its maximum in a given RNAi condition. f Proportion of testable RNAi (i.e. >10 DEGs) that showed significant enrichment (Padj < 0.1) in WormCat Level 1. The P values in (e, f) were derived from one-sided hypergeometric test with multiple testing adjusted by Benjamini-Hochberg method. Visualization of hub genes (i.e., regulated by >5 NHRs) of the stress response (g) and metabolic (h) categories. i Comparison of the activator score for metabolic (x-axis) and stress response (y-axis) DEGs. NHR perturbations with ≥5 DEGs in the corresponding categories were analyzed. j Overall comparison of activator scores for different DEG categories. Each dot represents an activator score for one NHR, calculated using DEGs from the category shown on the X-axis (All: n = 83; Stress: n = 26; Metabolic: n = 32). Boxes show the IQR (25th–75th percentiles) with median line; whiskers extend to 1.5×IQR. Two-tailed Wilcoxon tests were performed to calculate the P values. Source data are provided as a Source Data file.
Fig. 6
Fig. 6. The pairwise modularity of NHRs.
a Heatmap depicting perturbation-perturbation similarity of DEG profiles for the NHR perturbations. The perturbation-perturbation similarity was defined by cosine similarity of the filtered log2(FC) profile. The filtered log2(FC) was derived by masking the log2(FC) values of genes that are not called as DEGs (FDR < 0.1, FC > 1.5) to zero. b Visualization of gene expression changes in selected NHR pairs. The gene expression change was measured by the corrected Wald statistic. Rows are the union DEGs of these selected NHR perturbations. c Randomization test of the pairwise modularity of NHR gene family. The schematic shows the design of the randomization test. Histogram shows the average silhouette score of pairs in 10,000 randomizations. The red line indicates the observed score from real data. The NHR GRN was randomized by swapping the network edges while preserving the network structure and properties, such as in- and out-degrees (Supplementary Methods). The gene-gene correlation was not preserved in this randomization to fully randomize the GRN. d Heatmap depicting protein sequence similarity (percent identity) for the DNA binding domain (DBD) of NHRs. The heatmap was clustered using distance matrix generated by Clustal Omega online tool from EMBL-EBI (Supplementary Methods). Scatter plots showing the comparison between perturbation-perturbation similarity and sequence similarity of DBD (e) and full-length protein (f). Each data point indicates a pair of NHRs and selected pairs are labeled. g Scatter plot and randomization test for the associations between perturbation-perturbation similarity and NHR coexpression. Coexpression was measured based on the median Pearson Correlation Coefficient (PCC) of nhr gene expression in a compendium of C. elegans gene expression data across various conditions (Supplementary Methods). The median coexpression level of pairs with a cosine similarity greater than 0.2 (red region in (g)) is calculated and compared with that from randomized data (Supplementary Methods). The histogram shows that the median from real data (red line) is significantly greater than that from randomized data, indicating a statistically significant association between NHR coexpression and perturbation-perturbation similarity. Source data are provided as a Source Data file.

Update of

References

    1. Hughes, T. R. et al. Functional discovery via a compendium of expression profiles. Cell102, 109–126 (2000). - DOI - PubMed
    1. Giaever, G. et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature418, 387–391 (2002). - DOI - PubMed
    1. Przybyla, L. & Gilbert, L. A. A new era in functional genomics screens. Nat. Rev. Genet.23, 89–103 (2022). - DOI - PubMed
    1. Ideker, T. et al. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science292, 929–934 (2001). - DOI - PubMed
    1. Kemmeren, P. et al. Large-scale genetic perturbations reveal regulatory networks and an abundance of gene-specific repressors. Cell157, 740–752 (2014). - DOI - PubMed

MeSH terms

Substances

LinkOut - more resources