Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 Aug 2;16(1):7118.
doi: 10.1038/s41467-025-62252-5.

Epiregulon: Single-cell transcription factor activity inference to predict drug response and drivers of cell states

Affiliations

Epiregulon: Single-cell transcription factor activity inference to predict drug response and drivers of cell states

Tomasz Włodarczyk et al. Nat Commun. .

Abstract

Transcription factors (TFs) and transcriptional coregulators are emerging therapeutic targets. Gene regulatory networks (GRNs) can evaluate pharmacological agents and identify drivers of disease, but methods that rely solely on gene expression often neglect post-transcriptional modulation of TFs. We present Epiregulon, a method that constructs GRNs from single-cell ATAC-seq and RNA-seq data for accurate prediction of TF activity. This is achieved by considering the co-occurrence of TF expression and chromatin accessibility at TF binding sites in each cell. ChIP-seq data allows motif-agonistic activity inference of transcriptional coregulators or TF harboring neomorphic mutations. Epiregulon accurately predicted the effects of AR inhibition across different drug modalities including an AR antagonist and an AR degrader, delineated the mechanisms of a SMARCA4 degrader by identifying context-dependent interaction partners, and prioritized drivers of lineage reprogramming and tumorigenesis. By mapping gene regulation across various cellular contexts, Epiregulon can accelerate the discovery of therapeutics targeting transcriptional regulators.

PubMed Disclaimer

Conflict of interest statement

Competing interests: T.W., A.L., D.W., M.S., X. Ye, S.T., K.S., L.W., J.T., S.Y.C., T.K., A. Chlebowski, A.W., W.Z., Y.W., Y.G., L.F.C., B.D., A.H., M. He, A.Chibly, Y.L., C.M., M.Hafner., C.W.S., R.Y., S.X. and X.Yao are or were employees of Genentech Inc. or Roche. A.L., D.W., M.S., X. Ye, K.S., L.W., J.T., A.W., W.Z., Y.W., L.F.C., B.D., A.H., M. He, A.Chibly, Y.L., C.M., M.Hafner, C.W.S., R.Y., S.X. and X.Yao were given Roche stocks.

Figures

Fig. 1
Fig. 1. Epiregulon constructs GRNs to infer regulator activity at the single-cell level.
a Epiregulon can infer regulator activity for lineage development, drug perturbations, motif-lacking co-regulators or regulators harboring neomorphic mutations. Correlation and mutual information weight estimation methods are appropriate for the first scenario, whereas co-occurrence is applicable to all cases. b If users provide scRNA-seq and scATAC-seq, Epiregulon can construct GRNs either from ChIP-seq data or motif annotations. Pan-cell-type, tissue-specific and sample-specific ChIP-seq data compiled from ChIP-Atlas and ENCODE are available through the scMultiome package. Epiregulon outputs regulator activity at the single cell level, a pruned and weighted gene regulatory network and differential activity analysis to identify potential drivers of cell states. c For benchmarking, we downloaded the paired scATAC-seq and scRNA-seq PBMC data from 10x Genomics. We identified cell types using SingleR and known marker genes. Shown is the UMAP representation of the various cell types present in the data. d Gene expression of known lineage markers. e Activities of the same lineage markers were calculated using Epiregulon (correlation weight estimation method). f Area under the receiver operating characteristic curve (AUROC) is calculated based on whether TF expression or TF activity can distinguish cells of the matching lineage vs. the remaining cells based on a total of 20 factors. g Gene expression changes after depletion of 7 individual factors (ELK1, GATA3, JUN, NFATC3, NFKB1, STAT3 and MAF) were obtained from the knockTF database and genes with absolute logFC > 0.5 and corrected p-value < 0.05 (two-sided moderated t-test, limma) were considered ground truth target genes. GRNs obtained from the shown packages were evaluated for precision and recall of target genes. h Run time and memory use of the GRN construction from the PBMC data were evaluated with 64GB and 20 cores on the high-performance computing (HPC) cluster. In the case of GRaNIE, the memory allocation needs to be increased to 128 GB and for FigR, the memory allocation was increased to 256 GB. Each package was run 5 times. Source data are provided as a Source Data file. Created in BioRender. Yao, X. (2025) https://BioRender.com/x50fdft.
Fig. 2
Fig. 2. Epiregulon predicts the responses of AR-modulating drugs.
a Mechanisms of action of 3 AR-modulating agents. b Six prostate cancer cell lines were treated and profiled for changes in their gene expression and chromatin accessibility by paired scATAC-seq and scRNAseq. Shown is the UMAP representation (5028 VCaP cells, 5958 LNCaP cells, 3568 22Rv1 cells, 945 MDA-PCa-2b cells, 3639 NCI-H660 cells and 3980 DU145 cells). Cells were merged from 2 technical replicates. Cells were treated for 24 h at 1 µM of enzalutamide or ARV-110 or 0.1 µM of SMARCA2_4.1. c Immunoblotting of AR and HDAC as a loading control after 24 h of treatment. This is a representative result from 2 biological replicates. d Chemical structure of SMARCA2_4.1. e Prostate cancer cell lines were treated for 5 days and cell viability was measured by CellTiter-Glo. Dotted line indicates the concentrations used in the scATAC-seq/scRNA-seq experiment. Data are presented as mean ± standard deviation (s.d.) based on 4 biological replicates. f Shown are the AR gene expression and the AR activity computed by Epiregulon (co-occurrence weight estimation method). Numbers of cells are as follows: VCaP (DMSO 1392 cells, Enza 1266 cells, ARV-110 1377 cells, SMARCA2_4.1 993 cells); LNCaP (DMSO 1499 cells, Enza 1966 cells, ARV-110 758 cells, SMARCA2_4.1 1735 cells); 22Rv1 (DMSO 970 cells, Enza 1002 cells, ARV-110 554 cells, SMARCA2_4.1 1042 cells); MDA-PCa-2b (DMSO 306 cells, Enza 76 cells, ARV-110 188 cells, SMARCA2_4.1 375 cells). Boxplots presented as median values ± 25%. Lower whisker is the smallest observation ≥25% quantile −1.5 × interquantile range (IQR). Upper whisker represents the largest observation ≤75% +  1.5 × IQR. g Each cell was identified by the HTO tag corresponding to treatment. A cell was classified into either the DMSO or AR inhibitor-treated group based on AR activity. Bar plots show the median receiver operating characteristic curve (AUROC) in the two sensitive cell lines, LNCaP and VCaP, treated with enzalutamide or ARV-110 for a total of 4 samples. Source data are provided as a Source Data file. Created in BioRender. Yao, X. (2025) https://BioRender.com/x50fdft.
Fig. 3
Fig. 3. Epiregulon infers activity of AR harboring neomorphic mutations.
a Shown are the AR activity estimated from the signature score in Bluemn et al. and its correlation with AR activity estimated by Epiregulon in VCaP cells, which harbor the amplification of the wildtype AR. Numbers of cells are as follows: DMSO 1392 cells, Enza 1266 cells, ARV-110 1377 cells, SMARCA2_4.1 993 cells. Boxplots presented as median values ± 25%. Lower whisker is the smallest observation greater than or equal to 25% quantile −1.5 × interquantile range (IQR). Upper whisker represents the largest observation ≤75% + 1.5 × IQR. b Normalized expression of genes in the Bluemn AR signature for VCaP. c Same as a, but for MDA-PCa-2b, which harbors two mutations in the AR gene and as a result has enhanced specificity for hydrocortisone over 5α-DHT. Numbers of cells are as follows: DMSO 306 cells, Enza 76 cells, ARV-110 188 cells, SMARCA2_4.1 375 cells. Boxplots presented as median values ±25%. Lower whisker is the smallest observation greater than or equal to 25% quantile −1.5 × interquantile range (IQR). Upper whisker represents the largest observation ≤75% + 1.5× IQR. d Same as b, but for MDA-PCa-2b. e Normalized expression of putative AR targets of MDA-PCa-2b as inferred by Epiregulon. f AR occupancy as measured by ChIP-seq at ATAC-peaks containing the regulatory elements mapped to AR target genes in MDA-PCa-2b cells treated with DMSO or the SMARCA2/4 degrader, SMARCA2_4.1. Center represents the center of the regulatory elements. g The ground truth regulator activity was computed using all the publicly available ChIP-seq obtained in LNCaP cells. This activity was then correlated (Pearson’s) with activity computed either from pan-cell-type ChIP-seq (red) or motif annotations (blue) for each of the regulators. h Same as g, but using ChIP-seq generated in MDA-PCa-2b. i Shown is the Jaccard similarity between the target genes derived from LNCaP ChIP-seq vs. target genes derived from pan-cell-type ChIP-seq (red) or motif annotations (blue). j Same as h, but using ChIP-seq generated in MDA-PCa-2b. Source data are provided as a Source Data file.
Fig. 4
Fig. 4. Epiregulon uncovers context-dependent interaction partners of SMARCA4.
a Immunoblotting of SMARCA4 after 24 h of treatment and HDAC as a loading control. b SMARCA4 activity computed by Epiregulon for all 6 prostate cell lines after 24 h of treatment. Boxplots presented as median values ± 25%. Lower whisker is the smallest observation ≥25% quantile −1.5 × interquantile range (IQR). Upper whisker represents the largest observation ≤75% + 1.5 × IQR. c Altered regulon size indicates the number of altered target genes mapped to each regulator. Altered genes are defined by genes with logFC > 0.5 and FDR < 0.05 after SMARCA2_4.1 treatment. LogFC in activity indicates the changes in regulator activity estimated by Epiregulon. d ChIP-seq of AR, SMARCA4 and FOXA1 in MDA-PCa-2b treated with the SMARCA2/4 degrader, SMARCA2_4.1 for 24 h at 0.1 µM at SMARCA4 binding sites. e Representative regions of SMARCA4, AR and FOXA1 ChIP-seq in MDA-PCa-2b cells treated with SMARCA2_4.1. f Same as c, but for DU145 cells. g ChIP-seq of SMARCA4, TEAD1 and FOSL1 in DU145 cells treated with the SMARCA2/4 degrader, SMARCA2_4.1 for 24 h at 0.1 µM. h Representative regions of SMARCA4, TEAD1 and FOSL1 ChIP-seq in DU145 cells treated with SMARCA2_4.1. Created in BioRender. Yao, X. (2025) https://BioRender.com/x50fdft.
Fig. 5
Fig. 5. Epiregulon identifies drivers of lineage reprogramming.
a Lentiviral constructs used to introduce TFs into LNCaP cells in the reprogram-seq assay. b UMAP representation of 3903 LNCaP cells transduced with virus encoding GATA6, NKX2-1, FOXA1, FOXA2 and mNeonGreen. The cells were infected in individual wells, hashtagged with HTO and then pooled into a single run. c Distribution of HTO tags in each of the clusters. d Motif enrichment in cluster-specific peaks was performed by ArchR using the CisBP motif database. e Chromatin accessibility at neuroendocrine (NE)—and stem cell-like (SCL)-specific regions defined by Tang et al. computed by ChromVAR. f Shown are the distribution of HTO tag assignment for GATA6 and NKX2-1, the level of TF gene expression, chromatin accessibility at GATA6- and NKX2-1 binding sites estimated by ChromVAR and TF activity computed by Epiregulon. g Gene expression (left) and Epiregulon-inferred activity (right) of NKX2-1 and GATA6. h Spearman correlation of regulon weights vs. log-fold changes of putative target genes of GATA6 (left) or NKX2-1 (right) with respect to mNeonGreen-infected cells. Shown is the confidence interval of 95%. P-values are calculated from 2-sided t-test. i Each cell was identified by the HTO tag corresponding to the well receiving virus encoding GATA6 or mNeonGreen. A cell was classified into either expressing GATA6 or not based on its TF activity. AUROC - area under the receiver operating characteristic curve. j Same as i but for NKX2-1. k Number of TFs detected in the GRN computed by the different packages. Source data are provided as a Source Data file.
Fig. 6
Fig. 6. Deep neural network (DNN) models validate regulatory elements in regulons.
a We train a DNN model on the ATAC-seq signals from cluster 1 and cluster 3 cells, respectively. We compare the predicted accessibility of either the wildtype sequence or the occluded sequence in the regulatory elements from the regulons inferred by Epiregulon. b Normalized changes in predicted accessibility if we occlude the sequences found in the regulatory elements of GATA6 and NKX2-1 regulons in either the DNN model trained on cluster 1 or cluster 3 cells. The number of regulatory elements for GATA6 is 402, and the number of regulatory elements for NKX2-1 is 134. Boxplots presented as median values ± 25%. Lower whisker is the smallest observation ≥25% quantile −1.5 × interquantile range (IQR). Upper whisker represents the largest observation ≤75%  + 1.5 × IQR. c Each cell was identified by the HTO tag corresponding to the well receiving virus encoding GATA6 or mNeonGreen, and this information served as the true cell labels. For each TF, a cell was classified into either expressing GATA6 or not. GATA6 ChIP-seq was obtained by merging ChIP-seq from ChIP-atlas and ENCODE. ChIP+ motif refers to ChIP-seq peaks that contain GATA6 motifs. We trained a DNN model on cells expressing GATA6 (cluster 1) using Basenji2 and predicted an importance score for each motif based on the difference between the original sequence and the motif occluded sequence. We filtered for those motifs with importance scores higher than the quartiles of scores.
Fig. 7
Fig. 7. Epiregulon predicts known and novel drivers of the cancer state from clinical samples.
ScATAC-seq and scRNA-seq data of primary tumors and normal adjacent tissues were obtained from Terekhanova et al. and paired using Seurat’s label transfer function. Only tumor cells and matching normal cell types were used for GRN construction by Epiregulon (co-occurrence weight method). The top regulators were identified using Epiregulon’s findDifferentialActivity function. Shown are the expression and activity of the top regulators in a renal cell carcinoma, b glioblastoma and c pancreatic adenocarcinoma. Red stars mark regulators, which are known to promote tumor growth, and blue stars mark regulators, which are known to inhibit tumor growth. Created in BioRender. Yao, X. (2025) https://BioRender.com/nn27yx0.

References

    1. Tran, C. et al. Development of a second-generation antiandrogen for treatment of advanced prostate cancer. Science324, 787–790 (2009). - PMC - PubMed
    1. Sakamoto, K. M. et al. Protacs: chimeric molecules that target proteins to the Skp1-Cullin-F box complex for ubiquitination and degradation. Proc. Natl Acad. Sci. USA98, 8554–8559 (2001). - PMC - PubMed
    1. Snyder, L. B. et al. Discovery of ARV-110, a first in class androgen receptor degrading PROTAC for the treatment of men with metastatic castration resistant prostate cancer. Cancer Res81, 43 (2021).
    1. Hagenbeek, T. J. et al. An allosteric pan-TEAD inhibitor blocks oncogenic YAP/TAZ signaling and overcomes KRAS G12C inhibitor resistance. Nat. Cancer4, 812–828 (2023). - PMC - PubMed
    1. Badia, I. M. P. et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat. Rev. Genet. 10.1038/s41576-023-00618-5 (2023). - PubMed

LinkOut - more resources