Nat Commun. 2024 Sep 9;15(1):7872.
doi: 10.1038/s41467-024-52215-7.

Systematic identification of post-transcriptional regulatory modules

Matvei Khoroshkin et al. Nat Commun. 2024.

Erratum in

  • Author Correction: Systematic identification of post-transcriptional regulatory modules.
    Khoroshkin M, Buyan A, Dodel M, Navickas A, Yu J, Trejo F, Doty A, Baratam R, Zhou S, Lee SB, Joshi T, Garcia K, Choi B, Miglani S, Subramanyam V, Modi H, Carpenter C, Markett D, Corces MR, Mardakheh FK, Kulakovskiy IV, Goodarzi H. Nat Commun. 2024 Oct 28;15(1):9277. doi: 10.1038/s41467-024-52903-4. PMID: 39468017. Free PMC article. No abstract available.

Abstract

In our cells, a limited number of RNA binding proteins (RBPs) are responsible for all aspects of RNA metabolism across the entire transcriptome. To accomplish this, RBPs form regulatory units that act on specific target regulons. However, the landscape of RBP combinatorial interactions remains poorly explored. Here, we perform a systematic annotation of RBP combinatorial interactions via multimodal data integration. We build a large-scale map of RBP protein neighborhoods by generating in vivo proximity-dependent biotinylation datasets of 50 human RBPs. In parallel, we use CRISPR interference with single-cell readout to capture transcriptomic changes upon RBP knockdowns. By combining these physical and functional interaction readouts, along with the atlas of RBP mRNA targets from eCLIP assays, we generate an integrated map of functional RBP interactions. We then use this map to match RBPs to their context-specific functions and validate the predicted functions biochemically for four RBPs. This study provides a detailed map of RBP interactions and deconvolves them into distinct regulatory modules with annotated functions and target regulons. This multimodal and integrative framework provides a principled approach for studying post-transcriptional regulatory processes and enriches our understanding of their underlying mechanisms.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Workflow overview: generating an integrated regulatory interaction map of RNA-binding proteins.
A–C The results of BioID2, Perturb-seq, and publicly available ENCODE eCLIP assays were independently processed and normalized across RBPs. D The resulting Z-scores were used to estimate the cosine distance between all pairs of the tested RBPs and to calculate empirical left-tailed p-values for RBP-RBP similarities. For each pair of RBPs, the p-values from three assays were aggregated as in ref. to obtain a single measure of similarity between RBPs across the feature spaces from the three modalities. The resulting matrix of pairwise similarities was defined as the Integrated Regulatory Interaction Map (IRIM) that simultaneously captures physical and functional interactions between RBPs.
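The integration described in panel D can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the RBP names and Z-score values are invented, the empirical left-tailed p-value is taken as the fraction of all pairs whose cosine distance is at most that of the pair in question, and Fisher's method stands in for the aggregation step, which the caption only describes by reference.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length Z-score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def left_tailed_pvalues(zscores):
    """For each RBP pair, the fraction of all pairs with distance <= its own.

    This ranking-based definition is an assumption, not the authors' code.
    """
    pairs = list(combinations(sorted(zscores), 2))
    dist = {p: cosine_distance(zscores[p[0]], zscores[p[1]]) for p in pairs}
    return {p: sum(d <= dist[p] for d in dist.values()) / len(pairs) for p in pairs}

def fisher_combine(pvalues):
    """Fisher's method (assumed stand-in for the cited aggregation).

    X = -2 * sum(ln p) follows a chi-square law with 2k degrees of freedom,
    whose survival function has a closed form for even degrees of freedom.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # X / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

# Invented Z-score profiles for three hypothetical RBPs in three assays.
assays = {
    "BioID": {"RBP_A": [2.0, 1.0, 0.0], "RBP_B": [1.9, 1.1, 0.1], "RBP_C": [-1.0, 0.0, 2.0]},
    "Perturb-seq": {"RBP_A": [1.0, -1.0, 0.5], "RBP_B": [0.8, -1.2, 0.4], "RBP_C": [-0.5, 1.0, 1.0]},
    "eCLIP": {"RBP_A": [0.0, 3.0, 1.0], "RBP_B": [0.2, 2.5, 1.2], "RBP_C": [3.0, 0.0, -1.0]},
}

per_assay = [left_tailed_pvalues(z) for z in assays.values()]
# One combined value per pair; smaller means more similar across modalities.
irim = {pair: fisher_combine([p[pair] for p in per_assay]) for pair in per_assay[0]}
closest_pair = min(irim, key=irim.get)
```

In this toy input, `closest_pair` is the pair that is nearest in every assay; with real data the three assays would cover different RBP subsets and feature spaces, which the sketch does not handle.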
Fig. 2
Fig. 2. Unveiling post-transcriptional regulatory modules through integrative analysis of RBP-RBP interactions.
A Integrated Regulatory Interaction Map (IRIM): This heatmap displays integrated distances between RBPs, where each cell’s color denotes the integrated distance between the corresponding RBPs. Hierarchical clustering is illustrated by the dendrogram to the left. The colormap signifies the inclusion of RBPs in three data sources: eCLIP (green), BioID (blue), and Perturb-seq (brown). Recognized regulatory modules are emphasized in red with contributing RBPs labeled directly on the plot. Insets present detailed heatmaps for two exemplary modules, colored respectively for source datasets: BioID (blue), Perturb-seq (orange), and eCLIP (green). Proteins discussed are highlighted in red, and examples of module interplay, including U2AF1 and KHSRP, are marked in orange. Source data are provided as a Source Data file. B Swarm Plots for RBP Partners of XRCC6, PPIL4, and LIN28B: Swarm plots illustrate the RBP partners for XRCC6 (top), PPIL4 (middle), and LIN28B (bottom), with each point representing an individual RBP. The points are organized by the integrated distance from the specified RBP to the query RBP. Annotations within each plot designate the common function of the closest interacting partners. The three RBPs with the smallest distances are specifically labeled; those associated with a common function are marked in purple, and the others in gray. Source data are provided as a Source Data file. C Identification of RBP Partners of ZNF800 and TAF15: The swarm plots here delineate the RBP partners of ZNF800 (left) and TAF15 (right), employing the same color-coding for datasets as in (A): eCLIP (green), BioID (blue), and Perturb-seq (brown). The top portion represents the RBP partners as derived from individual datasets, each annotated with the common function of the nearest interacting partners. 
The bottom portion, analogous to (B), displays the RBP partners sorted by the integrated distance, with the top interacting RBPs distinctly labeled according to the common function in purple and the others in gray. Source data are provided as a Source Data file. D Examination of RBP Partners of DGCR8: This section presents swarm plots of the RBP partners of DGCR8. The top plots showcase the partners based on individual source datasets, similar to (C), with each plot annotated and color-coded according to (A). The bottom plot displays the RBP partners sorted by integrated distance, highlighting the top interacting RBPs. Notably, TAF15 and XRN2 are emphasized, illustrating the efficacy of the distance integration procedure in confirming the known involvement of DGCR8 in the regulation of transcription. Source data are provided as a Source Data file. E Rearrangements in RBP matrices: This panel demonstrates the alterations in the structure of the Integrated Regulatory Interaction Map matrix due to random shuffling, depicting changes in distance to the closest and farthest partner RBP. Downsampling was conducted by shuffling distance values of varying fractions of RBPs (0% to 100%). This procedure was performed 10 times for each of 90 RBPs, resulting in 900 estimates for each dataset and shuffling percentage. Dots represent the median, error bars represent the lower and upper quartiles. Source data are provided as a Source Data file. F Percent of RBP pairs passing IRIM distance < 25% quantile that intersect STRING, OpenCell, hu.Map, and Zanzoni et al. Violin and boxplots are based on 10⁴ random shuffling iterations; red dots represent the percent of the real IRIM distances. Right-tailed p-values were obtained for each group by calculating the fraction of random shuffling iterations with the intersection greater than or equal to the observed value (among 10⁴ + 1 cases). 
Box plot bounds and center represent the first, second, and third quartiles, while whiskers represent minimum and maximum values in the data, excluding outliers that are more than 1.5 interquartile range from lower and upper quartiles and are depicted as dots. Source data are provided as a Source Data file.
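The right-tailed empirical p-value described for panel F can be sketched as follows. This is one reading of the caption, not the authors' code: the observed value is counted as one extra case in both numerator and denominator, and the null values below are invented placeholders rather than real shuffling results.

```python
def right_tailed_empirical_p(observed, null_values):
    """Fraction of cases >= observed among the null draws plus the observed case.

    Counting the observed value itself keeps the estimate above zero,
    which is the assumed meaning of "among N + 1 cases".
    """
    hits = sum(v >= observed for v in null_values) + 1
    return hits / (len(null_values) + 1)

# Placeholder null distribution: 10,000 values cycling through 0..19.
n_shuffles = 10_000
null_overlap = [i % 20 for i in range(n_shuffles)]

# An observed overlap above every null value gives the smallest possible p.
p_extreme = right_tailed_empirical_p(25, null_overlap)
# An observed overlap equal to the null maximum counts the ties as hits.
p_tied = right_tailed_empirical_p(19, null_overlap)
```

With this definition the smallest attainable value is 1 / (number of shuffles + 1), so the resolution of the test is set by the number of shuffling iterations.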
Fig. 3
Fig. 3. BioID-mediated proximity labeling defines RBP neighborhoods and enables functional annotation of RBPs.
A Overview of our pathway annotation workflow for RBPs. The example provided shows the test for the association of ZNF800 and GO:0006361 (transcription initiation from RNA polymerase I promoter). Proximity-labeled proteins were ranked by their z-scores in the ZNF800-BioID dataset, where a higher score implies enrichment relative to control. Experiments were performed in biological triplicates using unlabeled samples as controls (three cases vs. three control designs). Gene-set enrichment analyses were performed on the resulting ranked list across all RBPs. Each enrichment analysis resulted in a p-value and NES score for a given pair of RBP and a pathway. B A heatmap showing the associations between RBPs and pathways as inferred from proximity labeling data. Columns correspond to the RBPs, rows correspond to individual gene ontology terms (Biological Processes; BP), and the color denotes the GSEA normalized enrichment score (NES). The associations showing FDR < 0.05 are marked with a yellow asterisk. The green heatmap in the header shows the RBP binding preferences to particular RNA types, as determined based on eCLIP RNA targets. Some known functions of RBPs are highlighted by boxes and zoomed-in on the right. Source data are provided as a Source Data file.
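The enrichment test in panel A ranks proximity-labeled proteins by z-score and asks whether members of a pathway concentrate near the top. The sketch below computes a weighted running-sum enrichment score in the style of GSEA; it is an illustration under assumptions, since the caption does not name the implementation used, the protein names and scores are invented, and the permutation step that turns a raw score into an NES and p-value is omitted.

```python
def enrichment_score(ranked, gene_set):
    """Signed maximum deviation of a GSEA-style running sum.

    ranked: list of (protein, z_score) sorted by descending z_score.
    gene_set: set of proteins annotated to the pathway being tested.
    Assumes at least one hit with non-zero score and at least one miss.
    """
    hit_weight = sum(abs(z) for name, z in ranked if name in gene_set)
    n_miss = sum(1 for name, _ in ranked if name not in gene_set)
    running, best = 0.0, 0.0
    for name, z in ranked:
        if name in gene_set:
            running += abs(z) / hit_weight  # step up, weighted by score
        else:
            running -= 1.0 / n_miss         # step down by a fixed amount
        if abs(running) > abs(best):
            best = running
    return best

# Invented ranked list for one hypothetical BioID dataset.
ranked_proteins = [("P1", 3.0), ("P2", 2.5), ("P3", 1.0), ("P4", -0.5), ("P5", -2.0)]

es_top = enrichment_score(ranked_proteins, {"P1", "P2"})     # pathway at the top
es_bottom = enrichment_score(ranked_proteins, {"P4", "P5"})  # pathway at the bottom
```

A positive score means the pathway members sit among the proteins enriched relative to control, and a negative score means they sit among the depleted ones.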
Fig. 4
Fig. 4. ZC3H11A and TAF15 control multiple independent regulons through distinct regulatory programs.
A Violin plots showing the normalized enrichment scores (NES) resulting from gene set enrichment analysis of proximity labeling data. Left subpanel: NES scores across all the GO-BP terms for ZC3H11A and TAF15 proteins. The five highest-scoring pathways are highlighted with color. Right subpanel: NES scores across all studied RBPs for the pathways GO:0000387, GO:0045727, and GO:0048255. ZC3H11A and TAF15 are highlighted with colored triangles. Dashed lines: quartiles; solid red line: 0.9 quantile. B Scatterplot showing changes in alternative splicing events (ASE) usage upon ZC3H11A knockdown as estimated by MISO. Individual subplots cover different classes of alternative splicing events: Skipped Exon (SE), Retained Intron (RI), Alternative 3’ Splice Site (A3SS), Alternative 5’ Splice Site (A5SS), and Mutually Exclusive Exon (MXE). Dashed lines indicate the following filters: Bayes factor ≥ 10 and the absolute value of the isoform level difference ≥ 0.2. The ASEs passing these filters are shown in red. Source data are provided as a Source Data file. C Relative levels of two skipped exons from the transcripts WARS1 (left) and ASPM (right) were measured by RT-qPCR in control K562 and ZC3H11A-KD cells; n = 3 biological replicates. P-value from a one-sided t test performed on log-transformed isoform expression ratios, 0.0166 for WARS1 and 2.86·10⁻⁴ for ASPM. Source data are provided as a Source Data file. D Scatterplot showing changes in alternative splicing events in TAF15 knockdown cells, as in (B). Source data are provided as a Source Data file. E Relative levels of two retained introns from the transcripts CDC37 (left) and ZWINT (right) were measured by RT-qPCR in control K562 and ZC3H11A-KD cells; n = 3 biological replicates. P-value from a one-sided t test performed on log-transformed isoform expression ratios, 8.03·10⁻⁵ for CDC37 and 2.883·10⁻⁴ for ZWINT. Source data are provided as a Source Data file. 
F Left: Sashimi plot illustrating the changes in intron retention event usage in ZWINT transcript upon TAF15 knockdown. Right: Genomic view of the ZWINT retained intron, RNA-seq profiles from WT and TAF15-KD cells, and TAF15 CLIP-seq peaks are shown at the bottom. Y-axis: counts per million (CPM). The region corresponding to the alternative splicing event is framed.
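The two thresholds stated for panels B and D (Bayes factor ≥ 10 and absolute isoform level difference ≥ 0.2) amount to a simple filter. The sketch below applies them to invented records; the `SplicingEvent` type, its field names, and the event identifiers are hypothetical and do not reflect the MISO output format.

```python
from dataclasses import dataclass

@dataclass
class SplicingEvent:
    event_id: str        # hypothetical identifier
    event_type: str      # SE, RI, A3SS, A5SS or MXE
    delta: float         # isoform level difference between conditions
    bayes_factor: float

def passes_filters(event, min_bayes_factor=10.0, min_abs_delta=0.2):
    """Apply both caption thresholds; an event must satisfy each of them."""
    return event.bayes_factor >= min_bayes_factor and abs(event.delta) >= min_abs_delta

# Invented example events, not values from the study.
events = [
    SplicingEvent("event_1", "SE", -0.35, 42.0),  # passes both thresholds
    SplicingEvent("event_2", "RI", 0.25, 4.0),    # Bayes factor too low
    SplicingEvent("event_3", "MXE", 0.05, 80.0),  # difference too small
    SplicingEvent("event_4", "RI", 0.20, 10.0),   # exactly on both boundaries
]

significant_ids = [e.event_id for e in events if passes_filters(e)]
```

Because both comparisons are inclusive, an event lying exactly on the dashed lines counts as passing.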
Fig. 5
Fig. 5. TAF15 is directly involved in RNA translation and stability regulation.
A Left: enrichment analysis of TAF15 mRNA targets among the differentially translated genes (in the TAF15-KD cell line compared to the WT cell line). The differential ribosome occupancy (RO) measurements in TAF15-KD cells were estimated from Ribo-seq. The genes were sorted based on the RO change (along the x-axis), and the enrichment of TAF15 mRNA targets, inferred from eCLIP data, was calculated using iPAGE (top subpanel) and with GSEA (bottom subpanel, ES stands for the enrichment score). Two example targets, HMGB2 and RPL35, are highlighted. Right: levels of HMGB2 and RPL35 were measured by mass spectrometry in control K562 and TAF15-KD cells. N = 5 biological replicates. P-value from one-sided Wilcoxon rank sum test, 0.04762 for both HMGB2 and RPL35. Source data are provided as a Source Data file. B Genomic view of HMGB2 (left) and RPL35 (right). RNA-seq and Ribo-seq WT and TAF15-KD profiles, as well as TAF15 CLIP-seq peaks, are shown below. Y-axis: counts per million (CPM). C Left: enrichment analysis of TAF15 mRNA targets among the differentially stabilized transcripts (in the TAF15-KD cell line compared to the WT cell line) measured by α-amanitin treatment. The transcripts were sorted based on stability change (log2FCs). The enrichment of TAF15 RNA targets, inferred from eCLIP data, was calculated with iPAGE (top and middle subpanel) and with GSEA (bottom subpanel). Two example targets, UBE2J2 and GUK1, are highlighted. Right: the relative stability of UBE2J2 and GUK1 mRNAs was measured as the ratio of mRNA to pre-mRNA abundance using qPCR in control K562 and TAF15-KD cells. N = 4 biological replicates. P-value from one-sided Wilcoxon rank sum test, 0.01429 for UBE2J2 and 0.0147 for GUK1. Source data are provided as a Source Data file. D Genomic view of UBE2J2 (top) and GUK1 (bottom). RNA-seq WT and TAF15-KD profiles, as well as TAF15 CLIP-seq peaks, are shown below. Y-axis: counts per million (CPM). E Venn diagram of TAF15 RNA regulons. 
Shown are the numbers of genes that exhibit significant changes in splicing (155 genes with Bayes factor ≥ 10), translation (919 genes with FDR < 0.05), or stability (2068 genes with FDR < 0.05) upon TAF15 knockdown, as captured by RNA-seq, Ribo-seq, and RNA-seq with α-amanitin, respectively. Results of one-sided Fisher’s exact test for each pairwise intersection were FDR-corrected for multiple testing and are shown next to the corresponding area. Source data are provided as a Source Data file.
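As a worked illustration of the one-sided Wilcoxon rank sum test used in panels A and C, the sketch below computes an exact p-value by enumerating every way of assigning ranks to one group. It assumes an exact test without ties, which the caption does not state, and the measurements are invented. With two groups of four that are completely separated, the exact one-sided p-value is 1/70 ≈ 0.01429, which is consistent with the value reported for UBE2J2 at N = 4.

```python
from itertools import combinations

def exact_rank_sum_p(x, y):
    """Exact one-sided p-value that values in x tend to exceed values in y.

    Enumerates all equally likely rank assignments under the null.
    Assumes no tied values; practical only for small samples.
    """
    pooled = sorted(x + y)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    observed = sum(rank[value] for value in x)
    all_sums = [sum(c) for c in combinations(range(1, len(pooled) + 1), len(x))]
    return sum(s >= observed for s in all_sums) / len(all_sums)

# Invented measurements: four replicates per group, fully separated.
group_high = [1.8, 2.1, 2.4, 2.9]
group_low = [0.7, 0.9, 1.1, 1.3]

p_separated = exact_rank_sum_p(group_high, group_low)

# One overlapping replicate raises the p-value to 2/70.
p_overlap = exact_rank_sum_p([1.2, 2.1, 2.4, 2.9], [0.7, 0.9, 1.1, 1.3])
```

The enumeration makes the resolution limit explicit: with four replicates per group, no one-sided exact p-value can fall below 1/70.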
Fig. 6
Fig. 6. ZNF800 and QKI control gene expression at transcriptional and post-transcriptional levels independently.
A Violin plot showing the normalized enrichment scores (NES) resulting from gene set enrichment analysis of proximity labeling data. Left subpanel: NES scores across all the GO-BP terms for ZNF800 and QKI proteins. The 5 highest scoring pathways are highlighted with color. Right subpanel: NES scores across all the studied RBPs for GO:0016571, GO:0016575, and GO:0042254 GO terms. ZNF800 and QKI are highlighted with colored triangles. Dashed lines: quartiles; solid red line: 0.9 quantile. B Volcano plots showing differential chromatin accessibility between WT K562 cells and ZNF800-KD (left) or QKI-KD (right) cells. Each point denotes a single ATAC-seq peak; peaks passing 0.1 FDR are colored red. The distribution of peaks among various genomic regions is shown on the right of each volcano plot. Source data are provided as a Source Data file. C Genomic view of RPS15 (left) and LTBR (right) promoter regions. ATAC-seq profiles of WT cells along with ZNF800-KD (left) or QKI-KD (right) are shown. The binding of ZNF800 to the RPS15 promoter region and the binding of QKI to the LTBR promoter region were measured by ChIP-qPCR in K562 cells and are illustrated on the right of each profile plot. Source data are provided as a Source Data file. D Box plots showing the distributions of expression fold changes in WT cells compared to either ZNF800-KD cells (left) or QKI-KD cells (right), as measured by RNA-seq. The distributions for the genes showing significant promoter accessibility increase upon the respective knockdown and for the rest of the genes are shown separately. The most highly accessible ATAC-seq peak was considered for each gene, resulting in 21708 genes in both ZNF800-KD and QKI-KD cells, of which 834 (3.8%) and 1476 (6.8%) showed increased promoter accessibility upon ZNF800 and QKI knockdown, respectively. P-value calculated by one-sided Wilcoxon rank sum test, 8.1·10⁻¹⁴ for ZNF800-KD and 2.64·10⁻⁶ for QKI-KD. 
Box plot bounds and center represent the first, second, and third quartiles, while whiskers represent minimum and maximum values in the data, excluding outliers that are more than 1.5 interquartile range from lower and upper quartiles and are depicted as dots. Source data are provided as a Source Data file. E Box plots depicted as in (D) showing the distributions of chromatin accessibility fold changes in WT cells compared to either ZNF800-KD cells (left) or QKI-KD cells (right), as measured by ATAC-seq. The distributions for ZNF800- or QKI- RNA targets (as defined by eCLIP) and the rest of the genes are shown separately. In total, there are 23275 ATAC-seq peaks, with 714 assigned to ZNF800 RNA target genes (leaving 22561 as non-target) and 286 assigned to QKI RNA target genes (leaving 22989 as non-target). P-value calculated by one-sided Wilcoxon rank sum test, 6.81·10⁻²⁰ for ZNF800-KD and 2.3·10⁻⁷ for QKI-KD. Source data are provided as a Source Data file.

References

    1. Gerstberger, S., Hafner, M. & Tuschl, T. A census of human RNA-binding proteins. Nat. Rev. Genet. 15, 829–845 (2014).
    2. Keene, J. D. RNA regulons: coordination of post-transcriptional events. Nat. Rev. Genet. 8, 533–543 (2007).
    3. Hogan, D. J., Riordan, D. P., Gerber, A. P., Herschlag, D. & Brown, P. O. Diverse RNA-binding proteins interact with functionally related sets of RNAs, suggesting an extensive regulatory system. PLoS Biol. 6, e255 (2008).
    4. Imig, J., Kanitz, A. & Gerber, A. P. RNA regulons and the RNA-protein interaction network. Biomol. Concepts 3, 403–414 (2012).
    5. Cho, N. H. et al. OpenCell: Endogenous tagging for the cartography of human cellular organization. Science 375, eabi6983 (2022).
