Comparative Study
Genome Biol. 2014 Jan 13;15(1):R15.
doi: 10.1186/gb-2014-15-1-r15.

Differential protein occupancy profiling of the mRNA transcriptome

Markus Schueler et al. Genome Biol. 2014.

Abstract

Background: RNA-binding proteins (RBPs) mediate mRNA biogenesis, translation and decay. We recently developed an approach to profile transcriptome-wide RBP contacts on polyadenylated transcripts by next-generation sequencing. A comparison of such profiles from different biological conditions has the power to unravel dynamic changes in protein-contacted cis-regulatory mRNA regions without a priori knowledge of the regulatory protein component.

Results: We compared protein occupancy profiles of polyadenylated transcripts in MCF7 and HEK293 cells. Briefly, we developed a bioinformatics workflow to identify differential crosslinking sites in cDNA reads of 4-thiouridine crosslinked polyadenylated RNA samples. We identified 30,000 differential crosslinking sites between MCF7 and HEK293 cells at an estimated false discovery rate of 10%. 73% of all reported differential protein-RNA contact sites cannot be explained by local changes in exon usage as indicated by complementary RNA-seq data. The majority of differentially crosslinked positions are located in 3' UTRs, show distinct secondary-structure characteristics and overlap with binding sites of known RBPs, such as ELAVL1. Importantly, mRNA transcripts with the most significant occupancy changes show elongated mRNA half-lives in MCF7 cells.

Conclusions: We present a global comparison of protein occupancy profiles from different cell types, and provide evidence for altered mRNA metabolism as a result of differential protein-RNA contacts. Additionally, we introduce POPPI, a bioinformatics workflow for the analysis of protein occupancy profiling experiments. Our work demonstrates the value of protein occupancy profiling for assessing cis-regulatory RNA sequence space and its dynamics in growth, development and disease.

Figures

Figure 1
Design of protein occupancy profiling experiments and differential occupancy analysis. (A) Schematic representation of the experimental approach of protein occupancy profiling on RNA. Photoreactive ribonucleosides are incorporated into newly synthesized RNA. Protein-RNA complexes are crosslinked with low-energy UV light (365 nm). Crosslinked polyadenylated transcripts are captured by oligo(dT) affinity purification and treated with RNase I. Protein-protected RNA fragments are subsequently subjected to small RNA cloning and Illumina sequencing. (B) Overview of the differential T-C transition normalization and statistical testing scheme. For each annotated transcript that passed filtering criteria, initial normalization shifts T-C transition counts for all replicates of the two conditions to the same distribution, thereby removing differences that might arise from variations in sequencing depth or mRNA expression levels of that particular gene (indicated in light blue). Subsequently, a negative binomial testing scheme is used to identify positions with significantly increased or decreased protein occupancy. CDS, coding sequence.
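The per-transcript normalization idea in panel (B) can be illustrated with a minimal sketch. This is not the POPPI implementation: it assumes a simple scaling scheme in which each replicate's T-C transition counts on one transcript are rescaled to the geometric mean of the replicate totals, so that depth and expression differences cancel while the positional profile is kept. The function name, the replicate labels and the input layout are hypothetical.

```python
import math


def normalize_transcript_counts(counts_by_replicate):
    """Rescale per-position T-C counts of one transcript to a common total.

    counts_by_replicate maps a replicate label to a list of T-C transition
    counts, one entry per transcript position. This is an illustrative
    scaling scheme, not the normalization used by the authors.
    """
    totals = {name: sum(counts) for name, counts in counts_by_replicate.items()}
    positive = [total for total in totals.values() if total > 0]
    if not positive:
        # Nothing to normalize: every replicate is empty on this transcript.
        return {name: [0.0] * len(c) for name, c in counts_by_replicate.items()}

    # Geometric mean of the non-zero replicate totals is the shared target.
    target = math.exp(sum(math.log(total) for total in positive) / len(positive))

    normalized = {}
    for name, counts in counts_by_replicate.items():
        scale = target / totals[name] if totals[name] > 0 else 0.0
        normalized[name] = [count * scale for count in counts]
    return normalized
```

After scaling, every replicate with signal has the same total count on the transcript, so remaining differences are positional. A count-based test such as the negative binomial scheme named in the caption would then be applied per position.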
Figure 2
Protein occupancy profiling in MCF7 cells. (A, B) Nucleotide mismatches in read mappings for both MCF7 replicate experiments. From left to right: total number of mapped reads, number of reads with zero mismatches and number of reads with exactly one mismatch, followed by the occurrence of individual transitions. A high number of T-C transitions relative to perfectly matching reads is indicative of efficient protein-RNA crosslinking. (C, D) Distribution of reads mapping to different RNA types for each individual MCF7 replicate experiment. (E, F) Browser view of the genomic region encoding MYC (E) and the 3' UTR of cyclin D1 (CCND1) mRNA (F). The consensus T-C transition track (in black, number of T-C transitions) and sequence coverage track (orange) of protein occupancy profiles from MCF7 cells are shown on top of each other. PhastCons conservation scores across placental mammals are shown in blue.
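The mismatch tally in panels (A, B) can be sketched as follows. The sketch assumes ungapped alignments supplied as pairs of equal-length reference and read strings; the function name and the input format are hypothetical and do not reflect the authors' pipeline.

```python
from collections import Counter


def tally_mismatches(alignments):
    """Count mapped, perfect and single-mismatch reads, split by mismatch type.

    alignments is an iterable of (reference, read) string pairs of equal
    length (ungapped alignment is an assumption of this sketch).
    """
    summary = Counter()
    for reference, read in alignments:
        if len(reference) != len(read):
            raise ValueError("reference and read must have the same length")
        mismatches = [
            (ref_base, read_base)
            for ref_base, read_base in zip(reference, read)
            if ref_base != read_base
        ]
        summary["mapped"] += 1
        if not mismatches:
            summary["perfect"] += 1
        elif len(mismatches) == 1:
            summary["one_mismatch"] += 1
            ref_base, read_base = mismatches[0]
            # Record the conversion, e.g. "T>C" for a crosslink-induced transition.
            summary[f"{ref_base}>{read_base}"] += 1
    return summary
```

The ratio `summary["T>C"] / summary["perfect"]` then gives a rough indication of crosslinking efficiency in the sense described in the caption.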
Figure 3
Global comparison of protein occupancy profiles and mRNA expression levels in MCF7 and HEK293 cell lines. (A) Heatmap of average pairwise Spearman correlation coefficients of protein occupancy profiles computed for biological MCF7 and HEK293 replicate experiments. The correlation was computed using a sliding window approach to compare read coverage of transcripts between two experiments. The median correlation over all transcripts is shown. (B) Fraction of reads mapping to 5' UTRs, coding sequence (CDS) and 3' UTRs in MCF7 (left) and HEK293 (right) cells averaged over all replicates. Read distributions for protein occupancy profiling experiments are shown on top, while reads from mRNA-seq experiments are depicted at the bottom. (C) Density distribution of T-C transitions from protein occupancy profiling experiments (top) and mRNA-seq read coverage (bottom) averaged over all covered transcript regions. Bold lines represent densities from MCF7 cells. Dashed lines represent densities from HEK293 cells. (D) Smooth scatterplot of gene-wise read abundance changes between MCF7 and HEK293 from protein occupancy profiling (y-axis) and mRNA-seq (x-axis) data. The red line represents the best linear fit. The Pearson correlation coefficient is indicated. It is apparent that RNA-seq data cannot account for the variability in the protein occupancy profiling data.
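The sliding-window correlation in panel (A) can be sketched for a single transcript as below. The window size, the step and the choice to average the per-window coefficients are assumptions made for illustration; the caption does not state the parameters the authors used.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return None  # undefined for constant coverage
    return cov / (var_x * var_y) ** 0.5


def windowed_spearman(coverage_a, coverage_b, window=50, step=25):
    """Average Spearman coefficient over sliding windows of one transcript."""
    scores = []
    last_start = max(len(coverage_a) - window, 0)
    for start in range(0, last_start + 1, step):
        score = spearman(
            coverage_a[start:start + window], coverage_b[start:start + window]
        )
        if score is not None:
            scores.append(score)
    return mean(scores) if scores else None
```

Applying `windowed_spearman` to every transcript and taking `statistics.median` of the results would give one value per pair of experiments, analogous to a single heatmap cell.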
Figure 4
Analysis of differential crosslinking sites observed in MCF7 versus HEK293 cell lines. (A-C) Browser view of three representative genomic loci encoding differentially occupied transcript regions. Consensus T-C transition profile and read coverage of MCF7 (top) and HEK293 (bottom) are indicated in black and orange, respectively. (A) Dashed red box indicates a position of elevated occupancy in MCF7 versus HEK293 cells in the 3' UTR of the ARID1A transcript. This region coincides with an annotated ELAVL1/HuR binding site previously identified by PAR-CLIP [15]. (B) Region of significantly decreased occupancy in MCF7 versus HEK293 cells in the 3' UTR of CBX3. (C) Genomic locus encoding the long intervening non-coding RNA (lincRNA) EPHA6-1. Regions with increased protein occupancy in MCF7 cells are apparent. (D) Empirical cumulative distribution of the distance to the closest differential T-C transition position (FDR <0.1) for all T-C transitions exhibiting a significant change (red) compared to non-differential positions (black). Differential positions are closer to each other, indicating clustering of differentially occupied sites. (E) Boxplot representing distances between significantly differential positions in MCF7 versus HEK293 cells that change in the same (gray) or opposing direction (white). Differential positions that share the same orientation are found closer to each other. (F) Fraction of positions with a significant decrease (left) or increase (right) in T-C transitions located in different transcript regions. Elevated positions have a clear tendency to distribute towards the 3' UTR. (G) Density of significantly decreased (top) and increased (bottom) T-C transition positions over relative transcript regions. Decreased T-C transition positions are more frequently observed at the 5' and 3' ends of coding sequences, while up-regulated T-C transition positions do not show a positional tendency.
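The distance analysis in panel (D) can be sketched as follows. The sketch assumes that all positions lie on one transcript and that differential positions are given as a sorted list of unique integer coordinates; the function names are hypothetical.

```python
from bisect import bisect_left


def nearest_distance(position, differential_positions):
    """Distance from position to the closest differential site other than itself.

    differential_positions must be sorted and free of duplicates.
    Returns None when no other differential site exists.
    """
    index = bisect_left(differential_positions, position)
    distances = [
        abs(differential_positions[i] - position)
        for i in (index - 1, index, index + 1)
        if 0 <= i < len(differential_positions)
        and differential_positions[i] != position
    ]
    return min(distances) if distances else None


def ecdf(values):
    """Empirical cumulative distribution as (value, cumulative fraction) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(value, (rank + 1) / n) for rank, value in enumerate(ordered)]
```

Computing `nearest_distance` once for each differential position and once for each non-differential position, then passing each set of distances to `ecdf`, yields two curves comparable to the red and black lines described in the caption.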
Figure 5
Comparison of differentially occupied mRNA regions to RNA secondary structure predictions, presence of RNA binding motifs and changes in mRNA half-lives. (A, B) Average positional accessibility around the top 300 positions with significantly increased (A) or decreased (B) T-C transitions in MCF7 versus HEK293. Accessibility reflects the probability of each nucleotide to be unpaired as computed by the LocalFold algorithm [33] averaged over all 300 regions. Accessibility of real positions is indicated in red/blue while results obtained from random regions are indicated in grey. Light grey areas around random accessibilities reflect one standard deviation. We smoothed the data by using a window of ±2 nucleotides. (C, D) RNA binding proteins associated with the 20 most significantly enriched RNAcompete position weight matrices (PWMs) [36] found in a ±25 nucleotide region around positions with increased (C) and decreased (D) T-C transitions. CisBP-RNA database IDs of each PWM are indicated in brackets. The significance level of each PWM is represented by a -log10 transformation of the respective P-value on the left, while the ratio between top differentially occupied and random positions is given in log2-scale on the right. Additional files 6 and 7 contain the full list of significant PWMs. (E) Empirical cumulative density distribution of log2 fold changes in mRNA half-lives between MCF7 and HEK293 cells. The top 300 genes with decreased occupancy are shown in blue while the top 300 genes with increased occupancy are shown in red. Both groups are shifted to longer half-lives in MCF7 relative to the distribution of all other genes (black). We determined the significance levels of both shifts with a one-sided t-test yielding P-values of 0.000898 and 0.00644 for targets harboring positions of increased and decreased occupancy, respectively.
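The ±2 nucleotide smoothing mentioned for panels (A, B) corresponds to a centered moving average, sketched below. How the authors treated the ends of each region is not stated; truncating the window at the boundaries is an assumption of this sketch, and the function name is hypothetical.

```python
def smooth(values, half_width=2):
    """Centered moving average over a window of +/- half_width positions.

    Windows are truncated at the sequence ends (an assumption; the caption
    does not specify edge handling).
    """
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - half_width)
        stop = min(len(values), i + half_width + 1)
        window = values[start:stop]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

For example, `smooth([0, 0, 5, 0, 0])` spreads the single peak over its neighbors, giving a value of 1.0 at the center position.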

References

    1. Moore MJ. From birth to death: the complex lives of eukaryotic mRNAs. Science. 2005;309:1514–1518. doi: 10.1126/science.1111443.
    1. Baltz AG, Munschauer M, Schwanhausser B, Vasile A, Murakawa Y, Schueler M, Youngs N, Penfold-Brown D, Drew K, Milek M, Wyler E, Bonneau R, Selbach M, Dieterich C, Landthaler M. The mRNA-bound proteome and its global occupancy profile on protein-coding transcripts. Mol Cell. 2012;46:674–690. doi: 10.1016/j.molcel.2012.05.021.
    1. Kwon SC, Yi H, Eichelbaum K, Fohr S, Fischer B, You KT, Castello A, Krijgsveld J, Hentze MW, Kim VN. The RNA-binding protein repertoire of embryonic stem cells. Nat Struct Mol Biol. 2013;20:1122–1130. doi: 10.1038/nsmb.2638.
    1. Auweter SD, Oberstrass FC, Allain FH. Sequence-specific binding of single-stranded RNA: is there a code for recognition? Nucleic Acids Res. 2006;34:4943–4959. doi: 10.1093/nar/gkl620.
    1. Cook KB, Kazan H, Zuberi K, Morris Q, Hughes TR. RBPDB: a database of RNA-binding specificities. Nucleic Acids Res. 2011;39:D301–D308. doi: 10.1093/nar/gkq1069.
