Cell. 2018 Aug 23;174(5):1067-1081.e17.
doi: 10.1016/j.cell.2018.07.001. Epub 2018 Aug 2.

Heteromeric RNP Assembly at LINEs Controls Lineage-Specific RNA Processing

Jan Attig et al. Cell.

Abstract

Long mammalian introns make it challenging for the RNA processing machinery to identify exons accurately. We find that LINE-derived sequences (LINEs) contribute to this selection by recruiting dozens of RNA-binding proteins (RBPs) to introns. This includes MATR3, which promotes binding of PTBP1 to multivalent binding sites within LINEs. Both RBPs repress splicing and 3' end processing within and around LINEs. Notably, repressive RBPs preferentially bind to evolutionarily young LINEs, which are located far from exons. These RBPs insulate the LINEs and the surrounding intronic regions from RNA processing. Upon evolutionary divergence, changes in RNA motifs within LINEs lead to gradual loss of their insulation. Hence, older LINEs are located closer to exons, are a common source of tissue-specific exons, and increasingly bind to RBPs that enhance RNA processing. Thus, LINEs are hubs for the assembly of repressive RBPs and also contribute to the evolution of new, lineage-specific transcripts in mammals.

Keywords: CLIP; LINE repeats; MATR3; PTBP1; alternative polyadenylation; cryptic exons; evolution; exonogenesis; multivalency; splicing.

Figures

Graphical abstract
Figure 1
LINEs Are Binding Platforms for Diverse RBPs (A) Number of LINE fragments within introns of human genes based on UCSC annotation (hg19 assembly), the number of LINEs with a 3′ or 5′ splice site and the number of LINEs forming an exon. The total number of exonized elements is given, which includes elements contributing a poly(A) termination site to a terminal exon in addition to those contributing a 3′ or 5′ splice site. (B) Estimate of abundance of L1-sequences in subcellular RNA fractions from HeLa, K562, and HepG2 cells. Strand-specific RNA-seq was used to quantify abundance of L1 in sense and antisense (orange and blue), relative to the number of mapped reads. Data is split for libraries made from polyA−, polyA+, or rRNA− RNA. Data for K562 and HepG2 is from the ENCODE consortium. Data for HeLa is from triplicates and is shown as mean ± SD. cyt, cytoplasmic RNA; nuc, nuclear RNA; chrom, chromatin-associated RNA. (C) Frequency of L1 repeat sequences among the bound RNA sequences of a panel of RBPs. Because e/iCLIP is strand-specific, binding to LINEs transcribed in sense or in antisense was quantified separately (orange and blue). Orange and blue lines indicate median binding across all RBPs. The inset indicates the section of the full dataset shown; the full dataset, including sources, is available in Table S2. For visualization, replicates were averaged and only data from one cell line is shown. (D) Binding to introns of at least 7 kb size was analyzed in 100-nt bins up to 5 kb upstream and downstream of the exon and quantified in percent relative to the total number of mapped reads. Data is shown for the first 100-nt bin and as an average of the 100-nt windows within 101–500 nt, 501–2,000 nt, and 2,001–5,000 nt distance. A rank for deep intronic binding is given based on the average of the first 100 nt of either splice site and average binding in the 2,001- to 5,000-nt window. See also Figure S2 and Table S2.
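The binning described in Figure 1D can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the input format (a list of crosslink distances from the splice site, in nt), and the choice to count each crosslink once are assumptions made for illustration only.

```python
def bin_percentages(distances, total_mapped_reads, bin_size=100, max_dist=5000):
    """Count crosslink distances (1-based, in nt) in fixed-size bins and
    express each bin as a percentage of all mapped reads.
    Hypothetical helper; the input format is an assumption."""
    n_bins = max_dist // bin_size
    counts = [0] * n_bins
    for d in distances:
        if 1 <= d <= max_dist:          # ignore positions beyond the 5 kb limit
            counts[(d - 1) // bin_size] += 1
    return [100.0 * c / total_mapped_reads for c in counts]


def summarise_windows(percentages):
    """Report the first bin and the mean of the 100-nt bins in each
    distance window named in the caption."""
    windows = {
        "1-100": (0, 1),
        "101-500": (1, 5),
        "501-2000": (5, 20),
        "2001-5000": (20, 50),
    }
    return {
        name: sum(percentages[start:end]) / (end - start)
        for name, (start, end) in windows.items()
    }


# Toy input: seven crosslink positions, 1,000 mapped reads in total.
example = summarise_windows(
    bin_percentages([1, 100, 101, 500, 2001, 5000, 5001], total_mapped_reads=1000)
)
```

Averaging over bins rather than summing keeps the four windows comparable even though they span different numbers of bins.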
Figure S1
Extended Data for LINEs Are Binding Platforms for a set of RBPs, Related to Figure 1 TEtranscript (Jin et al., 2015) was used to estimate the enrichment of each subfamily of L1 and L2 repeats among the bound RNA sequences of a panel of RBPs, comparing the abundance in recovered eCLIP tags to the abundance in RNA-seq reads. For each RBP, all 142 L1/L2 subfamilies (132 for L1, 10 for L2) were considered. Since eCLIP is strand-specific, binding to LINEs transcribed in sense or in antisense was quantified separately, colored in red and blue. The cell lines used in each eCLIP experiment are indicated on the bottom.
Figure S2
Combinatorial Binding of MATR3 and PTBP1 to the Same LINEs, Related to Figure 2 (A) For each RBP that showed considerable binding to LINE repeats in iCLIP (see B), we selected the 50 LINE repeats with strongest coverage (cDNAs per 100 nt). For comparison we included TARDBP, which showed little binding to LINE repeats. All iCLIP data selected was collected from HEK293 cells. The heatmap shows comparison of binding strength at this set of 214 LINE repeats, and the nearest neighbor analysis for each RBP. The values to the left of the dendrogram show the Pearson correlation coefficient between all RBPs and PTBP1. Only LINEs with a minimal length of 50 nt were considered to reduce the bias to short, highly expressed LINE repeats. (B) Metaprofile of iCLIP binding for MATR3 around iCLIP binding peaks of PTBP1 within and outside of LINE repeats. The data was smoothed with 20-nt bins. (C) HEK293T cells were transfected with siRNAs targeting MATR3, PTBP1 or scrambled controls, and 72 hours later labeled with 100 μM 4SU for 8 hours and cross-linked with 365 nm UV light. The radiogram shows 32P labeled RNA crosslinked to and co-precipitated with PTBP1. Before immunoprecipitation, protein concentration was measured and equalised. The PTBP1 iCLIP was done under low RNase conditions (compare with Figure 2A for high RNase condition). Replicate 1 and 2 are independent biological replicates processed in parallel. (D) 32P labeled RNA crosslinked to and co-precipitated with MATR3 under equivalent conditions as in (C). The MATR3 iCLIP shown was done under high RNase conditions. (E) MATR3 binding peaks were identified from iCLIP experiments, and classified according to susceptibility to PTBP1 depletion as indicated based on moderated log2 fold change. Binding peaks with a normalized count of less than 8 were ignored, as indicated by the dotted line.
(F) The overlap between the center of MATR3 binding peaks and different repeat classes was tested for antisense L1 elements, sense L2 elements, and sense CT-/T-rich microsatellite repeats. Metaprofiles show the percentage of each class of clusters overlapping with each genomic element, and PTBP1-dependent and -independent MATR3 binding peaks are color-coded as in (E). (G) Protein-protein interaction between MATR3 and PTBP1 allows recruitment of PTBP1 to a MATR3-bound RNA in vitro. Recombinant MATR3 mutants (rMATR3) and 32P labeled RNA probes were added to nuclear extracts from HeLa cells and UV-crosslinked. RNA substrates contained either two MATR3 or six PTBP1 RNAcompete motifs (ATCTT2 and CTCTT6). Crosslinking signals corresponding to endogenous PTBP1 (PTBP1) and MATR3 (eMATR3) were confirmed by immunoprecipitation.
Figure 2
Binding of PTBP1 to Antisense L1 Elements Is MATR3-Dependent PTBP1 iCLIP was performed from HEK293T cells depleted of MATR3 as well as controls. MATR3-dependent PTBP1 binding clusters are shown in red and MATR3-independent PTBP1 binding clusters in blue (C–F). (A) RNA crosslinked to and co-precipitated with PTBP1 under high RNase conditions was labeled with 32P-ATP; the size of the PTBP1-RNA is marked next to the radiogram gel image. The input lysate for the iCLIP experiment was probed with MATR3 and PTBP1 antibodies in a western blot. The gel image was cut to align it with the radiogram. Replicates are shown in Figure S2A, and Figure S3C shows another western blot assessing MATR3 and PTBP1 protein levels in the relevant conditions. (B) To quantify the signal, gray pixel intensity measured across the center of each lane is shown, analyzed with ImageJ software. (C) PTBP1 binding peaks were identified from all iCLIP experiments and classified according to their susceptibility to MATR3 depletion. Binding peaks with a normalized count of <8 were ignored, indicated by the dotted line. (D) Coverage of MATR3 iCLIP around MATR3-dependent PTBP1 binding peaks. (E) Enrichment for high-affinity PTBP1 binding motifs around PTBP1 binding peaks. Left: all PTBP1 binding peaks show strong enrichment for PTBP binding motifs. Right: MATR3-dependent PTBP1 binding peaks show enrichment in a 200-nt region for high-affinity motifs above other PTBP1 binding peaks. (F) The overlap between the center of PTBP1 binding peaks and different repeat classes was tested for antisense L1 elements, sense L2 elements, and sense CT-/T-rich microsatellite repeats. Metaprofiles show the percentage of each class of clusters overlapping with each genomic element. (G) Protein-protein interactions between MATR3 and PTBP1 allow the formation of a heteromeric complex on a substrate RNA with two ATGTT motifs in vitro.
Recombinant PTBP1 (rPTBP1) and different MATR3 mutants (rMATR3) were crosslinked to the same RNA at different MATR3 molarity (rPTBP1 at 0.5 μM).
Figure 3
MATR3 and PTBP1 Repress Splice and Poly(A) Sites in LINEs (A) The metadata profile shows the coverage of antisense L1 sequences in a ±2 kb window flanking the splice sites and the proximal and distal poly(A) sites of MATR3/PTBP1/2 repressed events or control. Metadata profile was smoothed using 40-nt bins. (B) LINE-derived exons were identified de novo from RNA-seq data of HeLa cells depleted of MATR3 and PTBP1. Differences in exon inclusion across groups were tested by Kruskal-Wallis rank-sum test (p value < 2.2e−16) and pairwise comparisons by Dunn’s test corrected according to Holm-Šidák. ∗∗∗Adjusted p value < 0.001 in all indicated comparisons. LINE-derived exons specific to the MATR3/PTBP1 depleted condition were of too low read count for quantification in the other conditions. (C) Metadata profiles of MATR3 and PTBP1 iCLIP binding across ±2 kb of the splice site of LINE-derived exons shown in (B). iCLIP binding is presented as a percentage of occupancy, and was smoothed using 40-nt bins. Occupancy on non-regulated sites is shown in gray as control. (D) Percent change in the use of the proximal poly(A) sites. poly(A) sites are split into those within 2 kb vicinity of a LINE and those that are not. (E) Metadata profiles of MATR3 and PTBP1 iCLIP binding as in (C) across ±2 kb of the poly(A) sites shown in (D). See also Figures S3 and S4 and Tables S3 and S4.
Figure S3
Features of LINE Elements Repressed by MATR3 and PTBP1, Related to Figure 3 (A) Established alternative exons derived from or within 750 nt of a LINE are more strongly repressed by MATR3 than those that are further away. Differences in repression strength across groups were tested by Kruskal-Wallis rank-sum test (across all four conditions p value = 0.0193; comparison as indicated p value = 0.00335). (B) Semiquantitative western blot showed efficient depletion of MATR3 and PTBP1 in cells transfected with siRNAs against MATR3 or PTBP1/2 individually or in combination. (C) The class and orientation of the LINEs that seed exons repressed by MATR3/PTBP1. (D) Percent exon inclusion estimates of LINE-derived exons in unperturbed HeLa cells. Exons are grouped as in Figure 3B. (E) MATR3/PTBP1-repressed LINE-derived exons are within long introns. Intron size is the total distance between the flanking exons. The gray line indicates an intron length of 2 kb.
Figure S4
Emergence of New Termination Sites following MATR3/PTBP1 Depletion, Related to Figure 3 Examples of MATR3/PTBP1-repressed poly(A) sites. Genome browser tracks show position and orientation of LINE insertion (hg19/RepeatMasker annotation), PTBP1 and MATR3 iCLIP coverage, as well as tracks for RNA-seq of cytoplasmic RNA and mRNA 3′ end sequencing (pA-seq) from total RNA. All tracks are scaled appropriately to library size. (A) The MROH1 gene shows inclusion of additional exonic sequence and two different terminal exon isoforms in MATR3-depleted cells (highlighted by red dashed lines). Inclusion of this alternative terminal exon appears to cause premature transcriptional termination, as seen by loss of expression downstream of the exon (highlighted by orange dashed lines). (B) Use of a cryptic processing site in PIGN1 results in a new exon and a new poly(A) site, derived from two antisense L1 insertions (highlighted by red dashed lines).
Figure 4
Partial Deletion of L2 Sequences Disrupts Splicing Repression of ACAD9 by MATR3/PTBP1 (A) Schematic illustrating the endogenous ACAD9 locus and the ACAD9 splice reporter. The first two exons and the complete intron1 were cloned into a CMV-driven reporter plasmid. In the ΔLINE splice reporter, 499 bp of L2 sequence were replaced by non-repetitive sequence of intron2 of ACAD9. Arrows indicate positions of primers used for isoform detection in RT-PCR. (B) The inclusion level of the LINE-proximal alternative exon in endogenous ACAD9 was measured in total RNA of cells depleted of MATR3 and PTBP1/2 individually or in combination. (C) The inclusion level of the LINE-derived exon was measured as in (B) in the wild-type and ΔLINE ACAD9 splice reporter. (B and C) To test for significance, one-way ANOVA was used coupled with multiple comparison correction according to Tukey’s HSD. ∗∗∗p value below 0.001. Semiquantitative RT-PCR analysis is averaged across three independent replicates, error bars indicate SD. Additional splice products are indicated by asterisks; these include a longer form of exon1 with an alternative 5′ splice site (exon 1b). For simplicity, only the relevant isoforms are quantified. See also Figure S5.
Figure S5
Depletion of ACAD9 Expression following Inclusion of a LINE-Derived Exon, Related to Figure 4 (A) Genome browser tracks for PTBP1 and MATR3 iCLIP data from HeLa cells at the ACAD9 locus relative to binding motifs of PTBP1 and MATR3. Multivalency of PTBP1 binding sites is indicated as percent of nucleotides that are part of a binding motif within 250-nucleotide windows. Below, the structure of annotated ACAD9 transcripts is shown, as well as the position of the 3′ splice site of the cryptic exon repressed by MATR3/PTBP1 and the position of L2 element fragments. (B) Stranded RNA-seq data from cytoplasmic RNA of HeLa cells depleted of MATR3 and PTBP1/2 is shown. Below, the position of a new pA site within the second L2 repeat is shown, which is only detected in the absence of MATR3/PTBP1/2. (C) Quantification of ACAD9 expression in single and combined depletion of MATR3 and PTBP1/2 from cytoplasmic RNA-seq. (D) Genome browser tracks for PTBP2 and MATR3 on the mouse Acad9 locus. In mouse, there is a single, 465-bp-long L2 insertion annotated.
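The multivalency measure in Figure S5A (percent of nucleotides that are part of a binding motif within a window) can be sketched in Python as below. This is an illustrative sketch, not the authors' code: the function name is hypothetical, the motif `TCTT` in the usage line is a placeholder rather than the motif set used in the study, and consecutive non-overlapping windows are assumed because the caption does not say whether the windows slide.

```python
def motif_coverage_percent(seq, motifs, window=250):
    """Return, for each consecutive window, the percent of nucleotides
    that fall inside at least one motif occurrence.
    Hypothetical helper; windowing scheme is an assumption."""
    seq = seq.upper()
    covered = [False] * len(seq)
    for motif in motifs:
        motif = motif.upper()
        start = seq.find(motif)
        while start != -1:
            for i in range(start, start + len(motif)):
                covered[i] = True
            # Step by one so that overlapping occurrences are also marked.
            start = seq.find(motif, start + 1)
    percents = []
    for w in range(0, len(seq), window):
        chunk = covered[w:w + window]   # the last window may be shorter
        percents.append(100.0 * sum(chunk) / len(chunk))
    return percents


# Toy sequence with two placeholder motif hits, scored in 4-nt windows.
toy = motif_coverage_percent("TCTTAAAATCTT", ["TCTT"], window=4)
```

Marking covered positions before counting means a nucleotide shared by two overlapping motif hits is counted once, which matches "percent of nucleotides" rather than "number of motifs".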
Figure S6
L1-Derived Exons Are a Source of Primate-Specific Alternative Exons with High Tissue Specificity, Related to Figure 5 Percent spliced index (PSI) was calculated in the GTEx panel of human tissues for LINE-derived and Alu-derived exons, as well as all other exons of the same genes. All exons are annotated within UCSC and cross-referenced with RefSeq annotation. Inclusion levels range from 0 to 100%, showing no inclusion or full inclusion. If no support for expression of the flanking exons was found, the gene is assumed to be non-expressed. The number of exons in each group is indicated at the bottom of each boxplot. Genomic age of L1 elements as defined and color-coded in Figure 5A. Significance tests were done across groups by Kruskal-Wallis’ test and pairwise comparisons were corrected according to Siegel-Castellan. ∗∗ and ∗∗∗ indicate adjusted p value was below 0.01 and 0.001, respectively. (C-E, G): Groups are color coded as indicated in the legend on the right of panel D. (A) For all exons surveyed within the GTEx data, the difference in PSI between the tissues with highest and lowest inclusion was calculated as metric for tissue-specific inclusion. (B) For all exons surveyed within the GTEx data, the difference in PSI between the tissues with highest and lowest inclusion was calculated as metric for tissue-specific inclusion. (C) Substitutions from L1 consensus families are shown for L1s grouped by phylogenetic age. As expected, young elements show fewer substitutions from consensus than old elements. (D) Difference in PSI between tissues with highest and lowest inclusion for exons derived from L1 elements grouped by genomic age of the insertion, compared to exons derived from L2 and CR1 insertions. (E) The number of L1-derived exons is shown for all primary tissues screened in the GTEx data, based on testing in which tissue an exon is most included.
Exons are allowed to be counted multiple times if maximum inclusion was in multiple tissues, for instance because they are constitutive. (F) UCSC annotated L1-derived exons are within long introns. Intron size is the total distance between the flanking exons. The gray line indicates an intron length of 2kb. (G) Exons derived from L1 elements have strong splice sites irrespective of the genomic age of the insertion. The maximum entropy score of 5′ and 3′ splice sites of each exon was predicted based on nucleotide sequence (Yeo and Burge, 2004).
Figure 5
Evolutionarily Old LINEs Are a Source of Lineage-Specific Alternative Exons (A) The phylogenetic age of each LINE fragment in the human genome was mapped by comparison to the gorilla, rhesus macaque, mouse, rat, dog, and cow genome assemblies using UCSC liftover genome alignments overlaid with RepeatMasker annotation. Elements specific to the primate or euarchontoglires lineage are considered evolutionarily young elements, while elements present in cow and dog are considered old elements. Phylogenetic groups are color-coded and used in analysis (B–E). (B) Percentage of UCSC annotated exons derived from phylogenetic groups as defined in (A). Exons are generally not derived from the youngest L1 elements. (C) Exons derived from evolutionarily young L1 elements are rarely used across many tissue subtypes in human. Percent spliced index (PSI) was calculated in the GTEx panel of human tissue samples for LINE-derived exons annotated in UCSC. We determined the number of tissues in which each exon was detectable at PSI >5% and compared repeat-derived exons to non-repeat derived alternative exons. (D) Maximum inclusion in any tissue correlates with the genomic age of L1-derived exons. Significance was tested across groups by Kruskal-Wallis’ rank-sum test. The number of exons in each group is indicated at the bottom; adjusted p values below 0.05, ∗∗∗adjusted p values below 0.001. (E) Density profiles showing L1 antisense sequence 5 kb upstream and downstream of human exons. L1s were split into evolutionarily young and old insertions and repeat density is normalized to the total number of repeats in the two groups. For comparison, the primate-specific Alu insertions are shown. Exons were grouped by inclusion in human tissues into those that are >5% but on average <15% included in any tissue, those which are alternative, and those which are constitutively included.
To better present the repeat density around the splice sites, the x axis is cut at 250 nt to show a zoom-in of the 250 nt flanking the exons. øPSI, average PSI across 51 tissues. See also Figure S6 and Tables S5 and S6.
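Two of the per-exon summaries named in the Figure 5 and Figure S6 captions, the number of tissues with PSI above 5% and the difference in PSI between the tissues with highest and lowest inclusion, can be written as a short Python sketch. The function names, the dictionary input, and the use of `None` for tissues where the gene is treated as non-expressed are assumptions for illustration, not the authors' implementation.

```python
def tissues_detected(psi_by_tissue, threshold=5.0):
    """Count tissues in which the exon is detectable at PSI > threshold.
    Tissues with PSI set to None (gene assumed non-expressed) are skipped."""
    return sum(
        1 for psi in psi_by_tissue.values()
        if psi is not None and psi > threshold
    )


def psi_range(psi_by_tissue):
    """Difference in PSI between the tissues with highest and lowest
    inclusion, used here as a simple tissue-specificity measure.
    Returns None when no tissue has a usable PSI value."""
    values = [psi for psi in psi_by_tissue.values() if psi is not None]
    if not values:
        return None
    return max(values) - min(values)


# Toy exon: PSI values in percent; tissue names are illustrative only.
exon_psi = {"brain": 80.0, "liver": 4.0, "heart": None, "lung": 20.0}
n_tissues = tissues_detected(exon_psi)
delta_psi = psi_range(exon_psi)
```

A constitutive exon gives a range near 0 and is detected in every expressing tissue, whereas a tissue-specific exon gives a large range and a small tissue count.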
Figure S7
Murine MATR3 and PTBP1 Bind to Mouse-Specific L1 Insertions, Related to Figure 6 (A) Density profiles showing L1 antisense sequence 5 kb upstream and downstream of constitutive and alternative exons in the mouse. The genomic age of each L1 element in the mouse genome was mapped by comparison to the rat, rhesus macaque, human, dog and cow genome assemblies. For comparison, the rodent-specific B2 repeat insertions are shown. (B) TEtranscript (Jin et al., 2015) was used to estimate the enrichment of each subfamily of L1 and L2 repeats among the bound RNA sequences of a panel of RBPs with CLIP data available for C57Bl mouse brain, comparing the abundance in recovered eCLIP tags to the abundance in RNA-seq reads of ENCODE sequencing data of mice at P2. For each RBP, 133 repBase LINE subfamilies were considered (129 for L1, 4 for L2) (Jurka, 1998). Families were grouped according to whether they emerged in eutheria or only in rodents, based on the information available on repBase. Since eCLIP is strand-specific, binding to LINEs transcribed in sense or in antisense was quantified separately, colored in red and blue. Details and references of datasets are given in Table S1. Differences between rodent-specific and mammalian/eutherian L1 families were tested by two-sided t test and corrected for multiple testing according to Bonferroni.
Figure 6
Young L1 Elements Are Rich in Splice Repressor Binding Motifs that Are Lost in Evolutionarily Older Elements (A) RBPs show preferences for binding to L1 elements of different evolutionary ages. The L1 elements with 10% highest coverage across any i/eCLIP data were used to calculate a relative binding estimate for each RBP ranging from 0 to 1, and for visualization of binding preference, the enrichment of each RBP was normalized to its mean. The number of L1 elements considered in each cell line is given at the bottom. RBPs considered splice-repressive are underlined in red, and components of the RNA processing machineries in green. (B) Cumulative distribution function of gain or loss of exonic splice enhancer (ESS) and intronic splice silencer sequences (ISS). All hexamer sequences were ranked by their enrichment in evolutionarily young compared to old LINEs. (C) Antisense L1 sequences with known binding motifs for relevant RBPs, and the percentage of evolutionarily young versus old elements among them and the percent of deep intronic versus exon-proximal elements. The dotted line indicates the expected proportion. RBPs with multivalent binding sites are marked with one or two red dots, if 10% and 20% of the 100-nt window were part of the motif, respectively. We used the top 10% of L1 sequences with the highest density of binding motifs within a 100-nt window. (D) The position of splice sites of L1-derived exons across the L1 sequence. For reference, the structure of the L1PA family of L1s is given on top. Only splice sites in antisense L1 elements are shown. (E) The position of RBP binding motifs within the antisense L1PA family consensus sequence in green. On top of the track with each RBP’s binding motifs, coverage in e/iCLIP binding data is shown. (F) Alignment of antisense L1 insertions against L1 consensus sequences.
We selected deep intronic insertions (shown in blue) and exon-proximal insertions (in orange) and aligned them against three consensus families, only keeping the best alignment for each genomic insertion. See also Figure S7 and Table S7.
Figure 7
Evolution of LINEs from RNA Insulation to a Template for New Exons Consensus L1 elements contain strong putative splice sites, but exonization is rare. Evolutionarily young L1s recruit a number of splice repressive proteins, including MATR3, PTBP1, and HNRNPM, as well as RBPs of yet unknown function (indicated by X; including BCCIP and SUGP2, see Figure 6A). These proteins recognize RNA motifs present within the L1 elements. The extent of splice-repressive proteins assembling on the L1 elements leads to selective pressure against young L1 insertions in a large proximity window of established exons. Evolutionarily older elements have a high probability of losing binding sites of repressive RBPs. Their exonization is more common, but still largely tissue-specific.

