Mol Cell. 2024 Feb 15;84(4):702-714.e10.
doi: 10.1016/j.molcel.2024.01.006. Epub 2024 Jan 30.

CAG repeat expansions create splicing acceptor sites and produce aberrant repeat-containing RNAs


Rachel Anderson et al. Mol Cell.

Abstract

Expansions of CAG trinucleotide repeats cause several rare neurodegenerative diseases. The disease-causing repeats are translated in multiple reading frames and without an identifiable initiation codon. The molecular mechanism of this repeat-associated non-AUG (RAN) translation is not known. We find that expanded CAG repeats create new splice acceptor sites. Splicing of proximal donors to the repeats produces unexpected repeat-containing transcripts. Upon splicing, depending on the sequences surrounding the donor, CAG repeats may become embedded in AUG-initiated open reading frames. Canonical AUG-initiated translation of these aberrant RNAs may account for proteins that have been attributed to RAN translation. Disruption of the relevant splice donors or the in-frame AUG initiation codons is sufficient to abrogate RAN translation. Our findings provide a molecular explanation for the abnormal translation products observed in CAG trinucleotide repeat expansion disorders and add to the repertoire of mechanisms by which repeat expansion mutations disrupt cellular functions.
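The central claim, that splicing can place a CAG tract inside an AUG-initiated open reading frame, can be checked in silico. The following is a minimal Python sketch, not the authors' analysis: the function name `repeat_in_aug_orf` and the toy sequences are hypothetical, and it simplifies translation by ignoring Kozak context, leaky scanning, and reinitiation. It looks for an upstream AUG whose reading frame reaches the repeat without an in-frame stop codon and reports which homopolymer the tract would encode (CAG → polyglutamine, AGC → polyserine, GCA → polyalanine).

```python
from typing import Optional

# Minimal sketch (hypothetical helper, not the published pipeline): does a CAG
# tract sit in an ORF opened by an upstream AUG with no in-frame stop before it?
STOP_CODONS = {"TAA", "TAG", "TGA"}
# The three reading frames of a pure CAG tract and the homopolymer each encodes.
REPEAT_FRAMES = {"CAG": "polyglutamine", "AGC": "polyserine", "GCA": "polyalanine"}


def repeat_in_aug_orf(transcript: str, repeat_start: int) -> Optional[str]:
    """Return the homopolymer encoded by the CAG tract beginning at
    `repeat_start` (0-based) if the 5'-most open upstream AUG reads through to
    it without an in-frame stop; otherwise return None."""
    seq = transcript.upper().replace("U", "T")
    # Candidate start codons must end before the repeat begins.
    for start in range(repeat_start - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Every codon lying entirely upstream of the repeat must be a sense codon.
        upstream_codons = range(start, repeat_start - 2, 3)
        if any(seq[p:p + 3] in STOP_CODONS for p in upstream_codons):
            continue
        # The phase of the repeat relative to this AUG selects the repeat frame.
        phase = (repeat_start - start) % 3
        first_codon = repeat_start + (3 - phase) % 3
        return REPEAT_FRAMES.get(seq[first_codon:first_codon + 3])
    return None


# Unspliced: an in-frame UAA between the AUG and the repeat blocks the ORF.
print(repeat_in_aug_orf("AUG" + "UAA" + "CAG" * 10, repeat_start=6))  # None
# "Spliced": the stop is gone and the tract is read in the AGC (serine) frame.
print(repeat_in_aug_orf("AUG" + "GC" + "CAG" * 10, repeat_start=5))  # polyserine
```

Which homopolymer results depends only on the phase of the repeat relative to the AUG, which is why, in this simplified picture, the sequence around the splice donor determines the reading frame of the repeat.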

Keywords: Huntington's disease; RAN translation; RNA splicing; polyglutamine diseases; repeat expansion disorders; repeat-associated non-AUG translation; spinocerebellar ataxia.


Conflict of interest statement

Declaration of interests: A.J. is a member of the scientific advisory board of Molecular Cell.

Figures

Figure 1. Sequence analysis pipeline for detecting splicing to CAG repeats.
A. Schematic of SATCfinder. Reads with >3×CAG/CTG repeats are selected. The CAG repeats are computationally removed, and the trimmed reads are aligned to the genome. The genomic coordinate of the base immediately before the repeat (the CAG end) in each trimmed, mapped read is recorded. SATCfinder outputs the number of CAG ends per million mapped reads at each genomic coordinate. A peak at the repeat reflects reads in which the CAG repeats are not part of a splice junction, whereas a distal upstream peak (at the nearest upstream exon, typically within a few kb) marks the location of a splicing donor (arrowhead ▼). B. Representative genes comparing standard RNA sequencing analysis with SATCfinder output.
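The read-processing steps in panel A can be sketched in Python. This is an illustrative approximation, not SATCfinder itself: the helpers `trim_cag_read` and `cag_ends_per_million` are hypothetical, the genome alignment step is left to an external aligner whose reported CAG-end coordinates are assumed as input, and reads carrying CTG repeats are simply reverse-complemented so that the repeat reads as CAG. ">3×CAG" is interpreted here as four or more consecutive CAG triplets.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# ">3xCAG/CTG" read as four or more uninterrupted triplets (an assumption).
CAG_RUN = re.compile(r"(?:CAG){4,}")
CTG_RUN = re.compile(r"(?:CTG){4,}")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def trim_cag_read(read: str) -> Optional[str]:
    """Return the sequence upstream of the first run of >3 CAG repeats, i.e.
    the part that would be aligned to the genome; None if the read lacks a
    qualifying run or has nothing upstream of it."""
    read = read.upper()
    # CTG-repeat reads are assumed to come from the opposite strand.
    if not CAG_RUN.search(read) and CTG_RUN.search(read):
        read = reverse_complement(read)
    match = CAG_RUN.search(read)
    if match is None or match.start() == 0:
        return None
    return read[:match.start()]


def cag_ends_per_million(cag_end_coords: Iterable[int],
                         total_mapped_reads: int) -> dict:
    """Count CAG ends (genomic coordinate of the last base before the trimmed
    repeat, as reported by the aligner) per million mapped reads."""
    counts = Counter(cag_end_coords)
    return {coord: n * 1e6 / total_mapped_reads for coord, n in counts.items()}


print(trim_cag_read("ACGTACGTCAGCAGCAGCAGCAG"))  # ACGTACGT
print(cag_ends_per_million([100, 100, 50], total_mapped_reads=2_000_000))
# {100: 1.0, 50: 0.5}
```

Trimming the repeat before alignment is what lets a spliced read map cleanly to the donor exon: the trimmed read then ends at the last exonic base, so its CAG end lands at the donor rather than at the repeat.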
Figure 2. CAG repeat expansions result in mis-splicing of repeat-containing RNAs.
A. Schematic of constructs with 240×CAG repeats and a variable 250-base flanking sequence. Some flanking sequences result in nuclear retention of the repeat-containing RNA, whereas others induce RAN translation and cell toxicity. B, C, D. Top, SATCfinder output for representative CAG constructs; the x-axis is the base coordinate within the flanking-sequence region and the y-axis is the number of CAG ends per million mapped reads. Bottom, representative fluorescence images of cells expressing the indicated constructs. Micrographs are representative of >2 independent experiments. Scale bar, 10 μm. E. Left, sequence logos for 5′ splice sites annotated in the human genome and for those observed in the CAG flanking-sequence library; the x-axis indicates the position within the 9-base donor sequence, and letter height depicts the probability of observing each base. Right, 5′ MaxEnt scores for 220,419 annotated human splice donors, all 262,144 possible 9-mers, and the 20 splice donors detected in the CAG flanking-sequence library. F. Percentage of cells with cytoplasmic RNA aggregates, quantified by fluorescence microscopy. Each data point represents an independent experiment with >500 cells; data are summarized as mean ± SD. G. Real-time quantitative PCR of the relative expression of the CAGran intron, normalized to expression of the 5′ end of the CAGran transcript. "Splicing inhibitor" denotes treatment with 25 nM pladienolide B. Data are mean ± SD for three independent RNA isolations. H. Schematic of CAGran, showing the transcription initiation site (right-facing arrow), the flanking sequence, and 240×CAG repeats preceded by stop codons in each frame. The sequence of a representative donor that splices to the CAG tract is shown, with exonic bases in uppercase. After splicing, the stop codons are removed and the CAG repeat is embedded in an AUG-initiated ORF. I. Immunoblot of cells expressing CAGfoci and CAGran, probed with a polyglutamine antibody. J. Percentage of repeat-containing transcripts in which the repeats lie in AUG-initiated ORFs, for constructs that produce RNA foci only or that exhibit RAN translation. Each data point is one construct. K. Immunoblot of the indicated samples probed with a polyglutamine antibody. CAGran*donors carries point mutations at all splice donor sites; CAGran*AUG carries point mutations at two AUGs. Immunoblots are normalized first to tubulin, then to the parent cell line without a repeat-containing construct (mock). Immunoblots and quantification of relative polyglutamine abundance (mean ± SD) are representative of >2 independent experiments. *An endogenous protein (TBP) is also detected by this polyglutamine antibody. Significance values in F, G, and J were calculated using Student's t-test. MS2CP-YFP: bacteriophage MS2 coat protein tagged with YFP.
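Panel E compares donor sequences by MaxEnt score. MaxEntScan's maximum-entropy model is not reproduced here; as a simpler, swapped-in illustration, the Python sketch below builds a position weight matrix (the per-position base probabilities that a sequence logo displays) from a set of 9-base donors and scores sequences by log-odds against a uniform background. It also enumerates all 4^9 = 262,144 possible 9-mers, the background set in panel E. The training donors and helper names are hypothetical.

```python
import itertools
import math
from typing import Dict, Iterator, List, Sequence

BASES = "ACGT"


def build_pwm(donors: Sequence[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-position base probabilities from equal-length donor sequences,
    with a pseudocount so unseen bases do not get probability zero."""
    pwm = []
    for position in range(len(donors[0])):
        counts = {base: pseudocount for base in BASES}
        for donor in donors:
            counts[donor[position]] += 1
        total = sum(counts.values())
        pwm.append({base: n / total for base, n in counts.items()})
    return pwm


def log_odds_score(seq: str, pwm: List[Dict[str, float]], background: float = 0.25) -> float:
    """Sum of log2(p(base) / background) over positions (a PWM score, not MaxEnt)."""
    return sum(math.log2(column[base] / background) for column, base in zip(pwm, seq))


def all_kmers(k: int = 9) -> Iterator[str]:
    """Enumerate every DNA k-mer; for k = 9 this yields 4**9 = 262,144 sequences."""
    return ("".join(p) for p in itertools.product(BASES, repeat=k))


# Hypothetical training donors (3 exonic + 6 intronic bases, GT at positions 4-5).
donors = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGG"]
pwm = build_pwm(donors)
print(log_odds_score("CAGGTAAGT", pwm) > log_odds_score("TTTTTTTTT", pwm))  # True
print(sum(1 for _ in all_kmers(9)))  # 262144
```

Scoring every element of `all_kmers(9)` with `log_odds_score` gives a background distribution against which annotated and library-derived donors can be compared, analogous in spirit to the right side of panel E.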
Figure 3. CAG repeats with native disease-associated flanking sequences form splice acceptors.
A. Schematic of the ATXN8 mini-gene expressing ~100 bp of endogenous ATXN8 sequence directly upstream of 107×CAG repeats. The ampicillin resistance gene (AS-AmpR) and ColE1 origin of replication are indicated. B. Left, SATCfinder output for cells transfected with the ATXN8 construct in the presence of the splicing inhibitors pladienolide B (PB, 25 nM) or isoginkgetin (IGG, 15 μM), or 0.1% DMSO (DMSO) as a control. Right, quantification of the percentage of CAG ends that reflect splicing to the repeat. C. As in B, but for ATXN8 constructs without or with point mutations at the identified donor sites. D. Sequences of the splicing donors identified in the ATXN8 construct, with the percentage of reads arising from each donor. The sequence logo for the consensus human splice donor is shown for comparison. E. Schematic of the CAG repeat-containing transcripts produced by splicing of the ATXN8 construct.
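The quantities in panels B (right) and D can be derived from SATCfinder-style CAG-end coordinates. A minimal Python sketch under two stated assumptions: CAG ends at the base immediately before the repeat are counted as unspliced and every other CAG end as splicing to the repeat, and per-donor percentages are expressed as a fraction of spliced CAG ends. The function name, the `tolerance` parameter, and the example coordinates are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def splice_summary(cag_end_coords: Iterable[int], repeat_start: int,
                   tolerance: int = 0) -> Tuple[float, Dict[int, float]]:
    """Return (percent of CAG ends reflecting splicing to the repeat,
    percent of spliced CAG ends contributed by each upstream coordinate)."""
    counts = Counter(cag_end_coords)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no CAG ends supplied")
    boundary = repeat_start - 1  # base immediately before the repeat
    # Ends within `tolerance` bases of the boundary are treated as unspliced.
    spliced = {c: n for c, n in counts.items() if abs(c - boundary) > tolerance}
    n_spliced = sum(spliced.values())
    pct_spliced = 100 * n_spliced / total
    donor_pct = {c: 100 * n / n_spliced for c, n in spliced.items()}
    return pct_spliced, donor_pct


# Hypothetical data: 8 unspliced ends and one end at each of two upstream donors.
print(splice_summary([99] * 8 + [40, 20], repeat_start=100))
# (20.0, {40: 50.0, 20: 50.0})
```

In this sketch, a splicing inhibitor such as pladienolide B would be expected to lower the first returned value, and a donor point mutation to remove that donor's coordinate from the second.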
Figure 4. Canonical translation of aberrantly spliced CAG repeat-containing RNA results in aberrant protein products.
A. Schematic of the ATXN8 mini-gene. Upon splicing, the upstream stop codon is removed and the CAG repeat is embedded in an AUG-initiated ORF. B, C, D. Immunoblots of cells expressing the indicated ATXN8- and ATXN8KKQ-derived constructs, in which the predicted ORF is interrupted by mutating the splice donor (B), mutating the identified in-frame AUG initiation codon (C), or introducing a stop codon (D). Band intensities are normalized first to NPT (neomycin phosphotransferase), expressed in cis from the plasmid, and then to the endogenous protein (TBP, marked with an asterisk) in the control transfected with a similar vector encoding GFP (vector). Tubulin is included to show equivalent loading across conditions but is not used for normalization because of potential variation in transfection efficiency. Immunoblots and quantification of relative polyglutamine abundance (mean ± SD) are representative of >2 independent transfections.
Figure 5. Splicing from an endogenous donor in ATXN8 generates AUG-initiated ORFs.
A. Schematic of the design of an ATXN8 mini-gene with 400 bases of endogenous ATXN8 sequence fused directly to 47×CAG repeats, followed by epitope tags in each reading frame. B. SATCfinder output for ATXN8 constructs with the native upstream sequence, without or with a point mutation in the predicted donor site. C. Schematic of the ORF resulting from the ATXN8 mini-gene. Upon splicing, the CAG repeat is embedded in a new AUG-initiated ORF. D. Immunoblot of the indicated samples probed with an anti-HA antibody. The HA epitope is in the polyalanine frame. Immunoblots are normalized first to tubulin, then to the parent cell line without a repeat-containing construct (mock). Immunoblots and quantification of relative HA abundance (mean ± SD) are representative of >2 independent experiments.
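The two-step immunoblot normalization described in these legends (band intensity divided by a loading control, then expressed relative to the same ratio in a reference sample such as mock) and the mean ± SD summary reduce to simple arithmetic. A minimal Python sketch with hypothetical densitometry values; the helper names are illustrative, and it assumes "SD" means the sample standard deviation, which the legends do not specify.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def relative_abundance(signal: float, loading: float,
                       ref_signal: float, ref_loading: float) -> float:
    """Normalize a band to its loading control (e.g. tubulin), then divide by
    the same ratio in the reference sample (e.g. mock)."""
    return (signal / loading) / (ref_signal / ref_loading)


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across independent experiments."""
    return mean(values), stdev(values)


# Hypothetical band and tubulin intensities for one sample and for mock.
print(relative_abundance(signal=200.0, loading=100.0,
                         ref_signal=50.0, ref_loading=100.0))  # 4.0
print(mean_sd([3.0, 4.0, 5.0]))  # (4.0, 1.0)
```

The same function covers the Figure 4 scheme if NPT is passed as the loading control and the vector-transfected control as the reference, though how the published quantification combined those terms is not stated beyond the legend.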
Figure 6. Model for the subcellular localization of RNA with expanded CAG repeats.
In the absence of an expanded repeat, the RNA is processed normally and exported to the cytoplasm. If the RNA contains expanded CAG repeats outside an ORF, it is retained at nuclear foci, where it sequesters splicing factors. If the CAG repeats lie downstream of potential splice donors, those donors may be spliced to the CAG repeat in a repeat-number-dependent manner. Splicing generates new RNA isoforms in which the CAG repeat may lie within an AUG-initiated ORF. Translation of these AUG-initiated, repeat-containing ORFs produces aberrant homopolymeric proteins that may aggregate and contribute to cellular toxicity.
