Review
Nat Rev Mol Cell Biol. 2021 Sep;22(9):589-607.
doi: 10.1038/s41580-021-00382-6. Epub 2021 Jun 17.

Molecular mechanisms underlying nucleotide repeat expansion disorders

Indranil Malik et al. Nat Rev Mol Cell Biol. 2021 Sep.

Erratum in

Abstract

The human genome contains over one million short tandem repeats. Expansion of a subset of these repeat tracts underlies over fifty human disorders, including common genetic causes of amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (C9orf72), polyglutamine-associated ataxias and Huntington disease, myotonic dystrophy, and intellectual disability disorders such as Fragile X syndrome. In this Review, we discuss the four major mechanisms by which expansion of short tandem repeats causes disease: loss of function through transcription repression, RNA-mediated gain of function through gelation and sequestration of RNA-binding proteins, gain of function of canonically translated repeat-harbouring proteins, and repeat-associated non-AUG translation of toxic repeat peptides. Somatic repeat instability amplifies these mechanisms and influences both disease age of onset and tissue specificity of pathogenic features. We focus on the crosstalk between these disease mechanisms, and argue that they often synergize to drive pathogenesis. We also discuss the emerging native functions of repeat elements and how their dynamics might contribute to disease at a larger scale than currently appreciated. Lastly, we propose that lynchpins tying these disease mechanisms and native functions together offer promising therapeutic targets with potential shared applications across this class of human disorders.

Figures

Figure 1: Molecular mechanisms driving nucleotide repeat expansion pathogenesis.
Hyper-methylation of promoter regions can lead to transcriptional silencing, resulting in partial or complete loss of the native protein harboring the repeats. In contrast, active transcription through the repeats can trigger formation of R-loops (RNA:DNA hybrids that lead to DNA damage response pathway activation) and potentially exacerbate somatic repeat instability. Transcribed repeat RNAs fold into complex structures, which aberrantly interact with and sequester cellular RNA-binding proteins (RBPs). Trinucleotide repeat expansions in protein-coding sequences generate mutant proteins that elicit gain-of-function toxicity. Finally, the coding and non-coding repeat RNAs are translated in the absence of canonical AUG-mediated initiation through repeat-associated non-AUG (RAN) translation, producing toxic polymeric peptides.
Figure 2: Repeat-induced transcriptional silencing, R-loops and somatic instability.
(A) Allelic classes of the FMR1 gene containing normal to pathogenic CGG repeats. FMR1 normally has ~30 CGG repeats in its 5’ UTR that are not included as a part of the mature protein product FMRP. Pre-mutation (55–200 repeats) expansions result in production of large CGG repeat-containing RNAs that underlie the age-related neurodegenerative disorder fragile X-associated tremor/ataxia syndrome (FXTAS). Full mutation (>200 repeats) expansions in subsequent generations lead to silencing of the FMR1 locus and fragile X syndrome (FXS). (B) CGG repeat methylation (mC) may direct transcriptional silencing by favoring histone methylation and heterochromatin formation through mechanisms similar to those typically active at CpG islands (left panel). Alternatively or cooperatively, nascent RNA may trigger epigenetic silencing by hybridizing to the complementary CGG-repeat DNA to form RNA:DNA duplexes that recruit polycomb repressive complex 2 (PRC2) (right panel). (C) Transcription-induced R-loops also support formation of DNA slip-out structures that contribute to repeat instability. For CAG/CTG trinucleotide repeat expansions, extended stable hairpins form in both strands. Normally, mismatch repair (MMR) pathways keep the repeat tract length stable by melting the slip-outs, followed by gap-filling by DNA polymerase across the region. Inefficient repair or formation of multiple slip-outs leads to retention of the slip-out structures and expansion of the repeat region by incorporation of the looped DNA. Small molecules that target slip-out structures in CAG repeat DNA inhibit repeat expansion and bias instability toward contraction.
Figure 3: Mechanisms of RNA toxicity in repeat expansion diseases.
(A) Long repetitive RNAs and RNA-binding proteins (RBPs) interact to form complex nuclear-retained RNA foci. (B) RNA foci are formed and maintained through a stochastic combination of intramolecular and intermolecular interactions. (C) A conceptual phase diagram describes the thermodynamics of RNA foci in repeat expansion diseases. The transition from soluble RNA to RNA-protein phase separation is defined by the sum of RNA-RNA, protein-protein, and RNA-protein interactions (phase boundary isolines drawn as solid lines). (D) RNA processing is impaired by sequestration of RBPs on repetitive RNA, the extent of which is a cell-specific function of repeat length, host gene expression, and RBP expression. (E) Effects of nuclear retention on RBP localization can be exacerbated by autoregulatory dynamics, which may additionally disrupt cytoplasmic processes mediated by RBPs. (F) Competition between RBPs at RNA foci may modulate disease-associated sequestration. In DM2, both MBNL and RBFOX proteins bind the expanded CCUG repeat RNA, and overexpression of RBFOX partially displaces MBNL from RNA foci in muscle cells.
Figure 4: Mechanisms of repeat-associated non-AUG (RAN) translation.
(A) Canonical AUG-mediated initiation and some forms of RAN translation require binding of eIF4F complex (eIF4E, eIF4G and eIF4A) to the 5’ m7G cap with eIF4B and/or eIF4H. After assembly, the 43S pre-initiation complex (PIC) scans 5’ to 3’ along the mRNA until selecting an AUG or near-AUG codon (for example: CUG) for initiation. eIF2α phosphorylation (eIF2α-P) under stress blocks ternary complex recycling and inhibits canonical translation, but allows for continued RAN translation. RBPs regulate RAN initiation by binding and altering repeat RNA structures. Known RAN-associated factors are depicted with solid lines, while canonical and IRES initiation factors involved in RAN translation are depicted with dashed lines. (B) RAN translation may also initiate through IRES-like mechanisms in a cap-independent manner, supported by RPS25 and other IRES-trans acting factors (ITAFs). (C) RAN translation from the C9ORF72 GGGGCC sense and CCCCGG antisense transcripts generates multiple dipeptide repeats (DPRs). While all DPRs are detected in patient tissues or generated by cellular reporters, arginine-containing DPRs show the highest intrinsic toxicity in model systems. (D) Stable RNA secondary structures formed by GGGGCC repeats induce ribosomal frameshifting during RAN translation, leading to production of chimeric DPRs. 40S and 60S = ribosomal subunits, eIF = eukaryotic initiation factor, IRES = internal ribosome entry site, m7G = 7-methylguanosine, PKR = Protein kinase R, RBPs = RNA-binding proteins, uORF = upstream open reading frame.
Figure 5: Synergy across pathogenic mechanisms in repeat expansion diseases.
(A) In multiple diseases, the four major mechanisms detailed in this review can co-exist and/or synergize to drive complex pathology. For example, in C9 ALS/FTD, expanded GGGGCC repeats can induce intron retention, which leads to haploinsufficiency of C9ORF72 and also exacerbates RBP sequestration by increasing the half-life of the repeat RNA. In addition, intron retention may increase the production of dipeptide repeats that activate numerous downstream pathogenic pathways. In DM2, expanded CCTG repeats lead to intron retention, which also results in reduction of mRNA available to generate full-length CNBP protein. RAN translation products can be generated from the intron-retained mRNA. In Huntington disease (HD), expanded CAG repeats alter RNA processing to impair recognition of the exon 1 donor splice site; this results in the formation of a truncated polyQ-containing HTT protein that is more toxic than full-length polyQ-containing HTT. RAN translation can also occur across the CAG repeat. In FXTAS/FXS, the CGG repeat can not only sequester RBPs, but can also enhance RAN translation of the uORF such that translation initiation for FMRP is reduced. (B) A more detailed view of pathways activated in C9 ALS/FTD shows that some pathogenic mechanisms can exacerbate or feed into other mechanisms. A complex network of cause and effect, including feed-forward loops, may synergize to drive disease pathology. (C) A more detailed view of pathways activated in HD similarly reveals feedback loops in both the nucleus and cytoplasm.
Figure 6: Roles of repeats in human disease and neuronal function.
(A) Short tandem repeats represent ~3% of the human genome, with enrichment of specific elements within 5’ UTRs, ORFs, and introns. STR mutation rates are orders of magnitude higher than single nucleotide polymorphisms and their size can influence gene expression. (B) CAG repeats in ATXN2, which when fully expanded cause spinocerebellar ataxia type 2 (SCA2), act as risk alleles for development of ALS and other neurodegenerative disorders when the repeats are of intermediate size. Loss of ATXN2 suppresses ALS phenotypes in model systems. (C) The normal-length CGG repeat in FMR1, which when expanded causes FXS and FXTAS, serves to regulate translation of the FMR1 gene product, FMRP, in response to synaptic stimuli.
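
The FMR1 allelic classes described in Figure 2A amount to a threshold rule on CGG repeat count. The following is a minimal illustrative sketch of that rule, not a clinical classifier. The function name and the label used for alleles below 55 repeats are assumptions introduced here; the caption itself gives only ~30 repeats as typical, 55–200 as pre-mutation, and >200 as full mutation.

```python
def classify_fmr1_allele(cgg_repeats: int) -> str:
    """Map a CGG repeat count to the allelic classes named in Figure 2A.

    Thresholds come from the caption: 55-200 repeats is pre-mutation
    (associated with FXTAS) and >200 repeats is full mutation (FXS).
    The label for counts below 55 is a placeholder assumed for this
    sketch; the caption does not subdivide that range.
    """
    if cgg_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if cgg_repeats > 200:
        return "full mutation"
    if cgg_repeats >= 55:
        return "pre-mutation"
    return "below pre-mutation range"


if __name__ == "__main__":
    # Example counts chosen to sit on either side of each threshold.
    for count in (30, 55, 200, 201):
        print(count, classify_fmr1_allele(count))
```

The boundaries are treated as inclusive for 55 and 200, which is one reading of the "55–200" and ">200" ranges in the caption.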
