Nucleic Acids Res. 2025 Jun 20;53(12):gkaf619.
doi: 10.1093/nar/gkaf619.

The origin of mirror repeats in the human genome

Ryan McGinty et al., Nucleic Acids Res.

Abstract

Mirror DNA repeats were found in genomic DNA several decades ago, but their role and the mechanisms responsible for their abundance have remained a mystery. The only firmly established functional property has been that a subset of them, long homopurine-homopyrimidine mirror repeats (H-motifs), can form a triple-helical DNA secondary structure (H-DNA). Here, we analyzed the sequence content of mirror repeats in the telomere-to-telomere human genome sequence. Our findings suggest that long mirror repeats in genomic DNA originate exclusively from the expansion of simple tandem repeats (STRs). Strikingly, long H-motifs are highly overrepresented compared to all other mirror repeats and STRs. We hypothesize that long H-motif STRs could be particularly expansion-prone owing to H-DNA-mediated genome instability, which points to the length at which this structure becomes a significant hindrance.

Conflict of interest statement

None declared.

Figures

Graphical Abstract
Figure 1.
Illustration of H-DNA secondary structure and repeat symmetry. (A) Example of H-DNA secondary structure, which requires a mirror repeat (red and blue regions). Triple-helical DNA is stabilized through non-Watson–Crick base pairing (gray bars) in addition to typical Watson–Crick base pairing (black bars). (B) Various DNA sequences containing axes of symmetry, as indicated. MR, mirror repeat; DR, direct repeat; IR, inverted repeat; STR, simple tandem repeat. Arrows indicate the direction of symmetry. Red color indicates reverse complementarity. The first three examples are randomly generated sequences that illustrate central axes of symmetry only. The subsequent examples illustrate that various STR motifs can contain multiple axes of symmetry. The (GA)n and (GAA)n motifs are homopurine mirror repeats known as H-motifs.
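The symmetry classes in panel B can be expressed as simple string tests. The Python sketch below is illustrative only and is not the authors' pipeline: it assumes uppercase A/C/G/T input, checks perfect repeats around a single central axis with no spacer, and treats an H-motif as a strictly 100% homopurine or homopyrimidine mirror repeat. All function names are hypothetical.

```python
# Hypothetical helpers: perfect (uninterrupted) repeats around one central axis,
# no spacer, uppercase A/C/G/T input assumed.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_mirror_repeat(seq: str) -> bool:
    """MR: the same strand reads identically in both directions."""
    return seq == seq[::-1]


def is_inverted_repeat(seq: str) -> bool:
    """IR: the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


def is_direct_repeat(seq: str) -> bool:
    """DR: the first half is repeated verbatim as the second half."""
    half, odd = divmod(len(seq), 2)
    return not odd and half > 0 and seq[:half] == seq[half:]


def is_h_motif(seq: str) -> bool:
    """H-motif (strict assumption): a mirror repeat made only of A/G or only of C/T."""
    bases = set(seq)
    return is_mirror_repeat(seq) and (bases <= set("AG") or bases <= set("CT"))
```

For example, is_h_motif("GAGAG") and is_h_motif("GAAGAAG") return True, whereas "GAGAGA" is not a mirror around its central axis even though (GA)n tracts contain many other mirror axes. Likewise, is_inverted_repeat("GAATTC") is True but is_mirror_repeat("GAATTC") is False.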
Figure 2.
Distribution of repeat lengths in the human genome. Counts of various repeat motifs in the telomere-to-telomere human genome reference sequence (or in GRCh38 for G4 motif tracts). The x-axis displays total repeat tract length. STR motifs are grouped by unit length (e.g. mononucleotides, dinucleotides, etc.). Lengths are shown for perfect motifs only, i.e. motifs lacking any interruptions (not applicable to G4 tracts). For MR, IR, and DR motifs, tract length excludes spacer length. For comparison, we calculated the expected number of mirror repeats in a randomized genome (see the “Materials and methods” section).
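A length distribution like this rests on scanning a sequence for mirror repeats and comparing the result against a randomized sequence. The sketch below is a hypothetical illustration, not the paper's method: it assumes perfect stems, a maximum spacer of 8 bp and a minimum stem of 10 bp (both arbitrary defaults), and a composition-preserving shuffle as the randomized genome. The paper's actual randomization is described in its Materials and methods and may differ. Brute-force scanning like this is only practical for short sequences.

```python
import random
from collections import Counter


def mirror_stem_length(seq: str, c: int, g: int) -> int:
    """Perfect mirror stem length around the spacer seq[c:c + g] (g may be 0)."""
    k = 0
    left, right = c - 1, c + g
    # Extend outward while the base on the left equals the base on the right.
    while left >= 0 and right < len(seq) and seq[left] == seq[right]:
        k += 1
        left -= 1
        right += 1
    return k


def count_mirror_repeats(seq: str, min_stem: int = 10, max_spacer: int = 8) -> Counter:
    """Histogram of maximal perfect mirror stems; stem length excludes the spacer."""
    counts: Counter = Counter()
    for g in range(max_spacer + 1):
        for c in range(1, len(seq) - g):
            # If both spacer ends match, this axis is the same repeat already seen
            # with a spacer two bases shorter, so skip it to avoid double counting.
            if g >= 2 and seq[c] == seq[c + g - 1]:
                continue
            k = mirror_stem_length(seq, c, g)
            if k >= min_stem:
                counts[k] += 1
    return counts


def shuffled(seq: str, seed: int = 0) -> str:
    """Composition-preserving shuffle, used here as a stand-in randomized genome."""
    bases = list(seq)
    random.Random(seed).shuffle(bases)
    return "".join(bases)
```

Calling count_mirror_repeats(seq) and count_mirror_repeats(shuffled(seq)) on the same sequence gives observed and randomized histograms keyed by stem length. Each repeat is reported once, at its smallest spacer, because a stem of length k with spacer g also appears as a stem of length k - 1 with spacer g + 2.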
Figure 3.
STR content within larger repeat motifs varies by length. Within each (A) MR, (B) IR, and (C) DR stem sequence and each (D) G4 motif tract, the most-repeated STR subsequence was determined. All repetitions of this motif were then counted (including interruptions), and the total length of the subsequence was divided by the total stem or tract length of the larger motif. In the absence of any STR, the most-repeated base is counted (e.g. random loci typically score ∼30%, reflecting the A or T content of the human genome). Left panels display the mean STR portion (y-axis) versus motif length (x-axis) for MR, IR, DR, and G4 motifs, compared to chromosome-matched and length-matched random genomic loci. Semi-transparent bands show the range covering 95% of values in each length bin; note that the very small counts in long length bins result in narrower 95% ranges but greater noise between adjacent bins. Right panels display the same data as probability density functions, grouped by stem or tract length as indicated.
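The STR-portion score described above can be approximated in a few lines. This is a hypothetical reading of the caption, not the authors' code: it assumes motif units of up to 6 bp (an arbitrary choice), requires a multi-base motif to occur at least once in tandem to qualify as an STR, and scores the motif whose copies cover the largest fraction of the stem. When no STR is present, this falls back to the most frequent single base.

```python
def str_portion(stem: str, max_unit: int = 6) -> tuple[str, float]:
    """Return (motif, fraction of `stem` covered by all copies of that motif)."""
    n = len(stem)
    best_motif, best_portion = "", 0.0
    for unit in range(1, max_unit + 1):
        # sorted() makes tie-breaking deterministic (shorter units, then A < C < G < T).
        for motif in sorted({stem[i:i + unit] for i in range(n - unit + 1)}):
            # Units of 2+ bases qualify as STR motifs only if they occur in tandem
            # at least once; single bases are the fallback described in the caption.
            if unit > 1 and motif * 2 not in stem:
                continue
            # str.count tallies non-overlapping copies anywhere in the stem, so
            # copies separated by interruptions are included.
            portion = stem.count(motif) * unit / n
            if portion > best_portion:
                best_motif, best_portion = motif, portion
    return best_motif, best_portion
```

For instance, str_portion("GAGAGTTGAGA") returns ("GA", 8/11): the interrupted GA copies still count toward the portion, while the TT interruption does not. A sequence with no tandem motif, such as "ACGT", falls back to its most frequent base.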
Figure 4.
hR/hY nucleotide content of mirror repeats. (A) Counts of mirror repeat motifs according to the nucleotide content of the motif. Mirror stem sequences were grouped as >95% or <5% A/G (AₙGₙ; hR/hY), A/T (AₙTₙ; hW), or A/C (AₙCₙ; hK/hM) content, or as all other mirror motifs. The x-axis displays stem length for MR motifs. (B) STR motifs grouped according to nucleotide content as shown. Counts are the sum of di-, tri-, and tetranucleotide motifs, excluding mononucleotides (e.g. hR/hY STRs consist of uninterrupted AG₁₋₃, A₂G₂, or A₁₋₃G motifs, abbreviated as A₁₋₃G₁₋₃). When combining STRs of different unit lengths, missing counts (e.g. dinucleotide tracts of odd length) were interpolated with a linear spline method. The x-axis displays total STR tract length. (Note that STR tract length is approximately 2× stem length when an axis of mirror symmetry is placed within the STR.) Left panels display the mean STR portion versus repeat stem length for (C) hR/hY mirror repeats and (D) mirror repeats with greater nucleotide diversity, compared to chromosome-matched and length-matched random genomic loci. Semi-transparent bands show the range covering 95% of values in each length bin. Right panels display the same data as probability density functions, grouped by motif length as indicated.
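Panels A and B involve two concrete steps: assigning each mirror stem to a nucleotide-content group, and filling in tract lengths that a given STR unit cannot produce before summing across unit lengths. The sketch below assumes the 95%/5% thresholds from the caption and checks the groups in the order hR/hY, hW, hK/hM, so that mononucleotide runs, which satisfy all three, land in hR/hY (this order is an assumption). It uses plain linear interpolation between observed lengths as a stand-in for the linear spline. Both function names are hypothetical.

```python
def content_class(stem: str, threshold: float = 0.95) -> str:
    """Assign a mirror stem to one nucleotide-content group (assumed check order)."""
    groups = [("hR/hY", set("AG")), ("hW", set("AT")), ("hK/hM", set("AC"))]
    for label, pair in groups:
        frac = sum(base in pair for base in stem) / len(stem)
        # Near 100% means e.g. homopurine for A/G; near 0% means the complementary
        # pair (e.g. homopyrimidine), so both extremes share one label.
        if frac > threshold or frac < 1 - threshold:
            return label
    return "other"


def fill_missing_lengths(counts: dict[int, float]) -> dict[int, float]:
    """Linearly interpolate counts at tract lengths between the observed ones."""
    known = sorted(counts)
    filled = {}
    for lo, hi in zip(known, known[1:]):
        for length in range(lo, hi):
            t = (length - lo) / (hi - lo)
            filled[length] = counts[lo] + t * (counts[hi] - counts[lo])
    filled[known[-1]] = counts[known[-1]]
    return filled
```

For example, content_class("CTTCTTC") returns "hR/hY" and content_class("ACGTTGCA") returns "other", while fill_missing_lengths({4: 10, 6: 4}) returns {4: 10.0, 5: 7.0, 6: 4}. Adding the filled dictionaries for di-, tri-, and tetranucleotide STRs length by length then gives one combined curve.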
