[Preprint]. 2025 May 24:2025.05.23.655792.
doi: 10.1101/2025.05.23.655792.

A Tandem Repeat Atlas for the Genome of Inbred Mouse Strains: A Genetic Variation Resource

Wenlong Ren et al. bioRxiv. 2025.

Abstract

Tandem repeats (TRs) are a significant source of genetic variation in the human population: TR alleles are responsible for over 60 human genetic diseases and for inter-individual differences in many biomedical traits. We therefore used long-read sequencing and state-of-the-art computational programs to produce a database of 2,528,854 TRs covering 39 inbred mouse strains. As in humans, murine TRs are abundant and are located primarily in intergenic regions. However, there are important species differences: murine TRs do not show the extensive repeat expansions associated with human repeat expansion diseases, and they are not associated with transposable elements. By analyzing two biomedical phenotypes that were identified over 40 years ago, we demonstrate that this TR database can enhance our ability to characterize the genetic basis for trait differences among the inbred strains.

Conflict of interest statement

W.R., W.L., Z.F., B.W., Z.C., and G.P. declare no conflict of interest. E.D. is an employee and shareholder of Pacific Biosciences.

Figures

Figure 1. Overview of the pipeline used to analyze the genomic sequences of 40 inbred mouse strains to generate the TR database.
Long-read sequencing (LRS) was performed on 40 inbred strains, and C57BL/6 was used as the reference sequence. The programs used to generate the TR catalog and to genotype the TRs are shown. The TRs in all 39 (or 35 classical) inbred strains were merged. TRs were removed if every strain matched the reference sequence (i.e., non-polymorphic TRs) or if heterozygous alternative alleles (i.e., potential mosaic TRs) were detected. This produced a TR database with 2,528,854 (1,819,293) TRs. The numbers in parentheses indicate the number of TRs present in the 35 classical inbred strains.
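The two removal rules in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `filter_trs`, the data layout (one pair of allele lengths per strain), and the toy genotypes are all assumptions, and the sketch treats any heterozygous call as grounds for removal.

```python
def filter_trs(genotypes, ref_alleles):
    """Keep only TRs that are polymorphic and have no heterozygous calls.

    genotypes:   {tr_id: {strain: (allele_1, allele_2)}}  (hypothetical layout)
    ref_alleles: {tr_id: reference_allele}
    Returns {tr_id: {strain: allele}} for the TRs that pass both rules.
    """
    kept = {}
    for tr_id, calls in genotypes.items():
        ref = ref_alleles[tr_id]
        # Rule 1: drop TRs with any heterozygous call (potential mosaic TRs).
        if any(a != b for a, b in calls.values()):
            continue
        # Rule 2: drop TRs where every strain matches the reference.
        if all(a == ref for a, _ in calls.values()):
            continue
        kept[tr_id] = {strain: a for strain, (a, _) in calls.items()}
    return kept


# Toy data: allele values are repeat counts invented for illustration.
genotypes = {
    "tr1": {"A/J": (10, 10), "NZB": (10, 10)},  # non-polymorphic
    "tr2": {"A/J": (10, 10), "NZB": (8, 8)},    # polymorphic, homozygous
    "tr3": {"A/J": (10, 12), "NZB": (10, 10)},  # heterozygous call
}
ref_alleles = {"tr1": 10, "tr2": 10, "tr3": 10}

kept = filter_trs(genotypes, ref_alleles)
print(kept)  # {'tr2': {'A/J': 10, 'NZB': 8}}
```

Only `tr2` survives here: `tr1` matches the reference in every strain, and `tr3` carries a heterozygous call.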
Figure 2. The distribution and characteristics of TRs in 35 classical inbred strains.
(A and B) The total number of TRs (A) and the number of strain-unique TRs (B) are shown for each strain. Four strains (CE, KK, SMJ, and TallyHo (TH)) possess a greater number of strain-unique TRs. (C) The number of TRs where a minor allele is shared by the indicated number of strains is shown. Most of the minor TR alleles are shared by 1–3 strains.
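The counts in panels B and C can be illustrated with a short Python sketch. It assumes one homozygous allele per strain per TR and defines the minor allele as the least common allele at a TR; both the function name `minor_allele_sharing` and that definition are assumptions made for illustration, since the caption does not state how the authors computed these numbers.

```python
from collections import Counter


def minor_allele_sharing(tr_alleles):
    """Tally minor-allele sharing and strain-unique TR alleles.

    tr_alleles: {tr_id: {strain: allele}}  (hypothetical layout)
    Returns (sharing, unique_per_strain):
      sharing[n]           = number of TRs whose minor allele is carried by n strains
      unique_per_strain[s] = number of TRs whose minor allele occurs only in strain s
    """
    sharing = Counter()
    unique_per_strain = Counter()
    for calls in tr_alleles.values():
        counts = Counter(calls.values())
        if len(counts) < 2:
            continue  # non-polymorphic TR: no minor allele
        # Least common allele; ties resolve to the first allele encountered.
        minor_allele, n_strains = min(counts.items(), key=lambda kv: kv[1])
        sharing[n_strains] += 1
        if n_strains == 1:
            strain = next(s for s, a in calls.items() if a == minor_allele)
            unique_per_strain[strain] += 1
    return sharing, unique_per_strain


# Toy data: strain names come from the caption, allele values are invented.
tr_alleles = {
    "tr1": {"CE": 12, "KK": 10, "SMJ": 10, "TH": 10},  # minor allele only in CE
    "tr2": {"CE": 7, "KK": 7, "SMJ": 9, "TH": 9},      # minor allele in 2 strains
    "tr3": {"CE": 5, "KK": 5, "SMJ": 5, "TH": 5},      # non-polymorphic
}

sharing, unique_per_strain = minor_allele_sharing(tr_alleles)
print(dict(sharing))            # {1: 1, 2: 1}
print(dict(unique_per_strain))  # {'CE': 1}
```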
Figure 3. The genomic distribution and properties of TRs in the 35 classical inbred strains.
(A) The distribution of TRs in different types of genomic regions. (B) The number of TRs with different motif lengths. Most TRs are <7 bp (left), while TRs with motifs >6 bp are rarer (right). (C) The number of TRs with alleles with the indicated number of motifs. The Y-axis is log10 transformed.
Figure 4. Linkage disequilibrium (LD) decay patterns across different types of genetic variants in 35 inbred strains.
The LD patterns were calculated using (A) 21 million SNPs, (B) 220K SNPs, (C) 220K structural variants (SVs), and (D) 1.8 million tandem repeats (TRs). The y-axis represents LD values (r²), and the x-axis indicates physical distance (kb). The maximum LD values are 0.811, 0.756, 0.487, and 0.850, with LD decaying to half of these values at 133 kb, 177 kb, 291 kb, and 0.1 kb, respectively.
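For readers unfamiliar with the two quantities reported here, the following Python sketch shows the standard r² statistic for a pair of biallelic variants and one simple way to read a half-decay distance off a binned LD curve. It assumes homozygous inbred strains with alleles coded 0/1, and the function names and toy values are hypothetical. TRs are often multiallelic, so the authors' actual calculation may differ.

```python
def r_squared(x, y):
    """LD r^2 between two biallelic variants, one 0/1 allele code per strain."""
    n = len(x)
    p_a = sum(x) / n                               # frequency of allele 1 at variant A
    p_b = sum(y) / n                               # frequency of allele 1 at variant B
    p_ab = sum(a * b for a, b in zip(x, y)) / n    # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                        # undefined if either variant is monomorphic
    d = p_ab - p_a * p_b                           # disequilibrium coefficient D
    return d * d / denom


def half_decay_distance(distances_kb, mean_r2):
    """First distance at which the binned mean r^2 falls to half its maximum.

    Assumes the bins are sorted by increasing distance.
    """
    half = max(mean_r2) / 2
    for dist, r2 in zip(distances_kb, mean_r2):
        if r2 <= half:
            return dist
    return None  # the curve never decays to half its maximum


# Toy data, invented for illustration.
print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0 (perfect LD)
print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0 (no LD)
print(half_decay_distance([1, 10, 100, 200], [0.8, 0.6, 0.3, 0.2]))  # 100
```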
Figure 5. The effect of PL/J Prdm9 and NZB Cacna2d3 TR alleles on protein structure.
(A) The Prdm9 protein (residues 1–847) has a conserved N-terminal segment (1–664, blue), and a COOH-terminal region (664–847, orange) whose sequence is altered by a PL/J-specific TR allele. The C57BL/6J (49 bp) and the PL/J (302 bp) TR alleles are shown above the protein diagram. The red rectangles (below) show the zinc finger C2H2-type domains in Prdm9 with the amino acid numbers for their starting and ending positions. The expanded PL/J TR allele alters the sequence of all six of these domains, which will greatly reduce Prdm9’s ability to bind to DNA. (B) The Cacna2d3 protein has a conserved N-terminal segment (residues 1–25, blue) and a COOH-terminal region (residues 25–264, orange) whose sequence is altered by an NZB-specific TR allele. The C57BL/6 (GT)10 and NZB (GT)8 TR alleles are shown above the protein diagram, and a region with a conserved sequence is shown by the red rectangle below the protein. The NZB TR allele alters most of the amino acids in the Cacna2d3 protein sequence, which will compromise channel assembly and calcium conductance.
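A quick arithmetic check on the allele lengths in this caption: a length change within coding sequence that is not a multiple of 3 bp shifts the reading frame, which would change every downstream residue. The caption does not state the mechanism, so reading these changes as frameshifts, and treating both TRs as lying within coding sequence, are assumptions of this sketch. The function name is hypothetical.

```python
def shifts_reading_frame(ref_len_bp, alt_len_bp):
    """True if the length difference between two alleles is not a multiple of 3 bp."""
    return (alt_len_bp - ref_len_bp) % 3 != 0


# Prdm9: C57BL/6J allele is 49 bp, PL/J allele is 302 bp (difference of 253 bp).
prdm9_shift = shifts_reading_frame(49, 302)

# Cacna2d3: (GT)10 is 20 bp, (GT)8 is 16 bp (difference of 4 bp).
cacna2d3_shift = shifts_reading_frame(2 * 10, 2 * 8)

print(prdm9_shift, cacna2d3_shift)  # True True
```

Neither difference is divisible by 3, which is consistent with the altered COOH-terminal sequences shown in both panels.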
