Review
Emerg Top Life Sci. 2023 Dec 14;7(3):361-381.
doi: 10.1042/ETLS20230074.

Advances in the discovery and analyses of human tandem repeats

Mark J P Chaisson et al. Emerg Top Life Sci.

Abstract

Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize and associate these complex patterns of human variation with disease, expression, and epigenetic differences. We predict accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.

Keywords: bioinformatics; disease; sequencing; tandem repeats; visualization.

Conflict of interest statement

E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc. D.E.M. is on a scientific advisory board at Oxford Nanopore Technologies (ONT) and is engaged in a research agreement with ONT; ONT has also paid for him to travel to speak on their behalf. A.S. is an employee of Cajal Neuroscience Inc.

Figures

Figure 1. Genome-wide distribution of tandem repeats in 47 HPRC human genomes.
The ideogram depicts the genome-wide distribution for VNTRs (n=166,918, purple) and STRs (n=131,679, green). We also show HOR regions of the genome enriched in satellite sequences (red) and tandem segmental duplications (SDs; blue) that map less than 1 Mbp apart. The average STR and VNTR lengths are 174 bp and 516 bp, while their average motif lengths are 3 bp and 49 bp, respectively. SDs and centromere satellite annotation are based only on the T2T reference genome.
Figure 2. Methods for discovery, genotyping and annotations.
Top, methods to genotype STR expansions using short reads. a, STR genotyping methods that require reads to fully span alleles: RepeatSeq, lobSTR, and HipSTR. These may provide exact repeat counts as well as phased variants. b, STR genotyping methods that can genotype expansions larger than the length of a read or paired-end insert: TREDPARSE, GangSTR, STRetch, STRling, and ExpansionHunter Denovo. The resulting calls are an estimate of motif repeat counts. c, ExpansionHunter genotypes long expansions that fit predefined patterns.
Middle, methods to genotype VNTR expansions using short reads. d, adVNTR and adVNTR-NN use a hidden Markov model to estimate repeat unit copy number. e, Copy number defined by CNVnator correlates with ground-truth copy number with sufficient accuracy for association analysis. f, Genotyping VNTR length using repeat pangenome graphs can detect changes in motif composition.
Bottom, methods to genotype VNTR alleles with long reads. g, Schematic of the algorithmic approach used by TRviz and vamos to annotate motif copies in LRS data and genome assemblies. Both use wrap-around dynamic programming, which aligns an optimal number of copies of a motif sequence to a genomic sequence by copying alignment scores from the last row to the first and allowing the traceback path index (red) to wrap around from the beginning to the end of the motif.
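The wrap-around dynamic programming idea in panel g of Figure 2 can be illustrated with a minimal sketch. This is not the TRviz or vamos implementation: the function name `wraparound_align`, the unit edit costs, and the choice to force the alignment to start and end on a motif boundary are all assumptions made for illustration. Each motif position is a DP state, and the state after the last motif base feeds back into the first one, so the number of motif copies does not have to be fixed in advance.

```python
from math import inf


def wraparound_align(seq: str, motif: str) -> tuple[int, int]:
    """Illustrative sketch: align all of seq to whole copies of motif.

    Returns (edit_cost, copies) with unit costs for mismatch, insertion
    and deletion. State j means "j bases of the current motif copy have
    been consumed"; state 0 is a copy boundary.
    """
    m = len(motif)
    if m == 0:
        raise ValueError("motif must be non-empty")

    def relax_deletions(row):
        # A deletion skips one motif base without consuming seq, moving
        # from state j-1 to state j. These moves form a cycle because the
        # last state wraps to the first, so two sweeps are needed.
        for _ in range(2):
            for j in range(m):
                prev = row[j - 1]  # for j == 0 this is the last state
                wrapped = 1 if j == 0 else 0
                cand = (prev[0] + 1, prev[1] + wrapped)
                if cand < row[j]:
                    row[j] = cand
        return row

    # Before reading any base we sit at a copy boundary with cost 0.
    row = [(inf, 0)] * m
    row[0] = (0, 0)
    row = relax_deletions(row)

    for base in seq:
        new = []
        for j in range(m):
            # Insertion: base is extra sequence, the motif state stays at j.
            ins = (row[j][0] + 1, row[j][1])
            # Match or mismatch: base pairs with motif[j-1], moving j-1 -> j.
            src = row[j - 1]
            k = (j - 1) % m
            wrapped = 1 if j == 0 else 0
            sub = (src[0] + (base != motif[k]), src[1] + wrapped)
            new.append(min(ins, sub))
        row = relax_deletions(new)

    # Finish at a copy boundary so only whole motif copies are counted.
    return row[0]


print(wraparound_align("ACGACG", "ACG"))  # (0, 2)
```

Scores are kept as `(cost, copies)` tuples, so ties in cost are broken toward fewer copies; that tie-break is a design choice of this sketch rather than something stated in the caption.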
Figure 3. Methods to visualize tandem repeats and their variation.
a, A 63 bp VNTR in the MIR4435-2 host gene (MIR4435-2HG) is used as an example (top). A dot plot alignment of one allele each from two individuals is shown (bottom). b, A Waterfall plot, in which repeat motifs are assigned colors based on sequence identity (top) and then sorted by total length (bottom). This strategy is also often used to show individual reads separated by allele for one individual. c, A Seattle plot, which takes color-coded alleles and organizes them by internal sequence similarity to highlight clusters of related alleles. Insertions and deletions (shown as whitespace gaps) are observed in this context. d, A StainedGlass plot (reproduced from [104]) showing sequence homology for a tandem repeat in heatmap form, often used for centromeric repeats. The heatmap shows the % identity of the higher-order repeat (red, ~99% identity; blue, ~70% identity). e, A pangenome graph highlighting the different common paths that a repeat can take in the context of the genome, including other structural variants such as the two shown here.
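As a rough illustration of the dot plot in panel a of Figure 3, the sketch below marks every position pair at which two sequences share an exact k-mer. The function names, the exact-match rule, and the toy sequences are assumptions for illustration only; they are not the sequences or the software used for the figure. In a tandem repeat, the shared motif produces the characteristic parallel diagonals.

```python
from collections import defaultdict


def dot_plot_matches(a: str, b: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (i, j) pairs where a[i:i+k] equals b[j:j+k]."""
    # Index every k-mer of b by its start positions.
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    # Look up each k-mer of a; results are ordered by i, then j.
    return [
        (i, j)
        for i in range(len(a) - k + 1)
        for j in index.get(a[i:i + k], [])
    ]


def render_dot_plot(a: str, b: str, k: int = 3) -> str:
    """Draw the matches as text: rows follow a, columns follow b."""
    hits = set(dot_plot_matches(a, b, k))
    width = len(b) - k + 1
    rows = []
    for i in range(len(a) - k + 1):
        rows.append("".join("*" if (i, j) in hits else "." for j in range(width)))
    return "\n".join(rows)


# Two hypothetical alleles of a 3 bp repeat, with 3 and 2 copies.
print(render_dot_plot("ACGACGACG", "ACGACG"))
```

Exact k-mer matching keeps the sketch short; tools that draw dot plots for real alleles typically tolerate mismatches, which this example does not attempt.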

