Am J Hum Genet. 2020 Sep 3;107(3):445-460.
doi: 10.1016/j.ajhg.2020.07.004. Epub 2020 Aug 3.

Evolution of a Human-Specific Tandem Repeat Associated with ALS

Meredith M Course et al. Am J Hum Genet. 2020.

Abstract

Tandem repeats are proposed to contribute to human-specific traits, and more than 40 tandem repeat expansions are known to cause neurological disease. Here, we characterize a human-specific 69 bp variable number tandem repeat (VNTR) in the last intron of WDR7, which exhibits striking variability in both copy number and nucleotide composition, as revealed by long-read sequencing. In addition, greater repeat copy number is significantly enriched in three independent cohorts of individuals with sporadic amyotrophic lateral sclerosis (ALS). Each unit of the repeat forms a stem-loop structure with the potential to produce microRNAs, and the repeat RNA can aggregate when expressed in cells. We leveraged its remarkable sequence variability to align the repeat in 288 samples and uncover its mechanism of expansion. We found that the repeat expands in the 3'-5' direction, in groups of repeat units divisible by two. The expansion patterns we observed were consistent with duplication events, and a replication error called template switching. We also observed that the VNTR is expanded in both Denisovan and Neanderthal genomes but is fixed at one copy or fewer in non-human primates. Evaluating the repeat in 1000 Genomes Project samples reveals that some repeat segments are solely present or absent in certain geographic populations. The large size of the repeat unit in this VNTR, along with our multiplexed sequencing strategy, provides an unprecedented opportunity to study mechanisms of repeat expansion, and a framework for evaluating the roles of VNTRs in human evolution and disease.

Keywords: WD repeat domain 7, WDR7; amyotrophic lateral sclerosis, ALS; ancient genomes; evolutionary genetics; long-read sequencing; modifier gene; neurodegenerative disease; noncoding RNA; tandem repeat expansion; variable number tandem repeat, VNTR.

Conflict of interest statement

A.D.G. has served as a consultant for Aquinnah Pharmaceuticals, Prevail Therapeutics, and Third Rock Ventures and is a scientific founder of Maze Therapeutics. E.E.E. is on the scientific advisory board of DNAnexus. All other authors declare no competing interests.

Figures

Figure 1
A VNTR in WDR7 Is Highly Variable in Humans (A) Estimated read length of several human-specific and intronic VNTRs that are not derived from repetitive elements. Read length is given relative to the reference genome, to evaluate how variable the read length of each VNTR can be. Read length was obtained from the Answer ALS database (n = 97 samples). Black lines show mean and standard deviation. The VNTR in WDR7 exhibits greater variability than the other VNTRs. (B) Position of the VNTR in WDR7 intron 27, adjacent to a DNase I hypersensitivity site, and regions of multi-species conservation. The repeat itself is not conserved across species. The bottom schematic shows the region that was included in subsequent PCR amplification. (C) Phylogenetic tree showing events of repeat region evolution in humans and non-human primates (red hashmarks). Adjacent are the predicted RNA structures for each iteration.
Figure 2
The WDR7 VNTR Is Longer in Cases of sALS and Variable in Internal Sequence (A) WDR7 VNTR copy number distribution in the longest amplified alleles from individuals with sALS and sPD and control subjects obtained from the Coriell Institute. Repeat copy number is significantly higher in cases of ALS (median [IQR] = 17.5 [9–24], mean ± SD = 17.7 ± 10.4) versus control subjects (median [IQR] = 15 [8–21], mean ± SD = 15.2 ± 9.66; p = 0.0003; Mann-Whitney test). (B) Comparison of WDR7 VNTR copy number estimated from whole-genome sequencing. Read lengths were obtained from NIAGADS (n = 917 case subjects with AD and 675 control subjects), Quebec (n = 159 case subjects with ALS and 311 control subjects), and Answer ALS databases (n = 307 case subjects and 53 control subjects). Black lines show mean and standard deviation. p values were determined by a Kruskal-Wallis test (which gave p < 0.0001), followed by Dunn’s multiple comparisons. ∗p < 0.05, ∗∗∗p < 0.001, ∗∗∗∗p < 0.0001, “ns” means “not significant.” (C) Hairpin secondary structure of one repeat copy, with the 6 variable positions (out of 69 base pairs) highlighted. (D) Repeat logo of the WDR7 VNTR, indicating relative conservation or variability at each nucleotide position. (E) The 18 “parent” repeat unit sequences that account for >99% of repeat units identified. Variable nucleotides are highlighted. Each parent sequence was assigned a color, shown on the left, used for alignment and visualization in Figure 3. (F) The frequency of each repeat unit across all individuals (n = 144 individuals, totaling 6,041 distinct repeat units).
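The Mann-Whitney comparison named in panel (A) ranks the pooled copy numbers and asks whether one group tends to rank higher. The following is a minimal sketch of the U statistic only, not the authors' analysis: the function names are hypothetical, the copy numbers are invented toy values rather than study data, and no p value is computed.

```python
def midranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every value tied with the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples."""
    pooled = list(sample_a) + list(sample_b)
    ranks = midranks(pooled)
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Toy repeat copy numbers (hypothetical, for illustration only).
toy_cases = [18, 24, 9, 21, 30]
toy_controls = [15, 8, 21, 12, 10]
u_cases, u_controls = mann_whitney_u(toy_cases, toy_controls)
print(u_cases, u_controls)  # 19.5 5.5
```

A larger U for one group means its values more often exceed those of the other group. In practice a statistics library would also supply the p value and tie correction.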
Figure 3
Alignment of Repeat Units across Individuals Is Clustered and Reproducible (A) Repeat units were color coded (colors assigned in Figure 2E; black indicates a rare repeat unit without an assigned color), and then each individual’s alleles were plotted as a series of colors, from the 5′ to 3′ end (top to bottom) of the VNTR. Here, alleles are aligned based on similarity at the 3′ end of the sequence. Repeat copy number is shown on the y axis. Samples used were the same as those presented in Figure 2A. For this round of sequencing, samples selected had at least one of the two alleles 20 repeats or greater. (B) The same visualization and alignment strategy was applied to a second cohort of individuals. This cohort was expanded to include samples from individuals with AD, samples from the 1000 Genomes Project, and samples from African American individuals, obtained through Coriell. For this round of sequencing, all samples were sequenced irrespective of repeat length. An allele unique to samples from individuals of African descent is denoted by a “^,” and an allele found only in samples from individuals of Han Chinese descent is denoted by a “#.”
Figure 4
WDR7 VNTR Repeat Unit Heterogeneity Reveals Patterns of Expansion (A) Representative examples of observed repeat sequence patterns. Top: example of how we calculated the distance between the first instance of a repeat unit and the next, for each repeat unit in each allele. Middle: example showing that if a rare repeat unit (black) was duplicated, its neighboring repeat unit (yellow) was also duplicated. Bottom: example of larger repeat duplications. (B) Summation of the distance between each repeat unit and itself. Repeat units largely occur with a two-unit periodicity (combined results across all sequenced individuals from Figure 3A). (C) Segregation of the WDR7 VNTR in a pedigree of a family with ALS. Sequences shown in brackets are inferred. A question mark means that the allele did not amplify successfully.
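The distance measure described in panels (A) and (B) can be illustrated with a short sketch. It assumes each allele is written as a string in which one letter stands for one color-coded repeat unit. The function names and the toy alleles are hypothetical and are not taken from the study.

```python
from collections import Counter


def next_instance_distances(allele):
    """Map each repeat unit to the distance (in units) between its first
    occurrence and its next occurrence; units seen only once are omitted."""
    first_seen = {}
    distances = {}
    for position, unit in enumerate(allele):
        if unit not in first_seen:
            first_seen[unit] = position
        elif unit not in distances:
            distances[unit] = position - first_seen[unit]
    return distances


def distance_histogram(alleles):
    """Sum the per-unit distances over all alleles."""
    counts = Counter()
    for allele in alleles:
        counts.update(next_instance_distances(allele).values())
    return counts


# Toy alleles (hypothetical): one letter per repeat unit, 5' to 3'.
toy_alleles = ["ABABCDCD", "ABCABC", "XYXYZ"]
histogram = distance_histogram(toy_alleles)
print(sorted(histogram.items()))  # [(2, 6), (3, 3)]
```

In this toy input most units recur two positions after their first appearance, which is the kind of two-unit periodicity that panel (B) reports for the sequenced alleles.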
Figure 5
WDR7 Repeat Length and Sequence Estimation in Ancient Genomes (A) Alignment of whole genome sequencing reads for Altai Neanderthal and Denisovan genomes. Read distribution in the WDR7 VNTR region indicates a build-up of reads in the Denisovan genome, and a paucity of reads in the Neanderthal genome. (B) Individual repeat units identified in the Neanderthal genome. Units were identified in reads with a full 69 bp repeat sequence present in the ∼100 bp sequence read (average length of a read). (C) Individual repeat units identified in the Denisovan genome, the relative abundance of each unit, and correlating modern-day human repeat alleles that match those abundances.
Figure 6
WDR7 Repeat Length and Sequence in the 1000 Genomes Project Populations (A) Distribution of WDR7 VNTR length in 1000 Genomes Project samples, grouped by super-population. (B) Distribution of the 18 most frequent repeat units within each population. In the legend, the numbers rank the frequency of the sequence overall, along with the nucleotides present at each of the six variable positions in the repeat. An underscore represents a deletion. (C) Enrichment or depletion patterns of specific repeat units are unique to certain super-populations. (D) Location of rarest repeat units on the full allele, normalized to 100% for each repeat unit. All 18 repeat units are shown in Figure S6.
Figure 7
Functional Consequences of WDR7 Repeats (A) Length of WDR7 VNTR expressed in HEK293 cells plotted against the normalized microRNAs produced from the WDR7 hairpin, as determined by small RNA sequencing. Linear regression gives R² = 0.977. RPM is reads per million. (B) RNA FISH probes targeting the WDR7 repeat in MEF cells or HEK293 cells transfected with constructs containing 0 (untransfected), 1, or 36 copies of the repeat. Scale bar is 10 μm. Quantification of speckles per cell is given at right. p values were determined by a Kruskal-Wallis test (which gave p < 0.0001 for MEF cells and p = 0.0009 for HEK293 cells), followed by Dunn’s multiple comparisons. ∗∗p < 0.01, ∗∗∗p < 0.001, ∗∗∗∗p < 0.0001.
