Am J Hum Genet. 2019 Jul 3;105(1):151-165.
doi: 10.1016/j.ajhg.2019.05.016. Epub 2019 Jun 20.

Bioinformatics-Based Identification of Expanded Repeats: A Non-reference Intronic Pentamer Expansion in RFC1 Causes CANVAS

Haloom Rafehi et al. Am J Hum Genet. 2019.

Abstract

Genomic technologies such as next-generation sequencing (NGS) are revolutionizing molecular diagnostics and clinical medicine. However, these approaches have proven inefficient at identifying pathogenic repeat expansions. Here, we apply a collection of bioinformatics tools that can be utilized to identify either known or novel expanded repeat sequences in NGS data. We performed genetic studies of a cohort of 35 individuals from 22 families with a clinical diagnosis of cerebellar ataxia with neuropathy and bilateral vestibular areflexia syndrome (CANVAS). Analysis of whole-genome sequence (WGS) data with five independent algorithms identified a recessively inherited intronic repeat expansion [(AAGGG)exp] in the gene encoding Replication Factor C1 (RFC1). This motif, not reported in the reference sequence, localized to an Alu element and replaced the reference (AAAAG)11 short tandem repeat. Genetic analyses confirmed the pathogenic expansion in 18 of 22 CANVAS-affected families and identified a core ancestral haplotype, estimated to have arisen in Europe more than twenty-five thousand years ago. WGS of the four RFC1-negative CANVAS-affected families identified plausible variants in three, with genomic re-diagnosis of SCA3, spastic ataxia of the Charlevoix-Saguenay type, and SCA45. This study identified the genetic basis of CANVAS and demonstrated that these improved bioinformatics tools increase the diagnostic utility of WGS to determine the genetic basis of a heterogeneous group of clinically overlapping neurogenetic disorders.
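At its core, the repeat search above asks which pentamer forms a long tandem run in the reads: the reference (AAAAG)11 or the non-reference (AAGGG)exp. The sketch below shows only that idea on a single read. It is not the algorithm of any of the five tools used in the study, and it ignores sequencing errors, reverse-complement reads (CCCTT on the other strand), and mapping. The function names and the example reads are hypothetical.

```python
def longest_motif_run(seq: str, motif: str) -> int:
    """Return the largest number of consecutive, in-frame copies of `motif` in `seq`."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    best = 0
    # A tandem run starting at position p is aligned to frame p % k, so scanning
    # every frame in steps of k visits every possible run.
    for frame in range(k):
        run = 0
        for i in range(frame, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


def dominant_motif(seq: str, motifs: tuple = ("AAAAG", "AAGGG")) -> str:
    """Return whichever candidate motif forms the longest tandem run in `seq`."""
    return max(motifs, key=lambda m: longest_motif_run(seq, m))


# Hypothetical reads: a reference-like (AAAAG)11 allele and an expanded AAGGG allele.
ref_read = "TTGCA" + "AAAAG" * 11 + "CCTGA"
exp_read = "TTGCA" + "AAGGG" * 20 + "CCTGA"
print(dominant_motif(ref_read), longest_motif_run(ref_read, "AAAAG"))
print(dominant_motif(exp_read), longest_motif_run(exp_read, "AAGGG"))
```

Real expansions are often longer than a short read, so a read lying entirely inside the repeat can only show that the allele is at least one read length long.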

Keywords: CANVAS; ataxia; repeat expansions; short tandem repeats; whole-genome sequencing.

Figures

Figure 1
Overview of the CANVAS Study and Genetic Investigations Performed
Figure 2
Linkage of the CANVAS Locus to Chromosome 4 and Identification of the (AAGGG)exp Intronic Insertion in RFC1. (A) The pedigree of family CANVAS9 highlights the apparent recessive inheritance pattern. (B) Linkage analysis of CANVAS9 identified significant linkage to chromosome 4 (LOD = 3.25). (C) Linkage regions for families CANVAS1, 2, 3, 4, and 9 are shown in blue, and the overlapping region is shown in red (chr4:38887351–40463592; combined LOD = 7.04). (D) STR analysis of WGS data from two unrelated individuals with CANVAS identified an expanded STR in the second intron of RFC1. The (AAAAG)11 motif, which is present in the reference genome and forms part of an existing Alu element (AluSx3), is replaced by the (AAGGG)exp repeat expansion (RE).
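Panel C rests on two simple operations: intersecting the per-family linkage intervals and adding LOD scores across families. LOD scores are log10 likelihood ratios, so for independent families evaluated at the same position under the same model, the likelihoods multiply and the LODs add. The sketch below illustrates both operations. The `Interval`, `overlap`, and `combined_lod` names are hypothetical, and the per-family intervals and LODs behind the reported combined LOD of 7.04 are not given here, so only the published overlap coordinates are used.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlap(intervals: Sequence[Interval]) -> Optional[Interval]:
    """Intersect per-family linkage intervals; None if they do not all overlap."""
    if not intervals or len({iv.chrom for iv in intervals}) != 1:
        return None
    start = max(iv.start for iv in intervals)
    end = min(iv.end for iv in intervals)
    return Interval(intervals[0].chrom, start, end) if start <= end else None


def combined_lod(per_family_lods: Sequence[float]) -> float:
    """Sum LOD scores of independent families evaluated at the same position."""
    return sum(per_family_lods)


shared = Interval("chr4", 38887351, 40463592)  # overlap reported in panel C
print(shared.length)  # 1576242
```

In other words, the shared red region spans 40,463,592 − 38,887,351 + 1 = 1,576,242 bp, about 1.6 Mb.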
Figure 3
Computational Validation of the (AAGGG)exp RE. The (AAGGG)exp RE at coordinates chr4:39350045–39350095 was added to the reference databases of exSTRa, ExpansionHunter (EH), GangSTR, TREDPARSE, and STRetch, and WGS data from four unrelated individuals with CANVAS were analyzed (CANVAS1, orange; CANVAS2, blue; CANVAS8, red; CANVAS9, green). Non-CANVAS control subjects are shown in gray. Plots are divided into PCR-based and PCR-free WGS (left and right columns, respectively). For ExpansionHunter, GangSTR, and TREDPARSE, the y and x axes give the number of repeat units on the longer and shorter allele of each individual, respectively. The y axis of the STRetch plot gives the number of individuals.
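The caption compares affected individuals against non-CANVAS controls. One generic way to express "this sample is expanded relative to controls" is a z-score of a repeat-size estimate against the control distribution, sketched below. This is a swapped-in illustration, not the statistic implemented by exSTRa, ExpansionHunter, GangSTR, TREDPARSE, or STRetch. The function name, the threshold of 3, and all numbers are hypothetical.

```python
from statistics import mean, stdev


def flag_expansions(case_sizes: dict,
                    control_sizes: list,
                    z_threshold: float = 3.0) -> dict:
    """Return {sample: z} for cases whose repeat-size estimate lies more than
    z_threshold control standard deviations above the control mean."""
    if len(control_sizes) < 2:
        raise ValueError("need at least two control samples")
    mu, sd = mean(control_sizes), stdev(control_sizes)
    if sd == 0:
        raise ValueError("control sizes have zero variance")
    z_scores = {s: (x - mu) / sd for s, x in case_sizes.items()}
    return {s: z for s, z in z_scores.items() if z > z_threshold}


# Hypothetical longer-allele repeat-unit estimates (not data from the study).
controls = [10, 11, 12, 11, 10, 12]
cases = {"case_A": 11, "case_B": 60}
print(flag_expansions(cases, controls))
```

Because short-read size estimates for long expansions are often lower bounds, a threshold on outlier status is typically more robust than a threshold on absolute size.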
Figure 4
Genetic Validation of the (AAGGG)exp RE. (A) PCR across the RFC1 STR failed to produce the expected ∼253 bp reference product in 18 of 22 CANVAS-affected families. (B and C) Representative images of repeat-primed PCR for the (AAGGG)exp RE, showing a saw-toothed product with a 5 bp repeat unit periodicity, amplified from gDNA of individuals from CANVAS1 (B) and CANVAS9 (C). (D and E) No product was observed for the unaffected control (D) or the no-template negative control (E).
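The two assays in this figure can be reasoned about with simple arithmetic. In repeat-primed PCR, one primer sits in the flank and another anneals at many positions inside the repeat, so successive products differ by one repeat unit. That difference is the 5 bp sawtooth in panels B and C. In flanking PCR, the product grows by 5 bp per extra unit, so a long expansion on both alleles can leave no amplifiable product, which is consistent with panel A in a recessive disorder. The sketch below assumes, hypothetically, that the ∼253 bp reference product contains exactly the 11-unit (AAAAG) repeat, because the primer positions are not stated here. The function names and the 120 bp flank are illustrative.

```python
from typing import List, Optional


def rp_pcr_ladder(flank_bp: int, unit_bp: int, n_units: int) -> List[int]:
    """Product sizes if the repeat primer anneals at each of n_units positions;
    consecutive products differ by exactly one repeat unit."""
    return [flank_bp + k * unit_bp for k in range(1, n_units + 1)]


def ladder_period(peaks: List[int]) -> Optional[int]:
    """Return the common spacing between sorted peaks, or None if it varies."""
    peaks = sorted(peaks)
    diffs = {b - a for a, b in zip(peaks, peaks[1:])}
    return diffs.pop() if len(diffs) == 1 else None


def flanking_product_bp(n_units: int, ref_product_bp: int = 253,
                        ref_units: int = 11, unit_bp: int = 5) -> int:
    """Expected flanking-PCR size, ASSUMING the ~253 bp reference product
    spans exactly the 11-unit reference repeat (hypothetical primer layout)."""
    return ref_product_bp + (n_units - ref_units) * unit_bp


ladder = rp_pcr_ladder(flank_bp=120, unit_bp=5, n_units=8)  # hypothetical flank
print(ladder_period(ladder))      # 5
print(flanking_product_bp(400))   # 2198
```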
Figure 5
The Majority of Individuals with CANVAS Encode an Ancestral Haplotype. (A) Analysis of whole-exome sequencing (WES) data identified an ancestral haplotype surrounding RFC1 in all affected individuals confirmed to carry the (AAGGG)exp RE. (B) The core haplotype (blue highlight) was intersected with the linkage disequilibrium (LD) track in the UCSC Genome Browser (after conversion to hg18 coordinates). The three LD tracks represent the Yoruba population (top), Europeans (middle), and Han Chinese and Japanese from Tokyo (bottom). Red areas indicate strong LD. The core CANVAS haplotype spans a large LD block in Europeans that is broken into two LD blocks in the Japanese and Chinese populations, suggesting an ancient origin for the CANVAS repeat expansion allele. (C) Haplotype sharing among individuals with CANVAS was used to estimate the age of the most recent common ancestor (MRCA) of the cohort.
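Panel C relies on the fact that recombination erodes an ancestral haplotype over generations, so shorter shared segments imply an older MRCA. The sketch below is a crude textbook estimator that illustrates this reasoning. It is not the dating method used in the study. It assumes a star-shaped genealogy and treats each side of the mutation as an exponential segment length in Morgans. The 25-year generation time and the input lengths are illustrative assumptions.

```python
from typing import Sequence


def mrca_generations(shared_cm: Sequence[float]) -> float:
    """Crude MRCA age (generations) from per-individual total shared lengths (cM).

    Assumes a star genealogy: after g generations the ancestral segment on each
    side of the mutation is Exponential(rate=g) in Morgans, so each two-sided
    total is Gamma(shape=2, rate=g) and the maximum likelihood estimate is
    g = 2n / sum(lengths in Morgans).
    """
    if not shared_cm or any(x <= 0 for x in shared_cm):
        raise ValueError("need positive shared lengths")
    total_morgans = sum(x / 100.0 for x in shared_cm)
    return 2 * len(shared_cm) / total_morgans


def generations_to_years(generations: float, years_per_generation: float = 25.0) -> float:
    # 25 years per generation is an illustrative assumption, not a study value.
    return generations * years_per_generation


g = mrca_generations([2.0, 2.0])  # hypothetical shared lengths in cM
print(round(g), round(generations_to_years(g)))
```

Published methods add corrections this sketch omits, such as correlated genealogies, marker density at segment ends, and sampling bias.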
