Am J Hum Genet. 2018 Dec 6;103(6):858-873.
doi: 10.1016/j.ajhg.2018.10.015. Epub 2018 Nov 29.

Detecting Expansions of Tandem Repeats in Cohorts Sequenced with Short-Read Sequencing Data

Rick M Tankard et al. Am J Hum Genet.

Abstract

Repeat expansions cause more than 30 inherited disorders, predominantly neurogenetic. These can present with overlapping clinical phenotypes, making molecular diagnosis challenging. Single-gene or small-panel PCR-based methods can help to identify the precise genetic cause, but they can be slow and costly and often yield no result. Researchers are increasingly performing genomic analysis via whole-exome and whole-genome sequencing (WES and WGS) to diagnose genetic disorders. However, until recently, analysis protocols could not identify repeat expansions in these datasets. We developed exSTRa (expanded short tandem repeat algorithm), a method that uses either WES or WGS to identify repeat expansions. Performance of exSTRa was assessed in a simulation study. In addition, four retrospective cohorts of individuals with eleven different known repeat-expansion disorders were analyzed with exSTRa. We assessed results by comparing the findings to known disease status. Performance was also compared to three other analysis methods (ExpansionHunter, STRetch, and TREDPARSE), which were developed specifically for WGS data. Expansions in the assessed STR loci were successfully identified in WES and WGS datasets by all four methods with high specificity and sensitivity. Overall, exSTRa demonstrated more robust and superior performance for WES data than did the other three methods. We demonstrate that exSTRa can be effectively utilized as a screening tool for detecting repeat expansions in WES and WGS data, although the best performance would be produced by consensus calling, wherein at least two out of the four currently available screening methods call an expansion.
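The consensus-calling rule described at the end of the abstract (an expansion is accepted when at least two of the four methods call it) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `consensus_calls`, the `(sample, locus)` data layout, and the example calls are all hypothetical.

```python
from collections import Counter


def consensus_calls(calls_by_method, min_methods=2):
    """Return (sample, locus) pairs called as expanded by at least min_methods tools.

    calls_by_method maps a method name to an iterable of (sample, locus) pairs
    that the method flagged as expanded (hypothetical layout for this sketch).
    """
    counts = Counter()
    for calls in calls_by_method.values():
        # Converting to a set ensures each method votes at most once per pair.
        counts.update(set(calls))
    return {pair for pair, n in counts.items() if n >= min_methods}


# Hypothetical calls, for illustration only (not results from the study).
example = {
    "exSTRa": {("sampleA", "HD"), ("sampleB", "SCA1")},
    "ExpansionHunter": {("sampleA", "HD")},
    "STRetch": {("sampleC", "DM1")},
    "TREDPARSE": {("sampleA", "HD"), ("sampleC", "DM1")},
}

consensus = consensus_calls(example)
```

In this made-up input, the pair called by only one method is dropped, while pairs supported by two or more methods are retained. Raising `min_methods` trades sensitivity for specificity.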

Keywords: next-generation sequencing; repeat-expansion disorders; short tandem repeats; whole-exome sequencing; whole-genome sequencing.

Figures

Figure 1
ECDF of Repeat-Expansion Composition of Reads from the WES Cohort. Four different known repeat-expansion disorders captured by WES are shown: (A) HD, (B) SCA2, (C) SCA6, and (D) SCA1. Sample rptWEHI3 (blue) is from an individual with a known HD repeat expansion. The expanded allele size is not known. Sample rptWEHI1 (yellow) is a known SCA2 repeat expansion of length 42 repeats, sample rptWEHI2 (red) is a known SCA6 repeat expansion of length 22 repeats, and sample rptWEHI4 (green) is a known SCA1 repeat expansion of length 52 repeats. The title at the top of each individual figure gives the locus being examined; the reference number of repeats in the hg19 human genome reference and the corresponding number of base pairs; and the smallest reported expanded allele in the literature (the corresponding number of base pairs is given in brackets). The blue dashed vertical line in the plot denotes the largest known normal allele and the red dashed vertical line denotes the smallest known expanded allele.
Figure 2
ECDFs of Repeat-Expansion Composition of Reads from the WGS_PCR_2 Cohort. Four different STR loci are shown: (A) SCA1 (lengths of the expanded alleles are 52 and 45 repeats); (B) FRDA (lengths of the expanded alleles are 320 and 788 repeats); (C) SCA7 (length of the expanded allele is 39 repeats); and (D) DM1 (lengths of the expanded alleles are 173 and 83 repeats). Colored samples are those called by exSTRa as repeat expansions at the STR locus. The blue dashed vertical line in the plot denotes the largest known normal allele and the red dashed vertical line denotes the smallest known expanded allele.
Figure 3
ECDFs of Repeat-Expansion Composition of Reads from the WGS_PF Cohort. (A) DM1, (B) FRDA, (C) FRAXA, and (D) HD. The title at the top of each individual figure gives the locus being examined; the reference number of repeats in the hg19 human genome reference and the corresponding number of base pairs; and the smallest reported expanded allele in the literature (the corresponding number of base pairs is given in brackets). The blue dashed vertical line in the plot denotes the largest known normal allele and the red dashed vertical line denotes the smallest known expanded allele.
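The figures above plot, for each sample, an empirical cumulative distribution function (ECDF) of per-read repeat content at a locus. As a rough sketch of what an ECDF is, the code below builds one from a list of per-read values. The helper `make_ecdf` and the numbers are hypothetical, and the way exSTRa actually scores reads is not reproduced here.

```python
from bisect import bisect_right


def make_ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("ECDF is undefined for an empty sample")

    def ecdf(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(xs, x) / n

    return ecdf


# Hypothetical per-read repeat content (bases of repeat per read), illustration only.
control_reads = [30, 32, 35, 36, 38, 40]
case_reads = [31, 36, 55, 60, 72, 80]

F_control = make_ecdf(control_reads)
F_case = make_ecdf(case_reads)

# At a threshold of 40, all control reads fall at or below it,
# while only a third of the case reads do.
control_at_40 = F_control(40)
case_at_40 = F_case(40)
```

A sample carrying an expansion would be expected to have more reads with high repeat content, so its curve sits lower at a given value and appears shifted to the right of the control curves.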
