Hum Mutat. 2022 Jul;43(7):859-868.
doi: 10.1002/humu.24382. Epub 2022 Apr 21.

STRipy: A graphical application for enhanced genotyping of pathogenic short tandem repeats in sequencing data

Andreas Halman et al. Hum Mutat. 2022 Jul.

Abstract

Expansions of short tandem repeats (STRs) have been implicated as the causal variant in over 50 diseases known to date. Several tools can genotype STRs from high-throughput sequencing (HTS) data. However, running these tools out of the box allows only around half of the known disease-causing loci to be genotyped. Furthermore, allele lengths at these loci are often underestimated, with maximum lengths limited to either the read or fragment length, which is less than the pathogenic cutoff for some diseases. Although analysis tools can be customized to genotype extra loci, setting this up requires proficiency in bioinformatics, limiting their widespread use by other researchers and clinicians. To address these issues, we have developed new software called STRipy, which is able to target all known disease-causing STRs from HTS data. We created an intuitive graphical interface for STRipy and significantly simplified the detection of STR expansions. Moreover, we genotyped all disease loci for over two and a half thousand samples to provide population-wide distributions to assist with interpretation of results. We believe the simplicity and breadth of STRipy will increase the genotyping of STRs in sequencing data, resulting in further diagnoses of rare STR diseases.

Keywords: bioinformatics tools; high-throughput sequencing; pathogenic mutations; rare diseases; short tandem repeats.

Conflict of interest statement

Egor Dolzhenko is an employee of Illumina, Inc., a public company that develops and markets systems for genetic analysis.

Figures

Figure 1
Overview of STRipy's method. (A) STRipy's Client extracts reads that lie on each side of the short tandem repeat (STR) locus (marked in green), flanking reads overlapping the STR region (marked in both green and red), and fully repeated reads (marked in red). When using the “Extended” analysis, reads from off‐target regions (L1 and L2) are additionally extracted, if they exist. The resulting analysis-ready file is then forwarded to STRipy's listening server (B), where it is genotyped by ExpansionHunter and read visualizations are created with REViewer. The results are processed and returned to STRipy's Client along with the generated PDF report.
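The three read categories in panel A can be illustrated with a minimal sketch. This is not STRipy's implementation: the function name `classify_read`, the category labels, and the use of half-open alignment coordinates are assumptions made for illustration. In real data, fully repeated reads are often unmapped or mapped to off-target regions, so coordinates alone would not identify them.

```python
def classify_read(read_start, read_end, str_start, str_end):
    """Classify a read relative to an STR locus (hypothetical helper).

    All coordinates are half-open intervals [start, end) on the same
    reference sequence.
    """
    # Read lies entirely to the left or right of the repeat (green in panel A).
    if read_end <= str_start or read_start >= str_end:
        return "adjacent"
    # Read lies entirely inside the repeat (red in panel A).
    if read_start >= str_start and read_end <= str_end:
        return "in_repeat"
    # Read crosses at least one repeat boundary (green and red in panel A).
    return "flanking"


# Illustrative coordinates only, not taken from the paper.
print(classify_read(0, 100, 150, 200))    # entirely left of the repeat
print(classify_read(100, 160, 150, 200))  # crosses the left boundary
print(classify_read(155, 195, 150, 200))  # inside the repeat
```

A read that crosses both boundaries is also reported as "flanking" in this sketch; a fuller version might track such spanning reads separately.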
Figure 2
Screen captures of STRipy's Client. (A) The upper half of the screen contains information obtained from the literature for the locus. The population‐wide allele length distribution for the locus is shown on top of the X‐axis line and can be changed to represent data for each of the super‐populations separately. On the bottom half of the screen, the genotyping results for the sample from ExpansionHunter are displayed under the “Genotype” section, together with coverage, read length, and fragment length. Below that are two fields containing results from STRipy's algorithm. Read alignments can be found under the Alignment visualizer tab (B). A PDF report can be saved by clicking the corresponding button at the bottom.
Figure 3
Summary statistics for STRipy's validation showing root mean square error (RMSE) across simulations in different STR classes and length ranges. “Up to read length” represents samples where the length of repeats is simulated to be between 60 and 150 bp, “Read length to fragment length” covers repeats from 151 to 450 bp, and “Over the fragment length” covers all simulations where the repeat is between the average fragment length (450 bp) and 2100 bp. RMSE is divided into ranges that are used to color each cell.
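As a rough illustration of how a per-range RMSE like the one summarized above could be computed, the sketch below bins simulated alleles by their true length and takes the root mean square of the estimation errors in each bin. The function names, the assumption that lengths are given in bp, and the input values are hypothetical; only the bin boundaries (150 bp read length, 450 bp average fragment length) come from the caption.

```python
import math
from collections import defaultdict

READ_LEN_BP = 150      # read length stated in the caption
FRAGMENT_LEN_BP = 450  # average fragment length stated in the caption


def length_range(true_bp):
    """Assign a simulated repeat length (bp) to one of the three ranges."""
    if true_bp <= READ_LEN_BP:
        return "up_to_read_length"
    if true_bp <= FRAGMENT_LEN_BP:
        return "read_length_to_fragment_length"
    return "over_fragment_length"


def rmse_by_range(pairs):
    """Return RMSE per length range for (true_bp, estimated_bp) pairs."""
    squared_errors = defaultdict(list)
    for true_bp, est_bp in pairs:
        squared_errors[length_range(true_bp)].append((est_bp - true_bp) ** 2)
    return {
        name: math.sqrt(sum(errs) / len(errs))
        for name, errs in squared_errors.items()
    }


# Made-up (true, estimated) lengths for illustration; not data from the paper.
example = [(60, 60), (150, 156), (300, 297), (300, 303), (900, 880)]
for name, value in rmse_by_range(example).items():
    print(f"{name}: {value:.2f}")
```

Binning on the true (simulated) length rather than the estimate keeps each simulation in a fixed cell regardless of how badly it was genotyped.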
Figure 4
Genotyping results of different types of loci. Pink dots represent one allele, which, for heterozygous samples, is fixed to 60 bp (bottom left), whereas blue dots represent the other allele, which has an increasing number of repeats. Purple dots are overlaps of pink and blue ones. Both alleles were simulated with the same length for homozygous samples. The dotted green line represents the read length and the orange one represents the average fragment length. (A) Results obtained by using ExpansionHunter out of the box with the provided catalog compared with (B) STRipy's results, which determined off‐target regions on the fly, allowing genotyping of alleles longer than the fragment length. (C) Example of genotyping issues for long homozygous alleles. (D) Example of genotyping imperfect GCN‐type repeats, showing a slight increase in long alleles compared with A. (E) Example of replaced type repeats (RFC1 gene, where biallelic expansions are known to cause cerebellar ataxia, neuropathy, and vestibular areflexia syndrome). (F) Example of nested type repeats (STARD7 locus, where expansions are known to cause familial adult myoclonic epilepsy 2). (G, H) STRipy's genotyping results for the reference‐missing XYLT1 locus.
