Sci Rep. 2022 Jul 30;12(1):13124.
doi: 10.1038/s41598-022-17267-z.

Detection of repeat expansions in large next generation DNA and RNA sequencing data without alignment

L G Fearnley et al. Sci Rep.

Abstract

Bioinformatic methods for detecting short tandem repeat expansions in short-read sequencing have identified new repeat expansions in humans, but require alignment information to identify repetitive motif enrichment at genomic locations. We present superSTR, an ultrafast method that does not require alignment. superSTR is used to process whole-genome and whole-exome sequencing data, and perform the first STR analysis of the UK Biobank, efficiently screening and identifying known and potential disease-associated STRs in the exomes of 49,953 biobank participants. We demonstrate the first bioinformatic screening of RNA sequencing data to detect repeat expansions in humans and mouse models of ataxia and dystrophy.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
An overview of superSTR, its compression heuristic, and the heuristic’s performance in simulated reads. (a) superSTR analysis involves a per-sample processing step where repeats are identified and a cohort-level analysis where samples are analysed, ultimately leading to post-superSTR analysis or experimental confirmation of findings. (b) superSTR relies on relative compressibility to distinguish repeat-containing reads from reads without significant repetition. Compression with zlib involves removal of duplication. Read A (which does not contain significant repetition) will compress less than read B (which does), so the ratio of compressed size to uncompressed size will be greater for A than for B. (c) Distribution of compression ratios (C) in 150 nt pseudorandom reads and repeat-containing reads drawn from a distribution where nucleotides are equiprobable and no errors are present. A more complete characterization across different distributions, read lengths and error rates is contained in Supplementary Figs. S1–S5.
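The compressibility idea in panel (b) can be illustrated with a minimal Python sketch. It is not superSTR's implementation: the function name `compression_ratio`, the two example reads, and the fixed random seed are assumptions made for illustration, and only the standard-library `zlib` and `random` modules are used.

```python
import random
import zlib


def compression_ratio(read: str) -> float:
    """Return compressed size divided by uncompressed size for one read."""
    raw = read.encode("ascii")
    return len(zlib.compress(raw)) / len(raw)


# Hypothetical, error-free 150 nt reads used only for illustration.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
read_a = "".join(rng.choice("ACGT") for _ in range(150))  # pseudorandom, equiprobable bases
read_b = "AGC" * 50  # pure trinucleotide repeat

ratio_a = compression_ratio(read_a)
ratio_b = compression_ratio(read_b)
print(f"read A (pseudorandom) ratio: {ratio_a:.2f}")
print(f"read B (repeat) ratio:       {ratio_b:.2f}")
```

The repeat read should give the smaller ratio, because zlib replaces the duplicated motif with short back-references. Any threshold for calling a read repeat-containing would have to be chosen from distributions like those in panel (c), and is not specified here.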
Figure 2
superSTR analysis of WGS and RNA-seq RE data. The distribution of information scores in controls is shown in grey (lower part of each graphic) and in affected individuals in color (upper part of each graphic). A right shift in the distribution or the presence of a tail indicates an increased quantity of repeats of that motif in the sequencing data. (a–d) show comparison of disease groups within the Illumina RE cohort to the Illumina Diversity cohort. (e, f) show RNA-seq analysis. (a) AGC profile of DM1-bearing individuals with long-tailed distribution characteristic of relatively large RE; (b) AGC profile of HD-bearing individuals with a much shorter RE; (c) CCG profile of FXS individuals; (d) AAG profile of FRDA individuals. (e) RNA-seq AGC profile of peripheral blood mononuclear cells from 12 individuals with SCA3 RE against 12 matched non-SCA3 controls. (f) RNA-seq AGC profile from eight patients with confirmed FECD3 expansions and six controls without FECD (of any type).
