Genome Biol. 2020 Apr 28;21(1):102.
doi: 10.1186/s13059-020-02017-z.

ExpansionHunter Denovo: a computational method for locating known and novel repeat expansions in short-read sequencing data

Egor Dolzhenko et al. Genome Biol. 2020.

Abstract

Repeat expansions are responsible for over 40 monogenic disorders, and undoubtedly more pathogenic repeat expansions remain to be discovered. Existing methods for detecting repeat expansions in short-read sequencing data require predefined repeat catalogs. Recent discoveries emphasize the need for methods that do not require pre-specified candidate repeats. To address this need, we introduce ExpansionHunter Denovo, an efficient catalog-free method for genome-wide repeat expansion detection. Analysis of real and simulated data shows that our method can identify large expansions of 41 out of 44 pathogenic repeats, including nine recently reported non-reference repeat expansions not discoverable via existing methods.

Keywords: Fragile X syndrome; Friedreich ataxia; Genome-wide analysis; Huntington disease; Myotonic dystrophy type 1; Repeat expansions; Short tandem repeats; Whole-genome sequencing data.

Conflict of interest statement

ED, SC, VGG, AG, BL, RJT, DRB, and MAE are or were employees of Illumina, Inc., a public company that develops and markets systems for genetic analysis.

Figures

Fig. 1
Diagram illustrating the types and counts of reads generated by simulating repeats of different lengths. When the repeat is shorter than the read length (left panels), there are no in-repeat reads (IRRs) associated with the repeat. When a repeat is longer than the read length but shorter than the fragment length (middle panels), anchored IRRs but no paired IRRs are present. As the repeat length approaches and exceeds the fragment length (right panels), paired IRRs are generated in addition to anchored IRRs.
Fig. 2
(Left) A search for anchored IRRs is performed across all aligned reads. (Middle) The IRR counts are summarized into STR profiles. (Right) The resulting STR profiles are merged across all samples. If the dataset can be partitioned into cases and controls, IRR counts in these groups are compared for each locus. Alternatively, if no such partition is possible, an outlier analysis is performed.
Fig. 3
Genome-wide analysis of anchored IRRs comparing cases with known pathogenic expansions in DMPK, FXN, FMR1, and HTT genes (top to bottom) to 150 controls.
Fig. 4
Ranking of known expansions based on the outlier score computed for anchored IRRs. Each rank originates from a genome-wide analysis of a dataset consisting of one (a–c) or five (d) samples with a known expansion and 150 controls. (a) Ranks for all identified repeats. (b) Ranks for repeats with 2–6-bp motifs. (c) Ranks for repeats located in the 5-kbp region around exons of brain-expressed genes. (d) Ranks for datasets with five case samples.
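
The relationship between repeat length and read types described in the Fig. 1 caption can be sketched as a simple rule. The snippet below is an illustrative sketch only and is not part of ExpansionHunter Denovo: the function name, the default read length (150 bp) and fragment length (450 bp), and the hard length thresholds are all assumptions. In real data fragment lengths vary, so paired in-repeat reads begin to appear as the repeat length approaches the fragment length rather than at a sharp cutoff.

```python
def expected_irr_types(repeat_len, read_len=150, fragment_len=450):
    """Return the set of in-repeat read (IRR) types expected for a repeat.

    Hypothetical helper: the default lengths and the hard thresholds are
    simplifying assumptions, not values taken from the paper.
    """
    if read_len <= 0 or fragment_len < read_len:
        raise ValueError("need 0 < read_len <= fragment_len")

    types = set()
    # A read can lie entirely inside the repeat only if the repeat is at
    # least as long as the read; its mate then anchors it in flanking sequence.
    if repeat_len >= read_len:
        types.add("anchored")
    # Both mates can lie inside the repeat only once the repeat spans
    # (roughly) the whole fragment.
    if repeat_len >= fragment_len:
        types.add("paired")
    return types


if __name__ == "__main__":
    for length in (60, 300, 1000):
        print(length, sorted(expected_irr_types(length)))
```

With the assumed defaults, the three example lengths correspond to the left, middle, and right panels of Fig. 1, respectively.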
