Genome Med. 2021 Aug 9;13(1):126.
doi: 10.1186/s13073-021-00932-9.

Genome-wide sequencing as a first-tier screening test for short tandem repeat expansions

Indhu-Shree Rajan-Babu et al. Genome Med. 2021.

Erratum in

Abstract

Background: Screening for short tandem repeat (STR) expansions in next-generation sequencing data can enable diagnosis, optimal clinical management/treatment, and accurate genetic counseling of patients with repeat expansion disorders. We aimed to develop an efficient computational workflow for reliable detection of STR expansions in next-generation sequencing data and demonstrate its clinical utility.

Methods: We characterized the performance of eight STR analysis methods (lobSTR, HipSTR, RepeatSeq, ExpansionHunter, TREDPARSE, GangSTR, STRetch, and exSTRa) on next-generation sequencing datasets of samples with known disease-causing full-mutation STR expansions and genomes simulated to harbor repeat expansions at selected loci and optimized their sensitivity. We then used a machine learning decision tree classifier to identify an optimal combination of methods for full-mutation detection. In Burrows-Wheeler Aligner (BWA)-aligned genomes, the ensemble approach of using ExpansionHunter, STRetch, and exSTRa performed the best (precision = 82%, recall = 100%, F1-score = 90%). We applied this pipeline to screen 301 families of children with suspected genetic disorders.
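The reported metrics can be checked for internal consistency, since the F1-score is the harmonic mean of precision and recall. The following is a minimal sketch of that arithmetic only; the function name `f1_score` is illustrative and is not taken from the authors' pipeline.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for the
# ExpansionHunter + STRetch + exSTRa ensemble on BWA-aligned genomes.
f1 = f1_score(0.82, 1.00)
print(f"F1 = {f1:.1%}")  # rounds to the reported 90%
```

With precision = 0.82 and recall = 1.00, the result is about 0.901, which agrees with the stated F1-score of 90%.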

Results: We identified 10 individuals with full-mutations in the AR, ATXN1, ATXN8, DMPK, FXN, or HTT disease STR locus in the analyzed families. Additional candidates identified in our analysis include two probands with borderline ATXN2 expansions between the established repeat size range for reduced-penetrance and full-penetrance full-mutation and seven individuals with FMR1 CGG repeats in the intermediate/premutation repeat size range. In 67 probands with a prior negative clinical PCR test for the FMR1, FXN, or DMPK disease STR locus, or the spinocerebellar ataxia disease STR panel, our pipeline did not falsely identify aberrant expansion. We performed clinical PCR tests on seven (out of 10) full-mutation samples identified by our pipeline and confirmed the expansion status in all, showing absolute concordance between our bioinformatics and molecular findings.

Conclusions: We have successfully demonstrated the application of a well-optimized bioinformatics pipeline that promotes the utility of genome-wide sequencing as a first-tier screening test to detect expansions of known disease STRs. Interrogating clinical next-generation sequencing data for pathogenic STR expansions using our ensemble pipeline can improve diagnostic yield and enhance clinical outcomes for patients with repeat expansion disorders.

Keywords: Clinical bioinformatics; Machine learning; Next-generation sequencing; Repeat expansion; Short tandem repeats.

Conflict of interest statement

ED and MAE are employees of Illumina, Inc., a public company that develops and markets systems for genetic analysis. The remaining authors declare that they have no competing interests.

Figures

Fig. 1
Decision tree model and its performance metrics on modified analysis of BWA-aligned EGA genomes. a Decision tree generated on the training dataset (n = 940). Node #0 at the top of the tree is the root node. Each node lists an STR tool (feature). The “samples” number represents the total number of genotype calls in a particular node, and “value” shows the number of expanded (or full-mutation, FM) and non-expanded (non-FM) genotypes. Gini index shows the impurity at each node. The terminal nodes or leaves with a Gini value of 0 have genotypes belonging entirely to either the expanded or non-expanded class. EHv3, ExpansionHunter version 3; wCtrls, analysis performed with controls. b Classification report summarizing the performance metrics of the model on test data (n = 236). Macro and weighted average (avg) show the unweighted and weighted mean of performance metrics calculated for Expanded and Not_Expanded class labels, respectively. c Receiver operating characteristics and precision-recall curves. d Confusion matrix showing the number of predicted and true labels on x- and y-axis, respectively. e Feature importance plot showing the STR tool on x-axis and the tool’s normalized (Gini) importance on y-axis
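The Gini value mentioned for panel a can be illustrated with a short calculation. This sketch assumes the standard definition of Gini impurity, 1 minus the sum of squared class proportions; the class counts used below are hypothetical and are not read from the figure.

```python
def gini_impurity(counts):
    """Gini impurity of a node given per-class genotype counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # an empty node is treated as pure
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical nodes: [expanded (FM), non-expanded (non-FM)] genotype counts.
pure_leaf = gini_impurity([0, 40])    # all calls in one class
mixed_node = gini_impurity([20, 20])  # evenly split between the two classes
print(pure_leaf, mixed_node)
```

A leaf whose genotypes all belong to one class has an impurity of 0, matching the caption's description of terminal nodes, while an evenly split two-class node reaches the maximum of 0.5.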

