Nat Biotechnol. 2017 Oct;35(10):951-959.
doi: 10.1038/nbt.3966. Epub 2017 Sep 11.

Analysis of somatic microsatellite indels identifies driver events in human tumors

Yosef E Maruvka et al. Nat Biotechnol. 2017 Oct.

Abstract

Microsatellites (MSs) are tracts of variable-length repeats of short DNA motifs that exhibit high rates of mutation in the form of insertions or deletions (indels) of the repeated motif. Despite their prevalence, the contribution of somatic MS indels to cancer has been largely unexplored, owing to difficulties in detecting them in short-read sequencing data. Here we present two tools: MSMuTect, for accurate detection of somatic MS indels, and MSMutSig, for identification of genes containing MS indels at a higher frequency than expected by chance. Applying MSMuTect to whole-exome data from 6,747 human tumors representing 20 tumor types, we identified >1,000 previously undescribed MS indels in cancer genes. Additionally, we demonstrate that the number and pattern of MS indels can accurately distinguish microsatellite-stable tumors from tumors with microsatellite instability, thus potentially improving classification of clinically relevant subgroups. Finally, we identified seven MS indel driver hotspots: four in known cancer genes (ACVR2A, RNF43, JAK1, and MSH3) and three in genes not previously implicated as cancer drivers (ESRP1, PRDM2, and DOCK3).

Conflict of interest statement

Competing interests: The Broad Institute has filed a patent application regarding the analysis of somatic microsatellite indels in cancers, as reported in this publication.

Figures

Figure 1:
Identifying somatic indels in microsatellites (MS indels): schematic description of MSMuTect. A. All reads containing an MS region and sufficient 3’ and 5’ flanking sequence are aligned to a collection of all MS loci, and the number of reads supporting each MS length is tallied to create a histogram of observed read lengths per locus. B. The length histograms for all sites that share the same underlying motif and number of repeats (i.e., sites with the same motif and mode length) from the X chromosome of male normal samples are combined into a single histogram. This combined histogram represents the empirical noise distribution (i.e., the probability that a true allele with i repeats will generate a read with j repeats). C. The maximum likelihood method and the empirical noise distribution are used to identify the set of alleles that best describes the histogram for a given locus. This set includes the number of alleles, the length of each allele, and the fraction of DNA molecules representing each allele in the sample. After determining the most likely allele set for both the tumor and the normal sample, somatic MS indels are nominated when the tumor model fits the tumor data better than the normal model fits the tumor data, and vice versa for the normal data (Online Methods).
Figure 2:
Distribution of MS indels across 6,747 tumors from 20 tumor types. Red horizontal lines represent the median fraction of MS indels in each tumor type. Fig. S6 shows a comparison with the SNV distributions for each tumor type.
Figure 3:
Differences in mutation patterns and MS indel characteristics between microsatellite unstable (MSI) and microsatellite stable (MSS) tumors. A. Distribution of A motif MS indels across clinical microsatellite (MS) subgroups (MS stable [MSS]; high MS instability [MSI-H]; and low MS instability [MSI-L]) in the three TCGA tumor types for which clinical MSI status was reported (colon adenocarcinoma [COAD], stomach adenocarcinoma [STAD], and uterine corpus endometrial carcinoma [UCEC]). Tumors with ≥15% of SNVs attributed to MS mutations (MSI-SNVs; Online Methods) are plotted in red and tumors with <15% MSI-SNVs are plotted in blue. Similarly, tumors with ≥15% of SNVs attributed to POLE-mediated mutagenesis (POLE-SNVs; Online Methods) are denoted with an ‘x’. B. Mean (and standard deviation) relative MS indel frequencies across quintiles of replication times, calculated for MSI-H and MSS tumors (combined from the COAD, STAD and UCEC cohorts). The correlation between MS indel frequency and replication timing is not significant in MSS tumors (slope = −0.03, Pearson correlation = −0.47, P = 0.43, t-test), whereas MSI tumors show a weak but significant negative correlation (slope = −0.1, Pearson correlation = −0.995, P < 3×10⁻⁴, t-test). C. MS indel frequency as a function of MS length for MSS and MSI-H tumors. In both MSS and MSI-H tumors, the mutation frequency increases with increasing MS length. The increase is more rapid in MSI-H tumors, based on the ratio of the mutation frequency of MSI-H to MSS tumors across MS locus lengths (inset). D. Log₁₀ of the frequencies of MS insertions and deletions as a function of normal and mutated repeat number. The estimated number of MS repeats in the normal sample (y-axis) is plotted against the change in the number of repeats in the tumor (x-axis). The frequency of each specific event (i.e., an insertion or deletion of a given length) is based on the fraction of the total number of covered loci across all samples. Panels show MSI-H samples (upper panel), MSS samples (lower panel), and summarized data across all alleles (middle panel). MSI-H samples have more deletions while MSS samples have more insertions (P < 10⁻³¹, χ² test). Only MS loci with ≥5 repeats in both the normal and mutated samples were included.
Figure 4:
Transcriptional effects of the ESRP1 p.K511fs MS indel mutation. A. ESRP1 expression levels are significantly lower in ESRP1 mutant (p.K511fs) versus wild type (WT) MSI tumors from the UCEC cohort (P < 1.5×10⁻⁹, Mann-Whitney test). B. The ratio of FGFR2 isoform IIIc to IIIb is significantly higher (P < 10⁻⁷, Mann-Whitney test) in ESRP1 mutant tumors compared to WT tumors. An increased ratio of FGFR2 isoform IIIc to IIIb is associated with epithelial to mesenchymal transition.
Figure 5:
Location of ACVR2A MS indel mutations in MSI-H stomach adenocarcinoma (STAD) samples. The MS indel hotspot p.K437fs was identified in 52 of 69 cases (MSMutSig q = 2.4×10⁻⁷) and had not been previously identified in these samples.
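
The procedure described in the Figure 1 caption (read-length histogram, noise distribution, maximum likelihood allele fit, tumor versus normal comparison) can be illustrated with a minimal Python sketch. This is not the authors' MSMuTect code. The function names, the symmetric toy noise model (the paper derives an empirical one), the restriction to one or two alleles, the use of AIC for model selection, the fraction grid, and the `min_gain` threshold are all assumptions made for illustration, and the read counts are invented toy values.

```python
import math
from itertools import combinations


def toy_noise(lengths, stutter=0.05):
    """Hypothetical noise model: noise[i][j] = P(read shows j repeats | allele has i)."""
    return {i: {i - 1: stutter, i: 1 - 2 * stutter, i + 1: stutter} for i in lengths}


def log_likelihood(hist, alleles, noise):
    """Log-likelihood of a read-length histogram under a mixture of alleles.

    hist maps observed repeat count -> number of reads;
    alleles maps allele repeat count -> fraction of DNA molecules.
    """
    total = 0.0
    for j, count in hist.items():
        p = sum(frac * noise[i].get(j, 0.0) for i, frac in alleles.items())
        total += count * math.log(max(p, 1e-12))  # floor avoids log(0)
    return total


def fit_alleles(hist, noise, grid=20):
    """Pick the one- or two-allele model with the lowest AIC (assumed criterion)."""
    candidates = sorted(noise)
    best_model, best_aic = None, float("inf")
    # One allele: a single free parameter (its length).
    for a in candidates:
        model = {a: 1.0}
        aic = 2 * 1 - 2 * log_likelihood(hist, model, noise)
        if aic < best_aic:
            best_model, best_aic = model, aic
    # Two alleles: two lengths plus one mixing fraction, scanned on a grid.
    for a, b in combinations(candidates, 2):
        for k in range(1, grid):
            f = k / grid
            model = {a: f, b: 1 - f}
            aic = 2 * 3 - 2 * log_likelihood(hist, model, noise)
            if aic < best_aic:
                best_model, best_aic = model, aic
    return best_model


def is_somatic(tumor_hist, normal_hist, noise, min_gain=5.0):
    """Nominate a somatic MS indel when each sample is explained clearly better
    by its own fitted model than by the other sample's model."""
    tumor_model = fit_alleles(tumor_hist, noise)
    normal_model = fit_alleles(normal_hist, noise)
    gain_tumor = (log_likelihood(tumor_hist, tumor_model, noise)
                  - log_likelihood(tumor_hist, normal_model, noise))
    gain_normal = (log_likelihood(normal_hist, normal_model, noise)
                   - log_likelihood(normal_hist, tumor_model, noise))
    return gain_tumor > min_gain and gain_normal > min_gain


# Toy data: the normal sample has one allele of 10 repeats,
# the tumor carries an additional allele with a 2-repeat deletion.
noise = toy_noise(range(5, 16))
normal_hist = {9: 3, 10: 95, 11: 2}
tumor_hist = {7: 4, 8: 42, 9: 4, 10: 50}

print(fit_alleles(normal_hist, noise))
print(fit_alleles(tumor_hist, noise))
print(is_somatic(tumor_hist, normal_hist, noise))
```

In this toy setting the two-sided comparison matters: a locus is nominated only if the tumor model beats the normal model on the tumor reads and the normal model beats the tumor model on the normal reads, which guards against calling a difference that is simply sequencing noise in one of the samples.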
