Genome Biol Evol. 2014 Jul;6(7):1843-61.
doi: 10.1093/gbe/evu134.

The effects of microsatellite selection on linked sequence diversity


Ryan J Haasl et al. Genome Biol Evol. 2014 Jul.

Abstract

The genome-wide scan for selection is an important method for identifying loci involved in adaptive evolution. However, the theory that underlies standard scans for selection assumes a simple mutation model. In particular, recurrent mutation of the selective target is not considered. Although this assumption is reasonable for single-nucleotide variants (SNVs), a microsatellite targeted by selection will reliably violate it because of its high mutation rate. Moreover, the mutation rate of microsatellites is generally high enough to ensure that recurrent mutation is pervasive rather than occasional. It is therefore unclear whether positive selection targeting microsatellites can be detected using standard scanning statistics. Examples of functional variation at microsatellites underscore the significance of understanding the genomic effects of microsatellite selection. Here, we investigate the joint effects of selection and complex mutation on linked sequence diversity, comparing simulations of microsatellite selection and SNV-based selective sweeps. We find that selection on microsatellites is generally difficult to detect using popular summaries of the site frequency spectrum, and, under certain conditions, using popular methods such as the integrated haplotype statistic and SweepFinder. However, comparisons of the number of haplotypes (K) and segregating sites (S) often provide considerable power to detect selection on microsatellites. We apply this knowledge to a scan of autosomes in the human CEU population (CEPH population sampled from Utah). In addition to the most commonly reported targets of selection in European populations, we identify numerous novel genomic regions that bear highly anomalous haplotype configurations. Using one of these regions (intron 1 of MAGI2) as an example, we show that the anomalous configuration is coincident with a perfect CA repeat of length 22. We conclude that standard genome-wide scans will commonly fail to detect mutationally complex targets of selection but that comparisons of K and S will, in many cases, facilitate their identification.
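The two quantities compared in the abstract, the number of haplotypes (K) and the number of segregating sites (S), can be read directly off a sample of aligned haplotypes. The following is a minimal sketch, assuming each haplotype is encoded as a string of "0" (ancestral) and "1" (derived) alleles. The function names and the toy sample are illustrative assumptions, and the sketch does not reproduce the specific K-versus-S statistic used in the paper.

```python
from typing import Sequence


def count_haplotypes(haplotypes: Sequence[str]) -> int:
    """K: number of distinct haplotypes in the sample."""
    return len(set(haplotypes))


def count_segregating_sites(haplotypes: Sequence[str]) -> int:
    """S: number of sites at which the sample carries more than one allele."""
    return sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)


# Hypothetical sample of five chromosomes typed at five sites.
sample = ["00000", "00000", "01000", "01001", "11001"]
K = count_haplotypes(sample)
S = count_segregating_sites(sample)
print(f"K = {K}, S = {S}")
```

In a scan, K and S would be computed per genomic window, and windows whose K is unusually low for their S would be flagged for closer inspection.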


Figures

Fig. 1.
The spatial footprint of a hard sweep compared with that of selection on a microsatellite. (A) Tajima’s D summarized across 500 simulations of a hard sweep (s=0.05,h=0.5) or selection on a microsatellite (additive model, ϕ=5, g=0.05). D was measured in the generation following fixation of the beneficial SNV (hard sweep) or achievement of mutation–selection equilibrium (microsatellite selection). Purple and black lines mark the mean value of D across 500 simulations of a hard sweep and microsatellite selection, respectively. The 5–95% interquantile range of D is marked by a light purple cloud (hard sweep) or vertical gray bars (microsatellite selection). (B) Results from a single simulation of microsatellite selection (left) and a hard sweep targeting an SNV (right). Points mark the value of D at each nonoverlapping 10-kb window across the simulated 1-Mb sequence. Vertical dashed line indicates the position of the selected SNV or microsatellite. (C) The number of haplotypes K. Colors are the same as in (A–B). (D) Same as (C), except only microsatellites with values of Δmsat in the top 10% of all simulations are included.
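The windowed values of Tajima's D described in this caption can be sketched as follows. This is an illustrative implementation of the standard Tajima (1989) formula applied to nonoverlapping windows, assuming haplotypes are strings of "0"/"1" alleles with a matching list of site positions. The function names, the toy data, and the choice to return None for uninformative windows are assumptions, not the authors' simulation code.

```python
import math
from typing import List, Optional, Sequence, Tuple


def tajimas_d(haplotypes: Sequence[str]) -> Optional[float]:
    """Tajima's D for a sample of 0/1 haplotypes; None if it is undefined."""
    n = len(haplotypes)
    if n < 4:  # the variance term is zero for n = 2 and n = 3
        return None
    pairs = n * (n - 1) / 2
    seg_sites = 0
    pi = 0.0  # mean number of pairwise differences
    for column in zip(*haplotypes):
        k = column.count("1")
        if 0 < k < n:
            seg_sites += 1
            pi += k * (n - k) / pairs
    if seg_sites == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return None
    return (pi - seg_sites / a1) / math.sqrt(variance)


def windowed_d(
    positions: Sequence[int],
    haplotypes: Sequence[str],
    window_size: int,
    sequence_length: int,
) -> List[Tuple[int, Optional[float]]]:
    """Tajima's D in nonoverlapping windows, returned as (window start, D)."""
    results = []
    for start in range(0, sequence_length, window_size):
        end = start + window_size
        cols = [i for i, p in enumerate(positions) if start <= p < end]
        window_haps = ["".join(h[i] for i in cols) for h in haplotypes]
        results.append((start, tajimas_d(window_haps)))
    return results


# Hypothetical data: four chromosomes, four sites, two 10-kb windows.
positions = [100, 5200, 12000, 18000]
haplotypes = ["1000", "0100", "0011", "0011"]
windows = windowed_d(positions, haplotypes, window_size=10_000, sequence_length=20_000)
for start, d in windows:
    print(start, d)
```

In the toy data, the first window contains only singletons and yields a negative D, whereas the second contains intermediate-frequency variants and yields a positive D.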
Fig. 2.
Statistical power of statistics that summarize the site frequency spectrum. Power to detect sweeps targeting SNVs is shown in the left column, whereas power to detect scenarios of microsatellite selection is shown in the right column. (A, B) The power of Tajima’s D. (C, D) The power of Fay and Wu’s HFW. (E, F) The power of Zeng et al.’s E. Time points sampled are as follows: Time 0, the generation before selection begins; 50%, half the time to fixation/equilibrium; 75%, three-quarters the time to fixation/equilibrium; fixation/equilibrium, one generation after fixation or mutation–selection equilibrium; +X, X generations after fixation or mutation–selection equilibrium.
Fig. 3.
Statistical power of statistics that summarize the distribution of haplotypes. Power to detect sweeps targeting SNVs is shown in the left column, whereas power to detect scenarios of microsatellite selection is shown in the right column. (A, B) The power of K. (C, D) The power of H. (E, F) The power of M. Time points sampled are the same as in figure 2.
Fig. 4.
Changes in haplotype configuration through time. Each panel is labeled with the corresponding selective scenario and proportions illustrated are average proportions across 500 simulations each. The proportions of the sample of the first, second, and third most common haplotypes are shaded in decreasingly dark shades of gray. The proportion of the remaining haplotypes is shaded lightest. Time points sampled are the same as in figures 2 and 3.
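The haplotype configuration summarized in this caption (the sample proportions of the three most common haplotypes plus the remainder) can be computed for a single sample as sketched below. The function name and the toy sample are hypothetical, and averaging across replicate simulations, as done for the figure, is left out.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def haplotype_configuration(
    haplotypes: Sequence[str], top: int = 3
) -> Tuple[List[float], float]:
    """Proportions of the `top` most common haplotypes and of all others."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("sample must contain at least one haplotype")
    counts = [count for _, count in Counter(haplotypes).most_common()]
    top_props = [count / n for count in counts[:top]]
    # Pad with zeros when fewer than `top` distinct haplotypes exist.
    top_props += [0.0] * (top - len(top_props))
    remainder = sum(counts[top:]) / n
    return top_props, remainder


# Hypothetical sample of ten chromosomes carrying four distinct haplotypes.
sample = ["0000"] * 5 + ["0100"] * 3 + ["0110"] + ["1000"]
top_props, remainder = haplotype_configuration(sample)
print(top_props, remainder)
```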
Fig. 5.
A comparison of ksk(20)2, iHS, and the composite likelihood ratio. We simulated 60 1-Mb sequences under neutral, partial hard sweep (s = 0.05), complete hard sweep (s = 0.05), and microsatellite selection scenarios. Simulated scenarios are indicated above the graph. ksk(20)2 values are in black, iHS values are in blue, and composite likelihood ratios are in orange. The dashed black line coincides with the lowest observed value of ksk(20)2 among the 40 neutral simulations. The dashed orange line is the Bonferroni-corrected significance threshold for the composite likelihood ratio based on 1 million neutral simulations performed in SweepFinder.
Fig. 6.
Dissecting the cluster of extreme ksk(20)2 values in intron 1 of MAGI2. (A) ksk(20)2 values in the region of chromosome 7. (B) High-resolution scan of a portion of the region in (A), where a dramatic decrease in ksk(1)2 coincides with a perfect CA repeat of length 22; each point is for a 10-kb window stepping forward 1 kb at a time. (C) The haplotype network of the 10-kb window with the most extreme value of ksk(20)2 in (B). Numbers in nodes are the number of chromosomes bearing a haplotype (out of 170), whereas numbers along vertices are the number of differences between a pair of connected haplotypes.
