Genome Biol Evol. 2014 Jul;6(7):1843-61. doi: 10.1093/gbe/evu134.

The effects of microsatellite selection on linked sequence diversity

Ryan J Haasl, Ross C Johnson, Bret A Payseur

PMID: 25115009
PMCID: PMC4122932

Ryan J Haasl et al. Genome Biol Evol. 2014 Jul.


Abstract

The genome-wide scan for selection is an important method for identifying loci involved in adaptive evolution. However, the theory underlying standard scans for selection assumes a simple mutation model; in particular, recurrent mutation of the selective target is not considered. Although this assumption is reasonable for single-nucleotide variants (SNVs), a microsatellite targeted by selection will reliably violate it because of its high mutation rate. Moreover, the mutation rate of microsatellites is generally high enough to ensure that recurrent mutation is pervasive rather than occasional. It is therefore unclear whether positive selection targeting microsatellites can be detected using standard scanning statistics. Examples of functional variation at microsatellites underscore the importance of understanding the genomic effects of microsatellite selection. Here, we investigate the joint effects of selection and complex mutation on linked sequence diversity, comparing simulations of microsatellite selection with simulations of SNV-based selective sweeps. We find that selection on microsatellites is generally difficult to detect using popular summaries of the site frequency spectrum and, under certain conditions, using popular methods such as the integrated haplotype statistic (iHS) and SweepFinder. However, comparisons of the number of haplotypes (K) and the number of segregating sites (S) often provide considerable power to detect selection on microsatellites. We apply this knowledge to a scan of autosomes in the human CEU population (CEPH population sampled from Utah). In addition to the most commonly reported targets of selection in European populations, we identify numerous novel genomic regions that bear highly anomalous haplotype configurations. Using one of these regions, intron 1 of *MAGI2*, as an example, we show that the anomalous configuration is coincident with a perfect CA repeat of length 22. We conclude that standard genome-wide scans will commonly fail to detect mutationally complex targets of selection but that comparisons of K and S will, in many cases, facilitate their identification.
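The contrast at the heart of this abstract is between K, the number of distinct haplotypes in a sample, and S, the number of segregating sites. The minimal Python sketch below computes both from aligned haplotype strings. The function name `count_k_and_s` and the encoding (one character per site, all haplotypes the same length) are illustrative assumptions, and the sketch does not reproduce how the paper combines K and S into its test statistic.

```python
from typing import Sequence, Tuple


def count_k_and_s(haplotypes: Sequence[str]) -> Tuple[int, int]:
    """Return (K, S) for a sample of aligned haplotypes (hypothetical helper).

    K: number of distinct haplotypes in the sample.
    S: number of segregating (polymorphic) sites in the alignment.
    Assumes one character per site and equal-length haplotypes.
    """
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to equal length")
    k = len(set(haplotypes))
    # A site segregates if more than one allele appears in its column.
    s = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    return k, s


sample = ["0000", "0000", "0110", "0110", "1001"]
print(count_k_and_s(sample))  # (3, 4)
```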


Figures

**Fig. 1.**
The spatial footprint of a hard sweep compared with that of selection on a microsatellite. (A) Tajima’s D summarized across 500 simulations of a hard sweep ($s = 0.05$, $h = 0.5$) or of selection on a microsatellite (additive model, $\phi = 5$, $g = -0.05$). D was measured in the generation following fixation of the beneficial SNV (hard sweep) or attainment of mutation–selection equilibrium (microsatellite selection). Purple and black lines mark the mean value of D across 500 simulations of a hard sweep and of microsatellite selection, respectively. The 5–95% interquantile range of D is marked by a light purple cloud (hard sweep) or vertical gray bars (microsatellite selection). (B) Results from a single simulation of microsatellite selection (left) and a hard sweep targeting an SNV (right). Points mark the value of D in each nonoverlapping 10-kb window across the simulated 1-Mb sequence. The vertical dashed line indicates the position of the selected SNV or microsatellite. (C) The number of haplotypes, K. Colors are the same as in (A, B). (D) Same as (C), except that only microsatellites with values of $\Delta_{\mathrm{msat}}$ in the top 10% of all simulations are included.
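Tajima’s D, shown in panels (A) and (B), contrasts the mean number of pairwise differences with the number of segregating sites scaled by $a_1 = \sum_{i=1}^{n-1} 1/i$. The sketch below implements the standard Tajima (1989) estimator and applies it to nonoverlapping windows as in panel (B). The function names, the 0-based base-pair `positions` list and the string encoding of haplotypes are assumptions for illustration, not the simulation pipeline behind the figure.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def tajimas_d(haplotypes: Sequence[str]) -> float:
    """Tajima's D for aligned haplotype strings (one character per site).

    Returns NaN when there are no segregating sites (D is undefined).
    """
    n = len(haplotypes)
    if n < 4:
        # With n <= 3 the variance term is zero, so require at least 4 sequences.
        raise ValueError("Tajima's D needs at least 4 sequences")
    n_pairs = n * (n - 1) / 2
    s = 0
    diff_pairs = 0.0
    for column in zip(*haplotypes):
        counts = Counter(column)
        if len(counts) > 1:
            s += 1
            # Pairs differing here = all pairs minus pairs sharing an allele.
            diff_pairs += n_pairs - sum(c * (c - 1) / 2 for c in counts.values())
    if s == 0:
        return float("nan")
    pi = diff_pairs / n_pairs  # mean number of pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def windowed_tajimas_d(positions: Sequence[int], haplotypes: Sequence[str],
                       window: int = 10_000,
                       seq_len: int = 1_000_000) -> List[Tuple[int, float]]:
    """D in nonoverlapping windows; positions[j] is the 0-based bp of column j."""
    results = []
    for start in range(0, seq_len, window):
        cols = [j for j, p in enumerate(positions) if start <= p < start + window]
        sub = ["".join(h[j] for j in cols) for h in haplotypes]
        results.append((start, tajimas_d(sub)))  # NaN for windows with S = 0
    return results
```

A singleton (one derived copy among four chromosomes) gives a negative D, while an allele at intermediate frequency gives a positive D, which is the directional logic the panels rely on.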

**Fig. 2.**
Statistical power of statistics that summarize the site frequency spectrum. Power to detect sweeps targeting SNVs is shown in the left column, whereas power to detect scenarios of microsatellite selection is shown in the right column. (A, B) The power of Tajima’s D. (C, D) The power of Fay and Wu’s $H_{\mathrm{FW}}$. (E, F) The power of Zeng et al.’s E. Time points sampled are as follows: Time 0, the generation before selection begins; 50%, half the time to fixation/equilibrium; 75%, three-quarters of the time to fixation/equilibrium; fixation/equilibrium, one generation after fixation or mutation–selection equilibrium; +X, X generations after fixation or mutation–selection equilibrium.
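Fay and Wu’s H and Zeng et al.’s E both require the unfolded site frequency spectrum $\xi_i$, the number of sites at which the derived allele is carried by $i$ of $n$ chromosomes. The sketch below computes the unnormalized forms $H = \hat\theta_\pi - \hat\theta_H$ and $E = \hat\theta_L - \hat\theta_W$; the normalized versions described by Zeng et al. additionally divide each difference by the square root of its variance, which is omitted here. Function names and the input format (one derived-allele count per site, with ancestral states assumed known) are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def unfolded_sfs(derived_counts: Sequence[int], n: int) -> List[int]:
    """xi[i] = number of sites whose derived allele is carried by i of n chromosomes.

    Monomorphic sites (count 0 or n) are ignored; index 0 is unused.
    """
    xi = [0] * n
    for c in derived_counts:
        if 0 < c < n:
            xi[c] += 1
    return xi


def fay_wu_h_and_zeng_e(derived_counts: Sequence[int], n: int) -> Tuple[float, float]:
    """Unnormalized Fay and Wu's H and Zeng et al.'s E from derived-allele counts."""
    xi = unfolded_sfs(derived_counts, n)
    s = sum(xi)
    a1 = sum(1 / i for i in range(1, n))
    denom = n * (n - 1)
    theta_w = s / a1                                                      # Watterson
    theta_pi = sum(2 * i * (n - i) * xi[i] for i in range(1, n)) / denom  # Tajima
    theta_h = sum(2 * i * i * xi[i] for i in range(1, n)) / denom         # Fay and Wu
    theta_l = sum(i * xi[i] for i in range(1, n)) / (n - 1)               # Zeng et al.
    return theta_pi - theta_h, theta_l - theta_w
```

An excess of high-frequency derived alleles pushes H negative, because $\hat\theta_H$ weights each site by $i^2$ while $\hat\theta_\pi$ downweights sites near fixation.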

**Fig. 3.**
Statistical power of statistics that summarize the distribution of haplotypes. Power to detect sweeps targeting SNVs is shown in the left column, whereas power to detect scenarios of microsatellite selection is shown in the right column. (A, B) The power of K. (C, D) The power of H. (E, F) The power of M. Time points sampled are the same as in figure 2.

**Fig. 4.**
Changes in haplotype configuration through time. Each panel is labeled with the corresponding selective scenario and proportions illustrated are average proportions across 500 simulations each. The proportions of the sample of the first, second, and third most common haplotypes are shaded in decreasingly dark shades of gray. The proportion of the remaining haplotypes is shaded lightest. Time points sampled are the same as in figures 2 and 3.
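The configurations in this figure can be summarized per sample as the proportions of the first, second and third most common haplotypes plus the pooled remainder, then averaged across simulation replicates. A minimal sketch, assuming haplotypes are given as strings; `haplotype_configuration` and `average_configuration` are hypothetical names.

```python
from collections import Counter
from typing import List, Sequence


def haplotype_configuration(haplotypes: Sequence[str], top: int = 3) -> List[float]:
    """Proportions of the `top` most common haplotypes, then of all remaining ones."""
    n = len(haplotypes)
    ranked = [count / n for _, count in Counter(haplotypes).most_common(top)]
    ranked += [0.0] * (top - len(ranked))  # pad when fewer than `top` haplotypes exist
    return ranked + [1.0 - sum(ranked)]


def average_configuration(samples: Sequence[Sequence[str]], top: int = 3) -> List[float]:
    """Average the per-replicate proportions across many simulated samples."""
    rows = [haplotype_configuration(s, top) for s in samples]
    return [sum(column) / len(rows) for column in zip(*rows)]
```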

**Fig. 5.**
A comparison of $ksk_{(20)}^{2}$, iHS, and the composite likelihood ratio. We simulated 60 1-Mb sequences under neutral, partial hard sweep ($s = 0.05$), complete hard sweep ($s = 0.05$), and microsatellite selection scenarios. Simulated scenarios are indicated above the graph. $ksk_{(20)}^{2}$ values are in black, iHS values are in blue, and composite likelihood ratios are in orange. The dashed black line coincides with the lowest observed value of $ksk_{(20)}^{2}$ among the 40 neutral simulations. The dashed orange line is the Bonferroni-corrected significance threshold for the composite likelihood ratio, based on 1 million neutral simulations performed in SweepFinder.
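The caption describes two cutoffs: an empirical one (the lowest $ksk_{(20)}^{2}$ value seen among neutral simulations, below which a window counts as extreme) and a Bonferroni-corrected one for the composite likelihood ratio. The sketch below shows one way to derive each from a list of neutral values. Taking the upper $1 - \alpha/m$ empirical quantile of neutral values for $m$ tests is an assumption about how such a threshold can be built, not a description of SweepFinder’s internals, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def empirical_min_threshold(neutral_values: Sequence[float]) -> float:
    """Lowest value observed across neutral simulations (cf. the dashed black line)."""
    return min(neutral_values)


def bonferroni_quantile_threshold(neutral_values: Sequence[float],
                                  alpha: float = 0.05, n_tests: int = 1) -> float:
    """Upper (1 - alpha / n_tests) empirical quantile of neutral values.

    Assumption for illustration: a statistic is significant if it exceeds this value.
    """
    ranked = sorted(neutral_values)
    q = 1 - alpha / n_tests  # Bonferroni: split alpha evenly across n_tests tests
    idx = math.ceil(q * len(ranked)) - 1
    return ranked[min(max(idx, 0), len(ranked) - 1)]


def flag(values: Sequence[float], threshold: float, low_is_extreme: bool) -> List[bool]:
    """Mark values beyond the threshold in the direction that signals selection."""
    return [v < threshold if low_is_extreme else v > threshold for v in values]
```

For $ksk_{(20)}^{2}$ the extreme direction is low (`low_is_extreme=True`), whereas for the composite likelihood ratio it is high.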

**Fig. 6.**
Dissecting the cluster of extreme $ksk_{(20)}^{2}$ values in intron 1 of *MAGI2*. (A) $ksk_{(20)}^{2}$ values in the region of chromosome 7. (B) High-resolution scan of a portion of the region in (A), where a dramatic decrease in $ksk_{(1)}^{2}$ coincides with a perfect CA repeat of length 22; each point represents a 10-kb window stepping forward 1 kb at a time. (C) The haplotype network of the 10-kb window with the most extreme value of $ksk_{(20)}^{2}$ in (B). Numbers in nodes are the number of chromosomes bearing a haplotype (out of 170), whereas numbers along edges are the number of differences between a pair of connected haplotypes.
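Panels (B) and (C) rest on two simple operations: sliding 10-kb windows that step forward 1 kb, and a network whose nodes carry haplotype counts and whose edges carry the number of differences between haplotypes. The caption does not state which network method was used; the sketch below substitutes a minimum spanning tree over pairwise Hamming distances (Prim’s algorithm), which omits the inferred intermediate haplotypes that median-joining or statistical-parsimony networks add. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def sliding_windows(seq_len: int, window: int = 10_000,
                    step: int = 1_000) -> List[Tuple[int, int]]:
    """Half-open (start, end) windows of `window` bp advancing `step` bp at a time."""
    return [(start, start + window) for start in range(0, seq_len - window + 1, step)]


def hamming(a: str, b: str) -> int:
    """Number of sites at which two aligned haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def mst_haplotype_network(
        haplotypes: Sequence[str]) -> Tuple[Dict[str, int], List[Tuple[str, str, int]]]:
    """Nodes: distinct haplotypes with their counts.

    Edges: a minimum spanning tree over pairwise differences (Prim's algorithm).
    """
    counts = Counter(haplotypes)
    nodes = list(counts)
    edges: List[Tuple[str, str, int]] = []
    if not nodes:
        return {}, edges
    in_tree = {nodes[0]}
    while len(in_tree) < len(nodes):
        # Cheapest edge joining the current tree to a haplotype outside it.
        best = min(
            ((u, v, hamming(u, v)) for u in in_tree for v in nodes if v not in in_tree),
            key=lambda edge: edge[2],
        )
        in_tree.add(best[1])
        edges.append(best)
    return dict(counts), edges
```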


Cited by

Systematic Profiling of Short Tandem Repeats in the Cattle Genome.
Xu L, Haasl RJ, Sun J, Zhou Y, Bickhart DM, Li J, Song J, Sonstegard TS, Van Tassell CP, Lewin HA, Liu GE. Genome Biol Evol. 2017 Jan 1;9(1):20-31. doi: 10.1093/gbe/evw256. PMID: 28172841.
Integrating Diverse Types of Genomic Data to Identify Genes that Underlie Adverse Pregnancy Phenotypes.
Hirbo J, Eidem H, Rokas A, Abbot P. PLoS One. 2015 Dec 7;10(12):e0144155. doi: 10.1371/journal.pone.0144155. PMID: 26641094.
Conserved microsatellites may contribute to stem-loop structures in 5', 3' terminals of Ebolavirus genomes.
Li D, Zhang H, Peng S, Pan S, Tan Z. Biochem Biophys Res Commun. 2019 Jun 30;514(3):726-733. doi: 10.1016/j.bbrc.2019.04.192. PMID: 31078274.
Global abundance of short tandem repeats is non-random in rodents and primates.
Arabfard M, Salesi M, Nourian YH, Arabipour I, Maddi AA, Kavousi K, Ohadi M. BMC Genom Data. 2022 Nov 3;23(1):77. doi: 10.1186/s12863-022-01092-4. PMID: 36329409.
Fifteen years of genomewide scans for selection: trends, lessons and unaddressed genetic sources of complication.
Haasl RJ, Payseur BA. Mol Ecol. 2016 Jan;25(1):5-23. doi: 10.1111/mec.13339. PMID: 26224644. Review.
