PLoS Genet. 2024 Dec 30;20(12):e1011524.
doi: 10.1371/journal.pgen.1011524. eCollection 2024 Dec.

Fitness landscapes of human microsatellites

Ryan J Haasl et al. PLoS Genet. 2024.

Abstract

Advances in DNA sequencing technology and computation now enable genome-wide scans for natural selection to be conducted on unprecedented scales. By examining patterns of sequence variation among individuals, biologists are identifying genes and variants that affect fitness. Despite this progress, most population genetic methods for characterizing selection assume that variants mutate in a simple manner and at a low rate. Because these assumptions are violated by repetitive sequences, selection remains uncharacterized for an appreciable percentage of the genome. To meet this challenge, we focus on microsatellites, repetitive variants that mutate orders of magnitude faster than single nucleotide variants, can harbor substantial variation, and are known to influence biological function in some cases. We introduce four general models of natural selection that are each characterized by just two parameters, are easily simulated, and are specifically designed for microsatellites. Using a random forests approach to approximate Bayesian computation, we fit these models to carefully chosen microsatellites genotyped in 200 humans from a diverse collection of eight populations. Altogether, we reconstruct detailed fitness landscapes for 43 microsatellites we classify as targets of selection. Microsatellite fitness surfaces are diverse, including a range of selection strengths, contributions from dominance, and variation in the number and size of optimal alleles. Microsatellites that are subject to selection include loci known to cause trinucleotide expansion disorders and modulate gene expression, as well as intergenic loci with no obvious function. The heterogeneity in fitness landscapes we report suggests that genome-scale analyses like those used to assess selection targeting single nucleotide variants run the risk of oversimplifying the evolutionary dynamics of microsatellites. 
Moreover, our fitness landscapes provide a valuable visualization of the selective dynamics navigated by microsatellites.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Marginal fitness vs. allele size.
(A) Single-optimum, with a key allele size of α = 12 and s = 0.005 (black circles) or s = 0.0005 (blue circles). (B) Periodic-optima with the same parameter values; marginal fitness is 1.0 at multiples of the key allele size (12, 24, 36, 48, etc.).
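The caption gives the optima and the two values of s but not the functional form of marginal fitness. As a minimal sketch, assuming (hypothetically) that fitness falls by a factor of (1 − s) for every repeat unit between an allele and its nearest optimum, the two marginal models could be written as follows. The function names and the geometric decline are illustrative assumptions, not the paper's definitions.

```python
def distance_to_single_optimum(allele: int, alpha: int) -> int:
    """Repeat units between an allele and the single optimum alpha."""
    return abs(allele - alpha)


def distance_to_periodic_optimum(allele: int, alpha: int) -> int:
    """Repeat units to the nearest positive multiple of alpha (alpha, 2*alpha, 3*alpha, ...)."""
    if allele <= alpha:
        return alpha - allele
    remainder = allele % alpha
    return min(remainder, alpha - remainder)


def marginal_fitness(distance: int, s: float) -> float:
    """ASSUMED form: fitness is multiplied by (1 - s) per repeat unit away from an optimum."""
    return (1.0 - s) ** distance


if __name__ == "__main__":
    alpha = 12
    for s in (0.005, 0.0005):
        for allele in (6, 12, 18, 24, 30):
            single = marginal_fitness(distance_to_single_optimum(allele, alpha), s)
            periodic = marginal_fitness(distance_to_periodic_optimum(allele, alpha), s)
            print(f"s={s} allele={allele}: single={single:.4f} periodic={periodic:.4f}")
```

Under this assumed form, the smaller selection coefficient (s = 0.0005) gives a much flatter curve, and the periodic model never places an allele more than α/2 repeat units from an optimum once the allele exceeds α.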
Fig 2. Genotypic fitness surfaces for the four models of natural selection simulated.
Parameter values used to generate these graphs were α = 12 and s = 0.005. (A) additive, single-optimum (ASO) with a single, most-fit genotype of 12/12. (B) dominant, single-optimum (DSO) where all genotypes 12/_ have a relative fitness of 1.0. (C) additive, periodic-optima (APO). (D) dominant, periodic-optima (DPO). Compare the relative fitness scales of the ASO (A) and DSO (B) models to those of APO (C) and DPO (D); the relative fitness of the least-fit genotype on the surface is substantially lower under the single-optimum models.
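One way to build the four genotypic surfaces from marginal fitnesses is sketched below. It assumes, hypothetically, that the additive models average the two allelic fitnesses and that the dominant models take the fitter allele's value; the latter reproduces the caption's statement that every 12/_ genotype has fitness 1.0 under DSO. The marginal form (1 − s)^d, with d the distance in repeat units to the nearest optimum, is also an assumption made for illustration.

```python
from itertools import combinations_with_replacement


def distance(allele: int, alpha: int, periodic: bool) -> int:
    """Repeat units to alpha (single optimum) or to the nearest positive multiple of alpha (periodic)."""
    if not periodic:
        return abs(allele - alpha)
    if allele <= alpha:
        return alpha - allele
    r = allele % alpha
    return min(r, alpha - r)


def genotype_fitness(a: int, b: int, alpha: int, s: float, periodic: bool, dominant: bool) -> float:
    """Relative fitness of genotype a/b under ASO, DSO, APO, or DPO (ASSUMED functional forms)."""
    w_a = (1.0 - s) ** distance(a, alpha, periodic)
    w_b = (1.0 - s) ** distance(b, alpha, periodic)
    # ASSUMED: dominant means the fitter allele sets genotype fitness; additive averages the two.
    return max(w_a, w_b) if dominant else (w_a + w_b) / 2.0


def fitness_surface(sizes, alpha, s, periodic, dominant):
    """Map each unordered genotype (a <= b) over the given allele sizes to its relative fitness."""
    return {
        (a, b): genotype_fitness(a, b, alpha, s, periodic, dominant)
        for a, b in combinations_with_replacement(sizes, 2)
    }


# (periodic, dominant) flags for each of the four models named in the caption.
MODELS = {"ASO": (False, False), "DSO": (False, True), "APO": (True, False), "DPO": (True, True)}

if __name__ == "__main__":
    sizes = range(5, 41)
    for name, (periodic, dominant) in MODELS.items():
        surface = fitness_surface(sizes, alpha=12, s=0.005, periodic=periodic, dominant=dominant)
        print(f"{name}: least-fit genotype has w = {min(surface.values()):.4f}")
```

Over allele sizes 5 to 40, the least-fit genotype under the single-optimum models lies 28 repeat units from the optimum, whereas no allele is more than 7 units from an optimum under the periodic models, which is the contrast in fitness scales that the caption points out.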
Fig 3
(A) Model choice using the ABC-RF (approximate Bayesian computation with random forests) classification algorithm. A separate RF model was trained for each motif size on 500,000 simulations. The output for a single locus is an array of posterior probabilities, one for each evolutionary model, and we chose the model with the greatest posterior probability. (B) Parameter estimation using the ABC-RF regression algorithm. A separate RF model was trained for each combination of motif size and model of natural selection on 100,000 simulations. The output for a single locus is the approximate posterior density of the parameters.
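The final model-choice step in panel A reduces to an argmax over the posterior probabilities returned for a locus. The sketch below shows only that step, assuming the classifier's output is available as a mapping from model label to posterior probability; it does not train a random forest, and the probabilities in the example are invented.

```python
from typing import Mapping, Tuple


def choose_model(posteriors: Mapping[str, float]) -> Tuple[str, float]:
    """Return (model label, posterior probability) for the most probable model at one locus."""
    if not posteriors:
        raise ValueError("at least one candidate model is required")
    best = max(posteriors, key=lambda label: posteriors[label])
    return best, posteriors[best]


if __name__ == "__main__":
    # Hypothetical posterior probabilities for a single locus; not values from the study.
    locus_posteriors = {"ASO": 0.08, "DSO": 0.05, "APO": 0.22, "DPO": 0.65}
    label, prob = choose_model(locus_posteriors)
    print(label, prob)  # DPO 0.65
```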
Fig 4
(A) Sample-wide variance in allele size (VAS) vs. mean allele size (MAS) for 54 intergenic dinucleotide microsatellites. In general, genetic variance at microsatellites increases with allele size, although some loci are clear outliers with anomalously high variance. (B) Mean allele size of the same 54 intergenic dinucleotide microsatellites, with loci ordered from left to right along the x-axis by increasing variance in allele size. Each locus is represented by eight dots, one per sampled population, each giving that population's mean allele size at the locus.
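The statistics plotted in panel A (MAS and VAS) and the per-population means in panel B can be computed from a locus's allele sizes as sketched below. The input layout (a list of population/allele-size pairs, one per sampled allele) and the use of the population variance (denominator n) rather than the sample variance are assumptions; the caption specifies neither.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def summarize_locus(alleles):
    """Return (MAS, VAS, per-population mean allele size) for one locus.

    `alleles` is a list of (population, allele_size) pairs (hypothetical layout).
    """
    sizes = [size for _, size in alleles]
    mas = fmean(sizes)
    vas = pvariance(sizes, mu=mas)  # ASSUMED: variance with denominator n
    by_pop = defaultdict(list)
    for pop, size in alleles:
        by_pop[pop].append(size)
    pop_means = {pop: fmean(values) for pop, values in by_pop.items()}
    return mas, vas, pop_means


if __name__ == "__main__":
    # Toy data for illustration only, not genotypes from the study.
    toy = [("YRI", 10), ("YRI", 12), ("LWK", 14), ("LWK", 16)]
    mas, vas, pop_means = summarize_locus(toy)
    print(f"MAS={mas:.2f} VAS={vas:.2f}", pop_means)
```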
Fig 5
Inferred fitness surfaces for seven genic microsatellites and one intergenic microsatellite in humans (A-H). Note that each locus has a different scale of relative fitness ($w_{i,j}$). Examples of loci following the additive single-optimum (B, gene GRIN2B; E, gene SLC11A1), dominant periodic-optima (A, gene TBP; C, gene COL1A2; H, intergenic locus on chromosome 4), and additive periodic-optima (D, gene RBM5; F, gene DPT; G, gene HRC) models are shown. Panel I shows a simpler mapping of genotypes to fitness for a hypothetical A/G SNP in which the adaptive A allele is either additive or dominant. White, gray, and black represent high, intermediate, and low relative fitness, respectively.
Fig 6. Estimated dominant, periodic-optima fitness surface for a dinucleotide microsatellite in the first intron of gene COL1A2.
Most genotypes contain a 20x or 25x allele, consistent with the estimated α = 5 under a periodic-optima model (20 and 25 are both multiples of 5). The most prominent exceptions to this pattern are in the African YRI and LWK populations.
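Under a dominant, periodic-optima model, a genotype sits at maximal fitness whenever at least one of its alleles is a multiple of α, so with α = 5 any genotype carrying a 20x or 25x allele qualifies. The minimal check below uses hypothetical genotypes rather than the study's data, and the helper names are illustrative.

```python
def on_dpo_peak(genotype: tuple[int, int], alpha: int) -> bool:
    """True if at least one allele size is a positive multiple of alpha (maximal fitness under DPO)."""
    return any(size > 0 and size % alpha == 0 for size in genotype)


def fraction_on_peak(genotypes, alpha: int) -> float:
    """Share of genotypes carrying at least one allele at a periodic optimum."""
    genotypes = list(genotypes)
    return sum(on_dpo_peak(g, alpha) for g in genotypes) / len(genotypes)


if __name__ == "__main__":
    # Hypothetical genotypes (repeat counts), not observed data.
    sample = [(20, 25), (20, 22), (25, 27), (21, 23)]
    print(fraction_on_peak(sample, alpha=5))  # 0.75
```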
Fig 7. Estimated additive, single-optimum fitness surface for a dinucleotide microsatellite in the basal promoter of SLC11A1.
Note the large allele sizes (22x, 23x, and 24x): alleles this long would be expected to be highly mutable and therefore to produce a wider distribution of allele sizes than is observed.
