Genetics. 2020 Dec;216(4):1205-1215.
doi: 10.1534/genetics.120.303780. Epub 2020 Oct 16.

Detecting Selection from Linked Sites Using an F-Model

Marco Galimberti et al. Genetics. 2020 Dec.

Abstract

Allele frequencies vary across populations and loci, even in the presence of migration. While most differences may be due to genetic drift, divergent selection will further increase differentiation at some loci. Identifying these loci is key to studying local adaptation but remains statistically challenging. A particularly elegant way to describe allele frequency differences among populations connected by migration is the F-model, which measures differences in allele frequencies by population-specific FST coefficients. This model readily accounts for multiple evolutionary forces by partitioning FST coefficients into locus- and population-specific components reflecting selection and drift, respectively. Here we present an extension of this model to linked loci by means of a hidden Markov model (HMM), which characterizes the effect of selection on linked markers through correlations in the locus-specific component along the genome. Using extensive simulations, we show that the statistical power of our method is up to twofold higher than that of previous implementations that assume sites to be independent. Finally, we find evidence of selection in the human genome by applying our method to data from the Human Genome Diversity Project (HGDP).

Keywords: Bayesian statistics; F-statistics; balancing selection; divergent selection; hidden Markov model.
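
To make the partition of FST into locus- and population-specific components concrete, the following is a minimal sketch, not the authors' implementation. It assumes the logistic parameterization used in earlier F-model work (Beaumont and Balding 2004), logit(F_jl) = alpha_l + beta_j, with population allele frequencies drawn from a Beta distribution around an ancestral frequency. The function name simulate_f_model and all argument names are hypothetical; the abstract does not specify these details.

```python
import numpy as np

def simulate_f_model(alpha, beta, ancestral_freq, n_haplotypes, rng):
    """Simulate allele counts under a simple F-model (illustrative sketch).

    alpha          : (L,) locus-specific components (selection)
    beta           : (J,) population-specific components (drift)
    ancestral_freq : (L,) ancestral allele frequencies, strictly in (0, 1)
    n_haplotypes   : number of haplotypes sampled per population
    """
    # Assumed parameterization: logit(F_jl) = alpha_l + beta_j
    logit_f = alpha[:, None] + beta[None, :]
    f = 1.0 / (1.0 + np.exp(-logit_f))

    # Beta concentration theta = 1/F - 1: larger F means more differentiation
    theta = 1.0 / f - 1.0
    p = ancestral_freq[:, None]
    freqs = rng.beta(theta * p, theta * (1.0 - p))

    # Observed allele counts in each population sample
    counts = rng.binomial(n_haplotypes, freqs)
    return f, freqs, counts
```

In this sketch, a locus with alpha_l = 0 behaves neutrally and its differentiation is set by beta_j alone. A positive alpha_l raises F at that locus in every population (as under divergent selection), and a negative alpha_l lowers it (as under balancing selection).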

Figures

Figure 1
(A) Expected proportion of neutral sites as a function of rates μ and ν. (B, C) Example paths of αl along 1000 loci simulated at a distance of dl = 100 with smax = 10 positive and negative states up to αmax = 3.0. Autocorrelation among loci was simulated with log(κ) = −3.0, ν = 0.02, and μ = 0.91 (B, square) or μ = 0.74 (C, circle), respectively. The two cases correspond to an expected proportion of 20% and 10% of the genome under selection, as marked in (A).
Figure 2
A directed acyclic graph (DAG) of the proposed model with two hierarchical levels.
Figure 3
Boxplots of the parameters β1 (left), ν and μ (center), and log(κ) (right). Values are the posterior means obtained with Flink for the 10 simulations run for each parameter set reported in Table 1. The red dotted lines show the true values of the respective parameters.
Figure 4
The true positive rate (power) in classifying loci as neutral (black) or under divergent (orange) or balancing selection (blue) as a function of the FST between populations (A), the number of haplotypes N (B), the number of populations J (C), and the strength of autocorrelation κ (D). Lines indicate the mean and range of true positive rates obtained with Flink (solid) and BayeScan (dashed) across 10 replicate simulations. Filled dots and the vertical gray line indicate the reference simulation shown in each plot.
Figure 5
(A) The fraction of regions identified as divergent among Europeans by Flink (green) and BayeScanH (black) at a false discovery rate (FDR) of 0.01 (solid) and 0.05 (dashed) that were also identified by the other method at different FDR thresholds. (B–D) Examples of regions found under divergent selection by Flink (B), BayeScanH (C), or both (D) among Europeans. Dashed lines indicate the 0.01 FDR threshold.
Figure 6
Signal of selection around the LCT gene on Chromosome 2q. The orange and blue lines indicate the locus-specific FDR for divergent (orange) and balancing (blue) selection, respectively. The black dashed line shows the 1% FDR threshold. A zoom of the highlighted region is shown on the right indicating the position of several genes: R3HDM1 (R3), MIR128-1 (MI), UBXN4 (UB), MCM6 (MC), DARS (DA), and DARS-AS1 (DA1). The entire Chromosome 2q is shown in Figure S7.
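
The FDR thresholds in Figures 5 and 6 can be illustrated with a common Bayesian construction: rank loci by their posterior probability of being under selection and report, for each locus, the expected proportion of false discoveries among all loci ranked at least as high. The sketch below assumes this construction; whether Flink computes its locus-specific FDR in exactly this way is not stated in the text above, and the function name bayesian_q_values is hypothetical.

```python
import numpy as np

def bayesian_q_values(post_selected):
    """Locus-specific FDR (q-value) from posterior probabilities of selection.

    post_selected : (L,) posterior probability that each locus is selected.
    Returns an (L,) array in the original locus order.
    """
    post_selected = np.asarray(post_selected, dtype=float)

    # Rank loci from most to least confidently selected
    order = np.argsort(-post_selected)

    # Posterior probability that calling each locus "selected" is an error
    local_error = 1.0 - post_selected[order]

    # Expected false discovery proportion among the top-k loci, for each k
    running_fdr = np.cumsum(local_error) / np.arange(1, len(order) + 1)

    # Map the values back to the original locus order
    q = np.empty_like(running_fdr)
    q[order] = running_fdr
    return q
```

Under this assumption, the loci called at a 1% FDR are those with a q-value of at most 0.01, for example `np.flatnonzero(bayesian_q_values(post) <= 0.01)`.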

