Theor Popul Biol. 2011 May;79(3):102-13.
doi: 10.1016/j.tpb.2011.01.002. Epub 2011 Jan 26.

Inference on the strength of balancing selection for epistatically interacting loci

Erkan Ozge Buzbas et al. Theor Popul Biol. 2011 May.

Abstract

Existing inference methods for estimating the strength of balancing selection in multi-locus genotypes rely on the assumption that there are no epistatic interactions between loci. Complex systems in which balancing selection is prevalent, such as sets of human immune system genes, are known to contain components that interact epistatically. Therefore, current methods may not produce reliable inference on the strength of selection at these loci. In this paper, we address this problem by presenting statistical methods that can account for epistatic interactions in making inference about balancing selection. A theoretical result due to Fearnhead (2006) is used to build a multi-locus Wright-Fisher model of balancing selection, allowing for epistatic interactions among loci. Antagonistic and synergistic types of interactions are examined. The joint posterior distribution of the selection and mutation parameters is sampled by Markov chain Monte Carlo methods, and the plausibility of models is assessed via Bayes factors. As a component of the inference process, an algorithm to generate multi-locus allele frequencies under balancing selection models with epistasis is also presented. Recent evidence on interactions among a set of human immune system genes is introduced as a motivating biological system for the epistatic model, and data on these genes are used to demonstrate the methods.

Figures

Figure 1
A transmission from generation t to t + 1 in a 3-locus Wright-Fisher model with symmetric balancing selection and epistasis between loci. Each population has two distinct alleles. The population of alleles at a locus is denoted by colored balls within a square. In steps 1A, 1B, and 1C, single-locus genotypes are sampled, with heterozygotes having a selective advantage s = σ/(2N) over homozygotes. In step 2, the 3-locus genotype is assigned a fitness by the epistatic function g(2Ns, L), where L is the number of homozygotes (2 in the case shown, from loci B and C). In step 3, an allele is randomly sampled with equal probability within each locus, independently of other loci (gamete formation). In step 4, the chosen allele is subjected to mutation at each locus, with locus-specific mutation rate u_i/k_i, where u_i = θ_i/(4N). In this example, there are two mutational events (at locus B from black to yellow and at locus C from green to gray). If the loci were independent, the fitness of the 3-locus genotype in step 2 would be that of one heterozygote (cyan-orange) and two homozygotes (black-black and green-green). If there is epistasis, the fitness will be lower or higher, depending on the form of the function g.
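The four steps in the Figure 1 caption can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: the rejection scheme in step 1 (homozygotes kept with probability 1/(1 + s)), the reading of the mutation rate u_i/k_i as "with probability u_i, redraw uniformly among the k_i allele types", and all function names are hypothetical. The epistatic function g is not specified in the caption, so the sketch only computes its input L.

```python
import random

def sample_locus_genotype(population, s, rng):
    """Step 1: draw two alleles from one locus. Heterozygotes are always kept;
    homozygotes are kept with probability 1/(1+s) (assumed rejection scheme)."""
    while True:
        a, b = rng.choice(population), rng.choice(population)
        if a != b or rng.random() < 1.0 / (1.0 + s):
            return (a, b)

def count_homozygotes(genotype):
    """Step 2 input: L, the number of homozygous loci in the multi-locus genotype."""
    return sum(1 for a, b in genotype if a == b)

def form_gamete(genotype, rng):
    """Step 3: pick one allele per locus with equal probability, independently."""
    return [rng.choice(pair) for pair in genotype]

def mutate(gamete, allele_types, u, rng):
    """Step 4: at locus i, with probability u[i], redraw the allele uniformly
    from its k_i types (assumed reading of the rate u_i/k_i)."""
    out = []
    for allele, types, u_i in zip(gamete, allele_types, u):
        if rng.random() < u_i:
            allele = rng.choice(types)
        out.append(allele)
    return out

# Illustrative run with the allele colors from the caption (values are made up).
rng = random.Random(0)
allele_types = [["cyan", "orange"], ["black", "yellow"], ["green", "gray"]]
genotype = [sample_locus_genotype(types, s=0.01, rng=rng) for types in allele_types]
L = count_homozygotes(genotype)  # would be passed to g(2Ns, L) in step 2
gamete = mutate(form_gamete(genotype, rng), allele_types, u=[0.001] * 3, rng=rng)
```

In a full simulation, the value L would feed whichever form of g is chosen, and the steps would be repeated to build the next generation.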
Figure 2
Epistasis as a function of the number of homozygotes, ℓ, in a multi-locus genotype (m = 5, σ = 10). The plot shows linear and quadratic forms of the function g(σ, ℓ), describing epistasis corresponding to antagonistic interaction (cyan), independence (black), and synergistic interaction (red).
Figure 3
Estimated coefficients of variation (the ratio of the standard deviation of the selection parameter estimates to their mean) for three models: antagonistic (cyan), independence (black), and synergistic (red). The parameters of the simulation are σ = 27, θ_i = 3, and k_i = 5 for m = {4, 6, 8, 12, 15, 20}.
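The coefficient of variation defined in the Figure 3 caption is a one-line computation. The sketch below assumes the sample standard deviation (n − 1 denominator), which the caption does not specify, and the function name is hypothetical.

```python
import statistics

def coefficient_of_variation(estimates):
    """Ratio of the standard deviation of the estimates to their mean.
    Uses the sample standard deviation (an assumption)."""
    mean = statistics.mean(estimates)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return statistics.stdev(estimates) / mean

# Made-up selection parameter estimates, for illustration only.
cv = coefficient_of_variation([25.0, 27.0, 29.0])
```

A smaller value indicates estimates that are more tightly concentrated relative to their mean.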
Figure 4
Kernel density estimates (using a Gaussian kernel) of posterior samples of σ under antagonistic (cyan), independence (black), and synergistic (red) models for the HLA/KIR data. The 95% HPD intervals (obtained from the original sample without density estimation) are {29, 143} for antagonistic epistasis, {24, 115} for independence, and {20, 98} for synergistic epistasis (100,000 MCMC iterations with thinning at every 100th step).
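An HPD interval can be read directly off posterior samples, as the Figure 4 caption describes. The sketch below uses a common approach, the shortest window containing the requested fraction of sorted samples, which may differ from the authors' procedure; the function names and the thinning convention (keep the 100th, 200th, ... draw) are assumptions.

```python
import math

def thin(chain, step=100):
    """Keep every `step`-th draw of an MCMC chain (the 100th, 200th, ...)."""
    return chain[step - 1::step]

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples.
    Appropriate for unimodal posteriors; ties resolve to the leftmost window."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(mass * n))  # number of points the window must cover
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

# Illustrative data only: ten tightly spaced draws plus one outlier.
low, high = hpd_interval([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100], mass=0.9)
```

With 100,000 iterations, `thin(chain)` would leave 1,000 draws from which to compute the interval.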
Figure 5
Estimated probability of having ℓ homozygotes in a multi-locus genotype, Ê[P[L = ℓ | x]], for a range of σ values with m = 4 loci. The results are obtained by simulations (10^6 replicates for each σ). For the homozygote advantage case (σ << 0) we have Ê[P[L = m | x]] > Ê[P[L = m − 1 | x]] > ⋯ > Ê[P[L = 0 | x]], whereas for the strong heterozygote advantage case (σ >> 0) the inequalities are reversed. We exploit the structure in Ê[P[L = ℓ | x]] to find the optimal σ_ind to calculate the normalizing constant of equation 13 (see Approximating the normalizing constant and Appendix A).
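For fixed allele frequencies, the distribution of the number of homozygous loci can be computed exactly if one assumes that the two alleles at each locus are drawn independently from those frequencies and that loci are independent. That assumption is ours, not the paper's, and ignores selection in the pairing step. Under it, locus i is homozygous with probability equal to the sum of squared allele frequencies, and the count follows a Poisson binomial distribution. The function names are hypothetical.

```python
def homozygosity(freqs):
    """Probability that two independent draws from `freqs` are the same allele."""
    return sum(p * p for p in freqs)

def homozygote_count_distribution(x):
    """x: list of per-locus allele-frequency lists (each summing to 1).
    Returns [P(L = 0), ..., P(L = m)] for m = len(x) loci, assuming
    independent loci and independent allele draws within each locus."""
    dist = [1.0]  # distribution of the count over zero loci
    for freqs in x:
        f = homozygosity(freqs)
        new = [0.0] * (len(dist) + 1)
        for count, p in enumerate(dist):
            new[count] += p * (1.0 - f)   # this locus is heterozygous
            new[count + 1] += p * f       # this locus is homozygous
        dist = new
    return dist

# Four loci with two equally frequent alleles each (illustrative values).
dist = homozygote_count_distribution([[0.5, 0.5]] * 4)
```

Averaging such distributions over many simulated frequency vectors would give a Monte Carlo estimate of the kind of quantity plotted in the figure, under the stated assumption.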
Figure 6
Normalizing constants on a log scale for antagonistic (cyan), independence (black), and synergistic (red) models for a range of σ values, as obtained by an adjusted Monte Carlo method using equation 14 and Appendix A. The effect of choosing an appropriate σ_ind to minimize the adjustment by the estimate of E[eσepi(x)σind(x)] in equation 13 can be seen by comparing the difference between the constant obtained by choosing the optimal σ_ind and choosing σ_epi = σ_ind. For example, for the constant desired with σ_syn = 32, the corresponding optimal value for σ_ind(x) is 51.3. If σ_ind = 32 were used, the estimate of the expected value E[eσepi(x)σind(x)] in the approximation of c(θ, σ) would have to adjust for a large discrepancy (approximately 1 on the log scale). Similarly, for the constant desired with σ_ant = 80, the optimum is σ_ind = 32, and if σ_ind = 80 were used, the adjustment by the expectation in the estimate would be approximately 2 (log scale). Choosing the optimal value of σ_ind minimizes the effect of the adjustment.
