Genetics. 2010 Jun;185(2):587-602.
doi: 10.1534/genetics.109.112391. Epub 2010 Apr 9.

Likelihood-free inference of population structure and local adaptation in a Bayesian hierarchical model

Eric Bazin et al. Genetics. 2010 Jun.

Abstract

We address the problem of finding evidence of natural selection from genetic data, accounting for the confounding effects of demographic history. In the absence of natural selection, gene genealogies should all be sampled from the same underlying distribution, often approximated by a coalescent model. Selection at a particular locus will lead to a modified genealogy, and this motivates a number of recent approaches for detecting the effects of natural selection in the genome as "outliers" under some models. The demographic history of a population affects the sampling distribution of genealogies, and therefore the observed genotypes and the classification of outliers. Since we cannot see genealogies directly, we have to infer them from the observed data under some model of mutation and demography. Thus the accuracy of an outlier-based approach depends to a greater or lesser extent on the uncertainty about the demographic and mutational model. A natural modeling framework for this type of problem is provided by Bayesian hierarchical models, in which parameters, such as mutation rates and selection coefficients, are allowed to vary across loci. It has proved quite difficult computationally to implement fully probabilistic genealogical models with complex demographies, and this has motivated the development of approximations such as approximate Bayesian computation (ABC). In ABC the data are compressed into summary statistics, and computation of the likelihood function is replaced by simulation of data under the model. In a hierarchical setting one may be interested in both hyperparameters and parameters, and there may be very many of the latter; in a genetic model, for example, these may be parameters describing each of many loci or populations. This poses a problem for ABC in that one then requires summary statistics for each locus, which, if used naively, leads to difficulty in conditional density estimation.
We develop a general method for applying ABC to Bayesian hierarchical models, and we apply it to detect microsatellite loci influenced by local selection. We demonstrate using receiver operating characteristic (ROC) analysis that this approach has comparable performance to a full-likelihood method and outperforms it when mutation rates are variable across loci.
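The basic ABC idea described in the abstract (compress the data into summary statistics and replace likelihood evaluation with simulation) can be illustrated with a minimal rejection sampler. This is a toy sketch only, not the hierarchical method of the paper: the Normal simulator, the uniform prior, the sample-mean summary statistic, the tolerance, and all function names are assumptions chosen for illustration.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator (assumption): n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Compress a data set into one summary statistic (the sample mean)."""
    return statistics.fmean(data)


def abc_rejection(observed, n_draws=20000, tolerance=0.05, seed=0):
    """Return prior draws whose simulated summary lies close to the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        # Draw a candidate parameter from the prior (assumed Uniform(-5, 5)).
        theta = rng.uniform(-5.0, 5.0)
        # Simulate a data set of the same size and summarize it.
        s_sim = summary(simulate(theta, len(observed), rng))
        # Keep the candidate only if the summaries match within the tolerance;
        # no likelihood is ever evaluated.
        if abs(s_sim - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    observed = simulate(2.0, 50, random.Random(42))
    posterior_sample = abc_rejection(observed)
    print(len(posterior_sample), statistics.fmean(posterior_sample))
```

The accepted values approximate a sample from the posterior of the parameter given the summary statistic. With one summary statistic per locus and many loci, the number of statistics to match grows quickly, which is the difficulty the abstract refers to.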

Figures

Figure 1. DAG for the genetic model. See text for details.
Figure 2. Posterior distribution of genome-wide parameters. The data set contains 100 loci and a sample of 100 gene copies taken from six demes. Five loci are under selection. The data are simulated under the last scenario listed in Table 2.
Figure 3. Estimates of the posterior probability for a microsatellite locus to be under selection, P(Z_i = 1 | U(X_i), S(X)). The first five loci in red are effectively simulated under selection. The other loci in green are neutral. The data are simulated under the last scenario listed in Table 2.
Figure 4. A comparison of ROC curves for the ABC method (red) and BayesFst (blue). The curves are based on average true positive and false positive rates measured on 100 simulated data sets. The data are simulated under the last scenario listed in Table 2 (parameter values are also shown in legend).
Figure 5. A comparison of ROC curves for the ABC method (red) and BayesFst (blue). The mutation rate varies across loci. The data are simulated under the 7th scenario listed in Table 2 (parameter values are also shown in legend). Other details are as in Figure 4.
Figure 6. The precision (1 − false discovery rate) is plotted against the classification cutoff (i.e., posterior probability or 1 − P-value) used in the ABC and BayesFst methods. The data from the last scenario listed are used (see also Figure 4).
Figure 7. Marginal posterior distributions of hyperparameters for the chimpanzee data.
Figure 8. The posterior probability that a locus in the chimpanzee data is under selection, under the ABC model. Inset is the result of an analysis with BayesFst.
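
The quantities plotted in Figures 4 to 6 (true positive rate, false positive rate, and precision as 1 − false discovery rate at a given classification cutoff) can be computed as in the sketch below. The scores and labels are made-up illustrative values, not data from the paper, and the function names are hypothetical.

```python
def confusion_at_cutoff(scores, is_selected, cutoff):
    """Count (tp, fp, fn, tn) when loci with score >= cutoff are called selected."""
    tp = fp = fn = tn = 0
    for score, truth in zip(scores, is_selected):
        called = score >= cutoff
        if called and truth:
            tp += 1
        elif called and not truth:
            fp += 1
        elif not called and truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_point(scores, is_selected, cutoff):
    """Return (false positive rate, true positive rate) at one cutoff."""
    tp, fp, fn, tn = confusion_at_cutoff(scores, is_selected, cutoff)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, tpr


def precision(scores, is_selected, cutoff):
    """Return 1 - false discovery rate, or None if no locus is called."""
    tp, fp, _, _ = confusion_at_cutoff(scores, is_selected, cutoff)
    return tp / (tp + fp) if (tp + fp) else None


# Illustrative values only: eight loci, three of them truly under selection.
scores = [0.95, 0.90, 0.80, 0.60, 0.30, 0.20, 0.10, 0.05]
is_selected = [True, True, False, True, False, False, False, False]

print(roc_point(scores, is_selected, 0.5))   # (0.2, 1.0)
print(precision(scores, is_selected, 0.5))   # 0.75
```

Sweeping the cutoff from 1 down to 0 and collecting the `roc_point` values traces out a ROC curve; averaging those points over many simulated data sets gives curves of the kind described for Figure 4.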
