Genetics. 2007 Oct;177(2):861-73.
doi: 10.1534/genetics.107.077263. Epub 2007 Jul 29.

Empirical Bayes inference of pairwise F(ST) and its distribution in the genome

Shuichi Kitada et al. Genetics. 2007 Oct.

Abstract

Populations often have very complex hierarchical structure. Therefore, it is crucial in genetic monitoring and conservation biology to have a reliable estimate of the pattern of population subdivision. F(ST)'s for pairs of sampled localities or subpopulations are crucial statistics for the exploratory analysis of population structures, such as cluster analysis and multidimensional scaling. However, the estimation of F(ST) is not precise enough to reliably estimate the population structure and the extent of heterogeneity. This article proposes an empirical Bayes procedure to estimate locus-specific pairwise F(ST)'s. The posterior mean of the pairwise F(ST) can be interpreted as a shrinkage estimator, which reduces the variance of conventional estimators largely at the expense of a small bias. The global F(ST) of a population generally varies among loci in the genome. Our maximum-likelihood estimates of global F(ST)'s can be used as sufficient statistics to estimate the distribution of F(ST) in the genome. We demonstrate the efficacy and robustness of our model by simulation and by an analysis of the microsatellite allele frequencies of the Pacific herring. The heterogeneity of the global F(ST) in the genome is discussed on the basis of the estimated distribution of the global F(ST) for the herring and examples of human single nucleotide polymorphisms (SNPs).
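For orientation, the conventional estimator that the abstract contrasts with the empirical Bayes procedure can be sketched as follows. This is a minimal illustration of Nei's G(ST) for one locus and one pair of localities, computed from allele frequencies as (H_T - H_S) / H_T; it is not the authors' empirical Bayes method, and it assumes equal weighting of the two localities with no sample-size bias correction. The function name `pairwise_gst` and the example frequencies are hypothetical.

```python
def pairwise_gst(p1, p2):
    """Nei's G_ST for two populations at one locus.

    p1, p2: allele-frequency lists for the same alleles, each summing to 1.
    Assumes equal population weights and no sample-size correction.
    """
    if len(p1) != len(p2):
        raise ValueError("both populations must list the same alleles")

    # Expected heterozygosity within each population.
    h1 = 1.0 - sum(x * x for x in p1)
    h2 = 1.0 - sum(x * x for x in p2)
    h_s = 0.5 * (h1 + h2)  # mean within-population heterozygosity

    # Heterozygosity of the pooled (mean) allele frequencies.
    p_bar = [0.5 * (a + b) for a, b in zip(p1, p2)]
    h_t = 1.0 - sum(x * x for x in p_bar)

    if h_t == 0.0:
        # Locus is monomorphic in the pooled sample: no differentiation.
        return 0.0
    return (h_t - h_s) / h_t


# Hypothetical two-allele example: mild divergence between two localities.
gst = pairwise_gst([0.6, 0.4], [0.4, 0.6])
print(gst)
```

With small samples and weak differentiation, estimates of this kind are noisy, which is the motivation the abstract gives for shrinking them toward a common value.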

Figures

Figure 1.—
The conventional GST (A) and empirical Bayes (B) estimates of pairwise formula image from 1000 simulations under the infinite-island model at various levels of formula image (0.01, 0.05, 0.1, 0.2) over subpopulations. The mean allele frequencies assumed were formula image with J = 50. The number of sampling localities (K) was set at 50. The sample size Nk/2 (individuals) was common to all localities and was set at 20 individuals.
Figure 2.—
The conventional informativeness of assignment I_n (Rosenberg et al. 2003) (A) and empirical Bayes (B) estimates of I_n from 1000 simulations for the case of formula image. The mean allele frequencies assumed were formula image with J = 50. The number of sampling localities (K) was set at 50 and the sample size Nk/2 (individuals) was common to all localities and was set at 20 individuals.
Figure 3.—
Mean (top left) and root relative mean squared error (top right) of the conventional GST (red circle) and empirical Bayes estimators (blue circle) of pairwise formula image from 1000 simulations under the stepping-stone models. Means and root MSEs were plotted on the true formula image's which fluctuated very slightly when small uniform random variables were added to prevent the points overlapping heavily. The number of subpopulations (K) was set at 15 and the pairwise formula image between two adjacent populations was set at 0.001 (case 1) and 0.0005 (case 2). The sample size Nk/2 (individuals) was common to all localities and set at 20 individuals. Only the results for case 1 are shown. The results of the MDS analysis of two data sets are given in the bottom section; black circles show the true population structure, and the estimated population structure based on the conventional (red “*”) and empirical Bayes estimates (blue “+”) of formula image is shown.
Figure 4.—
Posterior distributions of formula image for the Pacific herring between Funka Bay and Miyako Bay at each locus, and over all loci, which were averaged over formula image at five loci.
Figure 5.—
Posterior distributions of formula image for the Pacific herring over all loci, which were averaged over formula image for five loci: AK, Lake Akkeshi; YD, Yudonuma Lake; FK, Funka Bay; OB, Obuchinuma Lake; MY, Miyako Bay; and MT, Matsushima Bay.
Figure 6.—
Confidence regions of the distribution of formula image in the genome of the Pacific herring, assuming a normal distribution (see the text). (A) The confidence regions of μ and σ², which specify the distribution. (B) The MLE distribution (red line, the delta distribution) and the representative distributions on the boundary of the confidence regions (a–e), which correspond to the points in A. The distribution of the weighted mean of formula image is superimposed (blue line).
