Biostatistics. 2010 Oct;11(4):661-73.
doi: 10.1093/biostatistics/kxq035. Epub 2010 Jun 3.

On inferring presence of an individual in a mixture: a Bayesian approach

David Clayton. Biostatistics. 2010 Oct.

Abstract

Homer and others (2008. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genetics 4, e1000167) recently showed that, given allele frequency data for a large number of single nucleotide polymorphisms (SNPs) in a sample together with corresponding population "reference" frequencies, typing an individual's DNA sample at the same set of loci makes it possible to infer whether or not the individual was a member of the sample. This observation has led to the precautionary removal of large amounts of summary data from public access. That work and subsequent work on the problem have followed a frequentist approach. This paper sets out a Bayesian analysis of the problem that clarifies the role of the reference frequencies and allows prior probabilities of the individual's membership in the sample to be incorporated.
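The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the Bayesian analysis of this paper; it uses a simple distance statistic in the spirit of the frequentist approach attributed to Homer and others, averaging |y - r| - |y - m| over SNPs, where y is the individual's allele frequency at a SNP, m is the sample frequency, and r is the reference frequency. The function names, the uniform(0.1, 0.9) range for population allele frequencies, the independence of SNPs, and the sample sizes are all assumptions made for illustration.

```python
import random


def draw_genotype(rng, p):
    """Return the allele count (0, 1 or 2) at one SNP with allele frequency p."""
    return int(rng.random() < p) + int(rng.random() < p)


def mean_distance_statistic(n_sample, n_ref, n_snps, in_sample, seed=0):
    """Average |y - r| - |y - m| over independent simulated SNPs.

    y: the individual's allele frequency (0, 0.5 or 1)
    m: allele frequency in the sample (the "mixture")
    r: allele frequency in an independent reference panel
    All names and the simulation design are illustrative assumptions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_snps):
        # Assumed population allele frequency for this SNP.
        p = rng.uniform(0.1, 0.9)
        sample = [draw_genotype(rng, p) for _ in range(n_sample)]
        reference = [draw_genotype(rng, p) for _ in range(n_ref)]
        # Under "in sample" the individual is one of the sample members;
        # otherwise the individual is a fresh draw from the same population.
        person = sample[0] if in_sample else draw_genotype(rng, p)
        y = person / 2
        m = sum(sample) / (2 * n_sample)
        r = sum(reference) / (2 * n_ref)
        total += abs(y - r) - abs(y - m)
    return total / n_snps


if __name__ == "__main__":
    for label, flag in (("in sample", True), ("not in sample", False)):
        d = mean_distance_statistic(10, 20, 3000, flag, seed=1)
        print(f"{label}: mean D = {d:.4f}")
```

When the individual is in the sample, their genotype pulls the sample frequency toward their own at every SNP, so the averaged statistic tends to be larger than for a non-member. The effect at a single SNP is tiny, which is why a large number of SNPs is needed for the signal to emerge.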

Figures

Fig. 1. A simulation study. For each of the partial prior knowledge cases shown in Table 1, 1000 simulations were run for genotype (binomial) data under each of H1 (In sample) and H0 (Not in). The box plots show the distribution of values of the Gaussian approximation to log10(Bayes factor).
Fig. 2. Gaussian approximation log Bayes factors versus exact values. This figure shows the results of 20 simulations under each of H0 and H1 for binomial data with N = 100, K = 200 and P = 3450.
Fig. 3. Effect of correlation between variables. This figure shows the results of 100 simulations under each of H0 and H1 for binomial data with N = 100, K = 200 and P = 3450 when the variables are intercorrelated according to an AR(1) model with lag 1 correlations 0.5. (a) No correction for correlation. (b) Correction using estimated inverse correlation matrix from a sample of size 200. (c) Correction using a sample of size 500. (d) Correction using a sample of size 1000.
Fig. 4. A real example using chromosome 20 data drawn from the Wellcome Trust Case Control Consortium (N = 145, P = 4743 and K = 1455). The ordinate in (b) is calculated under the assumption of a nonrepresentative reference sample (F_ST = 0.003), while that in (c) ignores linkage disequilibrium. Otherwise Bayes factors are calculated on the assumption that the reference sample is representative and allowing for linkage disequilibrium between SNPs.

References

    1. Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Annals of Statistics. 2003;32:407–499.
    2. Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9:432–441.
    3. Heath SC, Gut IG, Brennan P, McKay JD, Bencko V, Fabianova E, Foretova L, Georges M, Janout V, Kabesch M, and others. Investigation of the fine structure of European populations with applications to disease association studies. European Journal of Human Genetics. 2008;16:1413–1429.
    4. Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF, Craig DW. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genetics. 2008;4:e1000167.
    5. Jacobs KB, Yeager M, Wacholder S, Craig D, Kraft P, Hunter DJ, Paschal J, Manolio TA, Tucker M, Hoover RN, and others. A new statistic and its power to infer membership in a genome-wide association study using genotype frequencies. Nature Genetics. 2009;41:1253–1257.
