BMC Bioinformatics. 2008 Mar 6;9:144.
doi: 10.1186/1471-2105-9-144.

Empirical Bayes analysis of single nucleotide polymorphisms


Holger Schwender et al. BMC Bioinformatics. 2008.

Abstract

Background: An important goal of whole-genome studies concerned with single nucleotide polymorphisms (SNPs) is the identification of SNPs associated with a covariate of interest, such as the case-control status or the type of cancer. Since these studies often comprise the genotypes of hundreds of thousands of SNPs, methods are required that can cope with the corresponding multiple testing problem. For the analysis of gene expression data, approaches such as the empirical Bayes analysis of microarrays (EBAM) have been developed specifically to detect genes associated with the response. However, EBAM has so far been proposed only for binary responses and continuous predictors such as expression values.
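
To make the multiple testing problem concrete, the following minimal Python sketch implements the false discovery rate procedure of Benjamini and Hochberg [4], one standard correction when many SNPs are tested at once. It only illustrates the problem and is not the empirical Bayes method proposed in this paper; the function name `benjamini_hochberg` and the level `q` are illustrative assumptions.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the i-th smallest p-value with i * q / m
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest i with p_(i) <= i * q / m
        reject[order[: k + 1]] = True   # reject the k + 1 smallest p-values
    return reject
```

For the eight p-values 0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074 and 0.205 at q = 0.05, only the two smallest are rejected.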

Results: In this paper, we propose a modification of this empirical Bayes analysis that can be used to analyze high-dimensional categorical SNP data. This approach, along with a generalized version of the original empirical Bayes method, is available in the R package siggenes (version 1.10.0 and later), which can be downloaded from http://www.bioconductor.org.
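
Figure 1 indicates that the test score computed for each SNP is Pearson's χ² statistic for the contingency table of genotypes versus response classes. A minimal Python sketch of this score is given below, assuming genotypes are stored as integer codes (e.g., 1, 2, 3) and the response as class labels. It is not the siggenes implementation, and the function names are hypothetical.

```python
import numpy as np

def pearson_chisq(genotypes, response):
    """Pearson's chi-square statistic and degrees of freedom for one SNP."""
    g_levels, g_idx = np.unique(genotypes, return_inverse=True)
    r_levels, r_idx = np.unique(response, return_inverse=True)
    # Observed table: rows = observed genotypes, columns = response classes
    observed = np.zeros((len(g_levels), len(r_levels)))
    np.add.at(observed, (g_idx, r_idx), 1)
    # Expected counts under independence of genotype and response
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True) / observed.sum())
    stat = ((observed - expected) ** 2 / expected).sum()
    df = (len(g_levels) - 1) * (len(r_levels) - 1)
    return stat, df

def chisq_all_snps(X, y):
    """Score every column (SNP) of the genotype matrix X against the response y."""
    return np.array([pearson_chisq(X[:, j], y)[0] for j in range(X.shape[1])])
```

Because the response may have more than two classes, the same code covers both the JPT vs. CHB comparison and the analysis of all four HapMap populations; only the degrees of freedom change.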

Conclusion: As applications to two subsets of the HapMap data show, the empirical Bayes analysis of microarrays can be used not only to analyze continuous gene expression data but also to analyze categorical SNP data, where the response is not restricted to being binary. In association studies, which typically consider several tens to a few hundred SNPs, our approach can furthermore be employed to test interactions of SNPs. Moreover, the posterior probabilities resulting from the empirical Bayes analysis of (prespecified) interactions/genotypes can also be used to quantify the importance of these interactions.
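
Assuming the standard two-group formulation of the empirical Bayes analysis of microarrays (the abstract itself does not state the formula), a score z has the mixture density f(z) = π0 f0(z) + (1 − π0) f1(z), and the posterior probability that a variable is associated with the response is p1(z) = 1 − π0 f0(z)/f(z). The sketch below computes these posteriors and the calls at the threshold Δ = 0.9 used in Figures 3 and 4; it assumes f, f0 and π0 have already been estimated, and the function names are hypothetical.

```python
import numpy as np

def posterior_prob(f_z, f0_z, pi0):
    """p1(z) = 1 - pi0 * f0(z) / f(z), clipped to [0, 1] (assumed two-group model)."""
    ratio = np.asarray(f0_z, dtype=float) / np.asarray(f_z, dtype=float)
    return np.clip(1.0 - pi0 * ratio, 0.0, 1.0)

def ebam_calls(f_z, f0_z, pi0, delta=0.9):
    """Indices of variables whose posterior probability is at least delta."""
    return np.flatnonzero(posterior_prob(f_z, f0_z, pi0) >= delta)
```

The posterior probabilities themselves, not just the calls, are what can be reported as importance measures for prespecified interactions or genotypes.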


Figures

Figure 1
Densities of the test scores in the analyses of the HapMap data. On the left-hand side, the histograms and the estimated densities (red lines) of the values of Pearson's χ² statistic for the SNPs from the two subsets of the HapMap data (upper panel: JPT vs. CHB; lower panel: all four HapMap populations) are shown. The cyan line marks the estimated density obtained when the inner knots of the natural cubic spline used in the density estimation are centered around the median. On the right-hand side, the estimated densities (again red lines) and the corresponding null densities (black lines) are displayed.
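
Under the null hypothesis of no association, Pearson's statistic for an r × c table asymptotically follows a χ² distribution with (r − 1)(c − 1) degrees of freedom, i.e., 2 for three genotypes in the JPT vs. CHB comparison and 6 for all four populations. The caption does not state whether the null densities were obtained this way or by permutation, so the following sketch of the asymptotic χ² density is only an assumed illustration; `chisq_pdf` is a hypothetical helper.

```python
import math

def chisq_pdf(x, df):
    """Chi-square density with df degrees of freedom; returns 0 outside x > 0."""
    if x <= 0:
        return 0.0
    k = df / 2.0
    # log f(x) = (k - 1) log x - x / 2 - k log 2 - log Gamma(k)
    return math.exp((k - 1) * math.log(x) - x / 2 - k * math.log(2) - math.lgamma(k))

# Null degrees of freedom for 3 genotypes and 2 or 4 response classes
for n_classes in (2, 4):
    df = (3 - 1) * (n_classes - 1)
    print(n_classes, df, round(chisq_pdf(2.0, df), 4))
```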
Figure 2
Estimating the density of the χ² distribution. For different degrees of freedom, the true density (black line) and the estimated density (red line) of the χ² distribution are shown, where the density is estimated by applying the procedure of Efron and Tibshirani [19] to 100,000 values randomly drawn from the χ² distribution. For df ≥ 3, the cyan line marks the estimated density obtained when the inner knots of the natural cubic spline are centered around the median.
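
Assuming the procedure of Efron and Tibshirani [19] follows the familiar approach of fitting a Poisson regression to histogram counts with a natural cubic spline in the bin midpoints (Lindsey's method), a self-contained NumPy sketch is shown below. The number of bins, the knots at quantiles of the data, and all function names are illustrative assumptions rather than the settings behind Figure 2; an alternative placement, such as inner knots centered around the median, could be passed via `knot_probs`.

```python
import numpy as np

def natural_spline_basis(u, knots):
    """Truncated-power natural cubic spline basis (Hastie, Tibshirani and Friedman,
    Elements of Statistical Learning, Sec. 5.2.1) without the constant column."""
    k = np.asarray(knots, dtype=float)
    K = len(k)

    def d(j):
        return (np.maximum(u - k[j], 0.0) ** 3
                - np.maximum(u - k[-1], 0.0) ** 3) / (k[-1] - k[j])

    return np.column_stack([u] + [d(j) - d(K - 2) for j in range(K - 2)])

def lindsey_density(values, n_bins=100,
                    knot_probs=(0.05, 0.2, 0.4, 0.6, 0.8, 0.95), n_iter=100):
    """Density estimate from a Poisson fit of histogram counts on a natural spline."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=n_bins)
    mids = (edges[:-1] + edges[1:]) / 2
    width = edges[1] - edges[0]
    lo, hi = edges[0], edges[-1]
    # Rescale midpoints and knots to [0, 1] so the cubic terms stay well conditioned
    u = (mids - lo) / (hi - lo)
    knots = (np.quantile(values, knot_probs) - lo) / (hi - lo)
    X = np.column_stack([np.ones_like(u), natural_spline_basis(u, knots)])
    # Starting values: least-squares fit to the log counts (0.5 avoids log(0))
    beta = np.linalg.lstsq(X, np.log(counts + 0.5), rcond=None)[0]
    for _ in range(n_iter):
        # Iteratively reweighted least squares for the Poisson log-linear model
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (counts - mu) / mu      # working response
        XtW = X.T * mu                    # Poisson working weights equal mu
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        done = np.max(np.abs(new_beta - beta)) < 1e-10
        beta = new_beta
        if done:
            break
    fitted = np.exp(X @ beta)
    # Fitted counts divided by (n * bin width) give the density at each midpoint
    return mids, fitted / (counts.sum() * width)
```

Because the Poisson model contains an intercept, the fitted counts sum to the observed counts, so the estimate integrates to one over the histogram range.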
Figure 3
EBAM analysis of the simulated data. Scatter plots of the posterior probabilities vs. the z-values resulting from applying EBAM to the simulated SNPs themselves (left panel) and to the two-way interactions formed by these SNPs (right panel). Red points mark SNPs or SNP interactions called significant by EBAM, i.e., those whose posterior probability is at least 0.9 (dashed line).
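
The caption does not say how a two-way interaction is encoded for testing. One simple option, assumed here purely for illustration, is to combine the genotypes of two SNPs into a single categorical variable with up to nine levels, which can then be scored like a single SNP. The sketch also shows why interactions are tested only in smaller association studies: m SNPs yield m(m − 1)/2 pairs, i.e., 4,950 pairs for 100 SNPs. The function `two_way_interactions` is hypothetical.

```python
import itertools
import numpy as np

def two_way_interactions(X, n_levels=3):
    """Combine each pair of SNP columns into one categorical variable.

    X holds genotypes coded 1, ..., n_levels. The genotype pair (a, b) is mapped
    to (a - 1) * n_levels + (b - 1), giving up to n_levels**2 categories.
    Returns a dict {(j, k): codes} for all column pairs j < k.
    """
    X = np.asarray(X, dtype=int)
    return {
        (j, k): (X[:, j] - 1) * n_levels + (X[:, k] - 1)
        for j, k in itertools.combinations(range(X.shape[1]), 2)
    }
```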
Figure 4
EBAM applied to the genotypes identified by logicFS. Scatter plots of the posterior probabilities vs. the z-values resulting from applying EBAM to the genotypes found by logicFS in the simulated data. The left panel shows the results of applying EBAM to the same data set on which the genotypes were found, whereas in the right panel an independent data set is used to test the genotypes. Red points mark SNPs called significant by EBAM using Δ = 0.9 (dashed line).

References

    1. Westfall PH, Young SS. Resampling-based multiple testing: examples and methods for p-value adjustments. New York, NY: Wiley; 1993.
    2. Shaffer JP. Multiple hypothesis testing. Ann Rev Psych. 1995;46:561–584.
    3. Dudoit S, Shaffer JP, Boldrick JC. Multiple hypothesis testing in microarray experiments. Stat Sci. 2003;18:71–103.
    4. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc B. 1995;57:289–300.
    5. Tusher V, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA. 2001;98:5116–5124.
