PLoS Comput Biol. 2010 May 6;6(5):e1000770.
doi: 10.1371/journal.pcbi.1000770.

A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies

Oliver Stegle et al. PLoS Comput Biol. 2010.

Abstract

Gene expression measurements are influenced by a wide range of factors, such as the state of the cell, experimental conditions and variants in the sequence of regulatory regions. To understand the effect of a variable of interest, such as the genotype of a locus, it is important to account for variation that is due to confounding causes. Here, we present VBQTL, a probabilistic approach for mapping expression quantitative trait loci (eQTLs) that jointly models contributions from genotype as well as known and hidden confounding factors. VBQTL is implemented within an efficient and flexible inference framework, making it fast and tractable on large-scale problems. We compare the performance of VBQTL with alternative methods for dealing with confounding variability on eQTL mapping datasets from simulations, yeast, mouse, and human. Employing Bayesian complexity control and joint modelling is shown to yield more precise estimates of the contribution of different confounding factors, which in turn reveals additional associations to measured transcript levels compared to alternative approaches. We present a threefold larger collection of cis eQTLs than previously found in a whole-genome eQTL scan of an outbred human population. Altogether, 27% of the tested probes show a significant genetic association in cis, and we validate that the additional eQTLs are likely to be real by replicating them in different sets of individuals. Our method is the next step in the analysis of high-dimensional phenotype data, and its application has revealed insights into genetic regulation of gene expression by demonstrating more abundant cis-acting eQTLs in human than previously shown. Our software is freely available online at http://www.sanger.ac.uk/resources/software/peer/.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. General additive model for sources of gene expression variability.
The matrix of measured gene expression levels, covering all genes and individuals, is modelled by additive contributions from several components and observation noise. Here, the components capture the signal due to the primary effect of the genetic state, known factors and hidden factors. Some examples of possible underlying sources of variation are given above the model boxes. The groupings represent some standard genetic association models commonly used.
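The additive structure described in this caption can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the VBQTL implementation: the function name `simulate_expression`, the individuals-by-genes orientation, the single SNP acting on gene 0 only, and all effect sizes are hypothetical choices made for illustration.

```python
import random


def simulate_expression(n_individuals=6, n_genes=4, n_known=1, n_hidden=2,
                        noise_sd=0.1, seed=0):
    """Toy additive model: expression = genetic + known + hidden + noise.

    All sizes and effect values are illustrative assumptions, not taken
    from the paper. Matrices are lists of rows (individuals x genes).
    """
    rng = random.Random(seed)

    def gauss_matrix(rows, cols, sd=1.0):
        return [[rng.gauss(0, sd) for _ in range(cols)] for _ in range(rows)]

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
                 for j in range(len(b[0]))] for i in range(len(a))]

    # One SNP per individual, coded as 0/1/2 copies of the minor allele.
    genotype = [[rng.choice([0, 1, 2])] for _ in range(n_individuals)]
    # Assumption: the SNP affects gene 0 only.
    snp_effect = [[1.0 if g == 0 else 0.0 for g in range(n_genes)]]
    known = gauss_matrix(n_individuals, n_known)    # e.g. measured covariates
    hidden = gauss_matrix(n_individuals, n_hidden)  # unmeasured confounders

    parts = {
        "genetic": matmul(genotype, snp_effect),
        "known": matmul(known, gauss_matrix(n_known, n_genes)),
        "hidden": matmul(hidden, gauss_matrix(n_hidden, n_genes)),
        "noise": gauss_matrix(n_individuals, n_genes, noise_sd),
    }
    # Observed expression is the element-wise sum of all components.
    expression = [[sum(p[i][g] for p in parts.values())
                   for g in range(n_genes)]
                  for i in range(n_individuals)]
    return expression, parts
```

Calling `simulate_expression()` returns the observed matrix together with its four components, which makes it easy to check how much of the variance in a given gene comes from each source.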
Figure 2. Bayesian network and outline of the inference schedule for VBQTL.
(a) The Bayesian network for the model of gene expression variation used in VBQTL (see Methods). The full model combines genetic (green), known factor (blue) and hidden factor (red) models to explain the observed gene expression levels. The solid rectangles indicate that contained variables are duplicated for each gene probe, SNP or factor, respectively. A similar rectangle for individuals is omitted in this representation. The dashed rectangle indicates that an indicator variable switches the contained part of the graph on or off, representing the existence or lack of an association. Nodes with thick outlines are observed. (b)–(e) Update cycle of the known factors model introduced in Section Inference. The red outline highlights the parts of the model that change in a step, and the thick blue arrows illustrate the flow of information. Details of these updates are discussed in the text.
Figure 3. Sensitivity of recovering simulated hidden factor effects and eQTLs for Bayesian and non-Bayesian methods.
(a) Mean-squared error in estimating only the hidden factor contribution. Methods that do not explicitly retain the genetic factors explain them away as hidden global factors, resulting in high error comparable to not accounting for hidden factors at all (Standard). (b) Mean-squared error in estimating the contribution from hidden and genetic factors. (c) Sensitivity of recovering immediate SNP associations. (d) Sensitivity of recovering downstream associations. Seven hidden factors and three transcription factor effects were simulated. For eQTL sensitivity, standard eQTL finding on simulated data (Standard) and same data without the hidden effects (Ideal) are included as comparisons. PCAsig and SVA identified a constant number of hidden components (marked with a diamond shape), thus only a single result (dashed line) is given.
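The effect of a hidden factor on eQTL sensitivity can be sketched on toy data. The example below is an illustration under stated assumptions and does not reproduce VBQTL or the simulation behind this figure: it uses a plain principal-component baseline (power iteration) rather than Bayesian inference, a single hidden factor instead of seven, and hypothetical names (`pearson`, `leading_component`, `demo`) and effect sizes. It compares the genotype-expression correlation for one gene before and after the estimated hidden factor is removed.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def leading_component(columns, rng, n_iter=50):
    """Unit-length leading left singular vector, found by power iteration.

    `columns` holds one centred expression vector per gene.
    """
    n = len(columns[0])
    u = [rng.gauss(0, 1) for _ in range(n)]  # random start vector
    for _ in range(n_iter):
        # Project onto genes, then back onto individuals.
        v = [sum(c[i] * u[i] for i in range(n)) for c in columns]
        u = [sum(v[g] * columns[g][i] for g in range(len(columns)))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u


def demo(n=200, n_genes=50, seed=1):
    """Return (correlation before, correlation after) removing one factor."""
    rng = random.Random(seed)
    genotype = [rng.choice([0, 1, 2]) for _ in range(n)]
    hidden = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: gene 0 carries the eQTL and a strong hidden-factor effect.
    weights = [3.0] + [rng.gauss(0, 1) for _ in range(n_genes - 1)]
    columns = []
    for g in range(n_genes):
        beta = 0.5 if g == 0 else 0.0
        col = [beta * genotype[i] + weights[g] * hidden[i] + rng.gauss(0, 0.5)
               for i in range(n)]
        mean = sum(col) / n
        columns.append([y - mean for y in col])
    before = pearson(genotype, columns[0])
    u = leading_component(columns, rng)
    # Regress the estimated factor out of gene 0.
    score = sum(y * ui for y, ui in zip(columns[0], u))
    residual = [y - score * ui for y, ui in zip(columns[0], u)]
    after = pearson(genotype, residual)
    return before, after
```

Because the simulated SNP acts on a single gene, the leading component mostly captures the hidden factor. With genetic effects that span many genes, a baseline like this can absorb them into the estimated factor, which is the failure mode the caption attributes to methods that do not explicitly retain the genetic factors.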
Figure 4. Number of probes with an eQTL found as a function of maximum number of hidden factors for three previously published datasets.
Significance-testing based methods (PCAsig, SVA) identified the same number of factors for a wide range of cutoff values, thus only a single count is given (dashed lines), together with the number of factors found (diamond shape). Other methods were applied with four different values for the maximum number of hidden factors.
Figure 5. Fraction of tested genes with a cis association in individual chromosomes and overall false discovery rate for the HapMap CEU population (FPR = ).
Figure 6. Validation of VBeQTLs by comparison to standard eQTLs.
(a,b,d,e) Venn diagrams depicting overlap of probes with a standard eQTL or VBeQTL in the CEU population and probes with an eQTL in other populations. (c,f) Standard and VBeQTL location and strength relative to the transcription start site.

