PLoS Comput Biol. 2010 Dec 2;6(12):e1001014.
doi: 10.1371/journal.pcbi.1001014.

Gene expression network reconstruction by convex feature selection when incorporating genetic perturbations

Benjamin A Logsdon et al. PLoS Comput Biol. 2010.

Abstract

Cellular gene expression measurements contain regulatory information that can be used to discover novel network relationships. Here, we present a new algorithm for network reconstruction powered by the adaptive lasso, a theoretically and empirically well-behaved method for selecting the regulatory features of a network. Any algorithms designed for network discovery that make use of directed probabilistic graphs require perturbations, produced by either experiments or naturally occurring genetic variation, to successfully infer unique regulatory relationships from gene expression data. Our approach makes use of appropriately selected cis-expression Quantitative Trait Loci (cis-eQTL), which provide a sufficient set of independent perturbations for maximum network resolution. We compare the performance of our network reconstruction algorithm to four other approaches: the PC-algorithm, QTLnet, the QDG algorithm, and the NEO algorithm, all of which have been used to reconstruct directed networks among phenotypes leveraging QTL. We show that the adaptive lasso can outperform these algorithms for networks of ten genes and ten cis-eQTL, and is competitive with the QDG algorithm for networks with thirty genes and thirty cis-eQTL, with rich topologies and hundreds of samples. Using this novel approach, we identify unique sets of directed relationships in Saccharomyces cerevisiae when analyzing genome-wide gene expression data for an intercross between a wild strain and a lab strain. We recover novel putative network relationships between a tyrosine biosynthesis gene (TYR1), and genes involved in endocytosis (RCY1), the spindle checkpoint (BUB2), sulfonate catabolism (JLP1), and cell-cell communication (PRM7). Our algorithm provides a synthesis of feature selection methods and graphical model theory that has the potential to reveal new directed regulatory relationships from the analysis of population level genetic and gene expression data.
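As a rough illustration of the feature selection idea named in the abstract, the sketch below fits a generic adaptive lasso regression by coordinate descent. It is not the authors' network algorithm: the function names, the toy data, the penalty `lam`, the exponent `gamma`, and the choice of an unpenalized first-pass fit for the weights are all assumptions made for this example.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j weights[j]*|b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - Xb while b is all zeros
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            updated = soft_threshold(rho, lam * weights[j]) / col_scale[j]
            delta = updated - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = updated
    return beta


def adaptive_lasso(X, y, lam, gamma=1.0, eps=1e-8):
    """Two-stage fit: unpenalized estimate first, then a reweighted L1 penalty."""
    p = len(X[0])
    initial = weighted_lasso(X, y, 0.0, [1.0] * p)  # lam = 0: least squares
    # Small initial coefficients get large penalties and are dropped first.
    weights = [1.0 / (abs(b) ** gamma + eps) for b in initial]
    return weighted_lasso(X, y, lam, weights)


def make_toy_data(n_samples=200, seed=1):
    """Hypothetical data: only features 0 and 2 influence the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(n_samples)]
    y = [2.0 * row[0] - 1.5 * row[2] + rng.gauss(0.0, 0.5) for row in X]
    return X, y


X, y = make_toy_data()
beta_hat = adaptive_lasso(X, y, lam=0.05)
print([round(b, 2) for b in beta_hat])
```

The reweighting step is what distinguishes the adaptive lasso from the ordinary lasso: features with weak first-pass support are penalized heavily, while strongly supported features are only lightly shrunk.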

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Example of biological relationships that can be reconstructed by the algorithm.
An expression Quantitative Trait Locus (eQTL) directly alters the expression level of Gene A, a relationship that our network model represents with a dedicated parameter. This gene in turn has an effect on Gene B through an unobserved pathway represented by the ‘Factors’ node. Although these factors are unobserved, we can still infer that there is a regulatory effect of Gene A on the downstream Gene B, which our network model represents with a second parameter.
Figure 2. Example of a graphical model equivalence class when determining regulatory relationships among four genes.
Edges represent the direction of regulation. In this case, the true regulatory network connecting the four genes (blue) has the same sampling distribution as the other three incorrect models (red). Without perturbations (i.e. eQTL), each of these models will equivalently describe the pattern of expression observed among these genes for any data-set.
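The equivalence described in this caption can be checked numerically in a simplified two-gene case. The sketch below assumes linear Gaussian models with zero means; the function names and parameter values are illustrative and are not taken from the paper. It shows that A -> B and B -> A can imply the same covariance matrix, and that adding an eQTL Q acting on A separates them.

```python
def covariance_a_to_b(var_a, effect, noise_var):
    """Covariance of (A, B) when B = effect * A + noise."""
    var_b = effect ** 2 * var_a + noise_var
    cov_ab = effect * var_a
    return [[var_a, cov_ab], [cov_ab, var_b]]


def reversed_parameters(var_a, effect, noise_var):
    """Parameters of the B -> A model that matches the A -> B covariance."""
    var_b = effect ** 2 * var_a + noise_var
    reverse_effect = effect * var_a / var_b
    reverse_noise = var_a - reverse_effect ** 2 * var_b
    return var_b, reverse_effect, reverse_noise


def covariance_b_to_a(var_b, effect, noise_var):
    """Covariance of (A, B) when A = effect * B + noise."""
    var_a = effect ** 2 * var_b + noise_var
    cov_ab = effect * var_b
    return [[var_a, cov_ab], [cov_ab, var_b]]


def cov_eqtl_with_b(model, var_q, effect_qa, effect_ab):
    """Covariance between an eQTL Q (acting on A) and gene B."""
    if model == "A->B":
        # Q -> A -> B: the perturbation propagates downstream to B.
        return effect_ab * effect_qa * var_q
    if model == "B->A":
        # Q -> A <- B: Q and B are both upstream of A and independent.
        return 0.0
    raise ValueError("model must be 'A->B' or 'B->A'")


forward = covariance_a_to_b(var_a=1.0, effect=0.8, noise_var=0.5)
backward = covariance_b_to_a(*reversed_parameters(1.0, 0.8, 0.5))
print(forward)
print(backward)
print(cov_eqtl_with_b("A->B", 0.25, 1.0, 0.8), cov_eqtl_with_b("B->A", 0.25, 1.0, 0.8))
```

Without Q the two covariance matrices agree, so expression data alone cannot orient the edge; with Q, only the A -> B model predicts a nonzero covariance between Q and B.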
Figure 3. Outline of the structure of Step 2 of the algorithm.
(a) After selection of phenotypes in Step 1, we produce a covariance matrix between the observed gene expression products and their associated unique cis-eQTL. (b) A convex feature selection method (the adaptive lasso) is used to learn the structure of the inverse covariance matrix, which is also the conditional independence, or interaction, network among gene expression products and cis-eQTL genotypes. (c) The directed cyclic network among expression products can then be recovered directly from the conditional independence network, using the “Recovery” Theorem. For Step 3, each of the induced edges between expression phenotypes and cis-eQTL, shown in (b), is tested to ensure marginal independence using a permutation test.
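The link between the inverse covariance matrix and the conditional independence network in panel (b) can be illustrated on a three-variable chain Q -> A -> B (one cis-eQTL, two genes). This is a minimal sketch under assumed linear Gaussian relationships with made-up parameter values; it inverts the exact model covariance rather than estimating it with the adaptive lasso, and it does not reproduce the "Recovery" Theorem step.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def chain_covariance(var_q, effect_qa, noise_a, effect_ab, noise_b):
    """Covariance of (Q, A, B) for A = effect_qa*Q + e_a, B = effect_ab*A + e_b."""
    var_a = effect_qa ** 2 * var_q + noise_a
    var_b = effect_ab ** 2 * var_a + noise_b
    cov_qa = effect_qa * var_q
    cov_ab = effect_ab * var_a
    cov_qb = effect_ab * cov_qa
    return [[var_q, cov_qa, cov_qb],
            [cov_qa, var_a, cov_ab],
            [cov_qb, cov_ab, var_b]]


def partial_correlation(precision, i, j):
    """Correlation of variables i and j given all other variables."""
    return -precision[i][j] / (precision[i][i] * precision[j][j]) ** 0.5


sigma = chain_covariance(var_q=0.25, effect_qa=1.0, noise_a=1.0,
                         effect_ab=0.8, noise_b=0.5)
omega = invert(sigma)
for row in omega:
    print([round(v, 6) for v in row])
print(round(partial_correlation(omega, 0, 1), 3))
```

Q and B are marginally correlated in `sigma`, but the (Q, B) entry of the inverse covariance matrix vanishes up to rounding, so the interaction network contains only the edges Q - A and A - B.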
Figure 4. Examples of four network topologies used to simulate gene expression data from 160 total topologies.
Sparse acyclic (a), dense acyclic (b), and dense cyclic (c) graphs were simulated for networks with 10 genes. Intermediately dense cyclic networks were simulated for networks with 30 genes (d). Nodes represent expression levels of genes and the directed edges represent regulatory (conditional) relationships among genes, where the strength of each relationship was determined by sampling from a uniform distribution. Each phenotype (node) has a unique, independent cis-eQTL feeding into it (not shown), with constant effect.
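A simulation of this kind can be sketched for the acyclic case as follows. The edge probability, the uniform range for edge strengths, the 0/1 genotype coding, the eQTL effect size, and the noise level are all assumptions chosen for illustration; the caption does not state the values used by the authors, and cyclic topologies are not covered here.

```python
import random


def random_sparse_dag(n_genes, edge_prob=0.15, low=0.5, high=1.0, seed=0):
    """Draw edges regulator -> target with regulator < target, so no cycles."""
    rng = random.Random(seed)
    edges = {}
    for regulator in range(n_genes):
        for target in range(regulator + 1, n_genes):
            if rng.random() < edge_prob:
                # Edge strength sampled from a uniform distribution.
                edges[(regulator, target)] = rng.uniform(low, high)
    return edges


def simulate_expression(edges, n_genes, n_samples,
                        eqtl_effect=1.0, noise_sd=1.0, seed=0):
    """Simulate genotypes and expression; each gene has its own cis-eQTL."""
    if any(reg >= tgt for reg, tgt in edges):
        raise ValueError("edges must point from a lower to a higher gene index")
    parents = {target: [] for target in range(n_genes)}
    for (reg, tgt), effect in edges.items():
        parents[tgt].append((reg, effect))
    rng = random.Random(seed)
    genotypes, expression = [], []
    for _ in range(n_samples):
        g = [rng.randint(0, 1) for _ in range(n_genes)]  # independent loci
        y = [0.0] * n_genes
        for target in range(n_genes):  # index order is a topological order
            y[target] = eqtl_effect * g[target] + rng.gauss(0.0, noise_sd)
            y[target] += sum(effect * y[reg] for reg, effect in parents[target])
        genotypes.append(g)
        expression.append(y)
    return genotypes, expression


edges = random_sparse_dag(n_genes=10, seed=3)
genotypes, expression = simulate_expression(edges, n_genes=10, n_samples=500, seed=3)
print(len(edges), len(expression), len(expression[0]))
```

Restricting edges to run from lower to higher gene index is a convenience that guarantees acyclicity and lets each sample be generated in a single pass.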
Figure 5. Performance of our algorithm using the adaptive lasso for directed acyclic graphs compared to other algorithms.
These other algorithms include the PC-algorithm, the QDG algorithm, and the QTLnet algorithm for reconstructing different acyclic topologies of 10 genes. For a sparse directed acyclic topology (as in Figure 4a), the power (a) and false discovery rate (b) are plotted as a function of the sample size for five replicate simulations. Similarly, for a dense directed acyclic topology (as in Figure 4b), the power (c) and false discovery rate (d) are plotted.
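The two metrics plotted in these panels can be computed from edge sets as in the sketch below. It assumes the usual definitions (power as the fraction of true directed edges recovered, false discovery rate as the fraction of inferred directed edges that are not in the true network) and treats an edge recovered in the wrong direction as a false discovery; the paper's exact scoring conventions are not given in the caption, and the example edge sets are hypothetical.

```python
def power_and_fdr(true_edges, inferred_edges):
    """Return (power, false discovery rate) for directed edges given as tuples."""
    true_set = set(true_edges)
    inferred_set = set(inferred_edges)
    true_positives = len(true_set & inferred_set)
    power = true_positives / len(true_set) if true_set else 0.0
    false_positives = len(inferred_set) - true_positives
    fdr = false_positives / len(inferred_set) if inferred_set else 0.0
    return power, fdr


# Hypothetical example: (regulator, target) pairs among four genes.
true_network = [(0, 1), (1, 2), (2, 3), (0, 3)]
inferred_network = [(0, 1), (1, 2), (3, 2)]  # last edge has the wrong direction
power, fdr = power_and_fdr(true_network, inferred_network)
print(power, fdr)
```

Averaging these two numbers over replicate simulations at each sample size would give curves of the kind described in the caption.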
Figure 6. Performance of our algorithm using the adaptive lasso for directed cyclic graphs compared to other algorithms.
These other algorithms include the PC-algorithm, the QDG algorithm, and the QTLnet algorithm for reconstructing different cyclic topologies of 10 genes (a, b) or 30 genes (c, d). For a dense directed cyclic topology (as in Figure 4c), the power (a) and false discovery rate (b) are plotted as a function of the sample size for five replicate simulations. Similarly, for an intermediately dense directed cyclic topology of 30 genes (as in Figure 4d), the power (c) and false discovery rate (d) are plotted.
Figure 7. Sparse network reconstruction among 35 gene expression products.
The network was reconstructed using the adaptive lasso algorithm for a Saccharomyces cerevisiae cross between a wild strain and a lab strain, with 112 segregants, after filtering genes for strong, pairwise independent cis-eQTL (see text for details). (a) Recovered undirected network among these 35 gene expression products and (b) putative directed network reconstructed for the same genes, based on the edges between cis-eQTL (not shown) and the 35 genes. Bold edges represent directed edges with strong confidence based on a resampling procedure (see text for details).
