Bioinformatics. 2010 Jun 15;26(12):i149-57.
doi: 10.1093/bioinformatics/btq211.

Inferring combinatorial association logic networks in multimodal genome-wide screens

Jeroen de Ridder et al. Bioinformatics. 2010.

Abstract

Motivation: We propose an efficient method to infer combinatorial association logic networks from multiple genome-wide measurements from the same sample. We demonstrate our method on a genetical genomics dataset, in which we search for Boolean combinations of multiple genetic loci that associate with transcript levels.

Results: Our method provably finds the global solution and is very efficient, with runtimes up to four orders of magnitude shorter than those of exhaustive search. This enables permutation procedures for determining accurate false positive rates and allows selection of the most parsimonious model. When applied to transcript levels measured in myeloid cells from 24 genotyped recombinant inbred mouse strains, we discovered that nine gene clusters are putatively modulated by a logical combination of trait loci rather than by a single locus. A literature survey supports and further elucidates one of these findings. Our approach makes optimal solutions for multi-locus logic models, together with accurate estimates of the associated false discovery rates, feasible. Our algorithm therefore offers a valuable alternative to approaches that identify complex models using complex but suboptimal optimization strategies.

Availability: The MATLAB code of the prototype implementation is available on: http://bioinformatics.tudelft.nl/ or http://bioinformatics.nki.nl/.

PubMed Disclaimer

Figures

Fig. 1.
Schematic overview of data and association inference. (A) A panel of BXD mice that is densely genotyped and expression profiled. The genotype data can be treated as binary vectors by choosing a binary encoding of the alleles (in the figure, D = true and B = false) and placing boundaries that divide the genome into loci such that each locus differs in at least one element from its neighbors. The cartoon shows that strong association is obtained between Locus 5 and Gene 7 because elevated expression is consistently observed in conjunction with the D allele of Locus 5. (B) Interaction among genetic features may destroy direct associations between individual loci and genes. The cartoon shows that configurations exist in which the gene expression can only be predicted by considering two loci simultaneously (using Boolean xor logic). (C) Inferring combinatorial association logic (CAL) networks takes interaction among genetic features into account in the association inference. A CAL network is inferred by selecting the input loci with the selection function 𝒮 and combining them with the appropriate Boolean function ℬ, such that the association (as measured by a scoring function f) between the network output and the gene of interest is maximized.
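The model in panels (A) to (C) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB prototype: the helper names (`encode_loci`, `cal_output`, `mean_difference`, `exhaustive_search`) are hypothetical, the Boolean function ℬ is represented by a plain truth table, and `mean_difference` is only a stand-in for the scoring function f, whose actual definition (Equation (2)) is not reproduced here. The search shown is the naive exhaustive baseline that the abstract compares against, not the paper's efficient algorithm.

```python
from itertools import product

import numpy as np


def encode_loci(alleles):
    """Binarize alleles (D -> 1, B -> 0) and merge runs of identical adjacent markers.

    alleles: (n_strains, n_markers) array of "D"/"B", markers in genome order.
    Returns the (n_strains, n_loci) 0/1 matrix and the first marker index of each locus.
    """
    bits = (np.asarray(alleles) == "D").astype(int)
    keep = [0]
    for j in range(1, bits.shape[1]):
        # start a new locus only where the pattern across strains changes
        if not np.array_equal(bits[:, j], bits[:, keep[-1]]):
            keep.append(j)
    return bits[:, keep], keep


def cal_output(loci, selected, truth_table):
    """Output of a CAL network: selected loci (role of S) fed into a truth table (role of B).

    Row r of truth_table holds the output for the inputs whose binary code is r,
    with the first input as the most significant bit.
    """
    k = len(selected)
    weights = 1 << np.arange(k)[::-1]  # e.g. [4, 2, 1] for k = 3
    return truth_table[loci[:, list(selected)] @ weights]


def mean_difference(y, expression):
    """Hypothetical stand-in for the scoring function f (NOT the paper's Equation (2))."""
    if y.min() == y.max():  # one of the two groups is empty
        return -np.inf
    return expression[y == 1].mean() - expression[y == 0].mean()


def exhaustive_search(loci, expression, k, score=mean_difference):
    """Baseline: score every ordered choice of k loci with every k-input Boolean function."""
    n_rows = 2 ** k
    best = (-np.inf, None, None)
    for selected in product(range(loci.shape[1]), repeat=k):
        for code in range(2 ** n_rows):
            truth_table = np.array([(code >> (n_rows - 1 - r)) & 1 for r in range(n_rows)])
            s = score(cal_output(loci, selected, truth_table), expression)
            if s > best[0]:  # keep the first network reaching the best score
                best = (s, selected, truth_table)
    return best


# Hypothetical example: 8 strains x 4 loci, expression driven by xor(locus 0, locus 1).
loci = np.array([[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [0, 1, 1, 1],
                 [1, 0, 0, 1], [1, 0, 1, 0], [1, 1, 0, 1], [1, 1, 1, 0]])
expression = 1.0 + 2.0 * (loci[:, 0] ^ loci[:, 1])
best = exhaustive_search(loci, expression, k=2)
print(best[1], best[2])
```

In this sketch the loops visit L^k × 2^(2^k) candidate networks for L loci and k inputs, which illustrates why exhaustive search quickly becomes expensive as networks grow.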
Fig. 2.
Association versus approximated association. (A) Example gene expression vector (circles) split into x0 and x1 according to yopt. The magenta line denotes the association measure f, defined in Equation (2), as a function of a threshold t that splits the expression vector into x0 and x1. The blue triangles indicate the error weights w(τ) after optimization. (B and C) 500 random samples, generated by introducing up to seven bit-flips in yopt, showing the relation between the approximated association and f. The red dot indicates the approximated association and f values for yopt. (B) shows the samples when the weights are assumed equal. Although the trend of the data is monotonically increasing, a large spread around this trend is observed. (C) shows the same samples when the weights are optimized, resulting in a near one-to-one relation between the approximated association and f.
Fig. 3.
Computation of solution sets for each sample. (A) Example data from Figure 1A. (B) The topology and the truth table of the Boolean function ℬ under investigation. (C) Explanation by example of the calculation of V(τ), the set of all possible input combinations to ℬ such that yopt(τ) = y(τ). This panel shows how V(1) is determined. Since yopt(1) = 1, the rows of the truth table for which y = 1 are applicable, i.e. r ∈ {2, 4, 6, 7}. According to row r = 2, the desired output for τ = 1 is obtained by selecting any of the loci that are ‘0’ for inputs i1 and i2, and any of the loci that are ‘1’ for input i3. Accordingly, for i1 we may select from the set {l1, l2, l4}. This can be calculated efficiently by taking the xnor (which evaluates to ‘1’ when both inputs are equal) between row τ = 1 of the data matrix and row r = 2 of the truth table, as shown in (C). Observe that the result is an efficient encoding of all possible input combinations that satisfy yopt(1) using row r = 2 of the truth table. In general, we denote this set by Vr(τ) and refer to the xnor result as its binary encoding. To determine the complete set of valid input combinations for τ = 1, rows 4, 6 and 7 must be considered in the same fashion. V(1) is then the union of the subsets, i.e. V(1) = V2(1) ∪ V4(1) ∪ V6(1) ∪ V7(1), which, in binary form, may be represented by concatenating the binary encodings of V2(1), V4(1), V6(1) and V7(1). (D) This panel shows the valid input combinations for τ = 1 and τ = 3 in binary representation (i.e. the binary encodings of V(1) and V(3)). For any set of samples C, the input combinations for which the output equals yopt are obtained by taking the intersection of the individual sets. In binary representation, this is equivalent to taking the row-wise Cartesian product (the row-wise product of all combinations of rows), as shown in the panel.
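The solution-set computation in panels (C) and (D) can be sketched as follows. This is a hypothetical Python illustration, not the authors' implementation: the function names are invented, truth-table rows are indexed from 0 (so the caption's rows 2, 4, 6 and 7 become 1, 3, 5 and 6), and the first input is taken as the most significant bit, which matches the caption's reading of row 2 as i1 = 0, i2 = 0, i3 = 1. Pruning rows in which some input has no admissible locus is an addition of this sketch. The example uses random genotypes and checks the result against brute-force enumeration.

```python
from itertools import product

import numpy as np


def solution_set(sample_row, truth_table, y_target):
    """Binary encoding of V(tau) for one sample.

    One row per truth-table row r whose output equals y_target (the set V_r(tau)).
    Each row concatenates k blocks of length n_loci; block i marks the loci that
    may feed input i, i.e. the xnor of the sample's genotype row with bit i of r.
    """
    k = len(truth_table).bit_length() - 1  # truth table has 2**k rows
    rows = []
    for r, out in enumerate(truth_table):
        if out != y_target:
            continue
        bits = [(r >> (k - 1 - i)) & 1 for i in range(k)]  # first input = MSB
        rows.append(np.concatenate([(sample_row == b).astype(int) for b in bits]))
    return np.array(rows, dtype=int).reshape(-1, k * len(sample_row))


def intersect(a, b, k):
    """Intersect two unions of product sets: AND every row of a with every row of b."""
    n_loci = a.shape[1] // k
    both = (a[:, None, :] & b[None, :, :]).reshape(-1, a.shape[1])
    # assumption of this sketch: drop rows where some input has no admissible locus
    nonempty = both.reshape(len(both), k, n_loci).any(axis=2).all(axis=1)
    return both[nonempty]


def decode(rows, k):
    """Expand the binary encoding into explicit tuples of loci (one per input)."""
    n_loci = rows.shape[1] // k
    combos = set()
    for row in rows:
        choices = [np.flatnonzero(block).tolist() for block in row.reshape(k, n_loci)]
        combos.update(product(*choices))
    return combos


# Hypothetical data: 6 samples x 5 loci; y = 1 in caption rows 2, 4, 6 and 7.
rng = np.random.default_rng(0)
G = rng.integers(0, 2, size=(6, 5))
tt = np.array([0, 1, 0, 1, 0, 1, 1, 0])
y_opt = tt[G[:, [0, 2, 4]] @ [4, 2, 1]]  # target generated by loci (0, 2, 4)

k = 3
acc = solution_set(G[0], tt, y_opt[0])
for tau in range(1, len(G)):
    acc = intersect(acc, solution_set(G[tau], tt, y_opt[tau]), k)
found = decode(acc, k)

# Check against brute force over all ordered triples of loci.
brute = {s for s in product(range(5), repeat=k)
         if np.array_equal(tt[G[:, list(s)] @ [4, 2, 1]], y_opt)}
print(found == brute, (0, 2, 4) in found)
```

Because each V(τ) is a union of product sets, intersecting two of them reduces to pairwise ANDs of their rows, which is why the binary encoding avoids enumerating locus tuples until the final decoding step.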
Fig. 4.
Algorithm performance in terms of accuracy and runtime under various conditions. (A) Bar graph displaying accuracy for different network topologies and different values of the f-score. For each network topology, the 75th percentile of the solution distribution is also given, showing that 100% accuracy is obtained for solutions in the tail. No solutions were found for the two missing bars in the 4-6 and 5-6 bins. (B and C) Runtimes for different network topologies and dataset sizes. The horizontal lines reflect runtimes for exhaustive search. From bottom to top, these represent the runtimes for a single-input network, a two-input network and a three-input network with one, two and four times the number of predictors, respectively.
Fig. 5.
(A) Bar graph with an overview of the number of gene clusters for which a significant (10% FDR) solution is found. Network topologies are sorted according to the 10% FDR level (blue line). (B) CAL networks significant at 10% FDR. The color and shape of the symbols correspond to the symbols used in (C). Small circles at the inputs of the networks denote negation, i.e. for these inputs the mapping from allele to binary representation is switched. We also indicate whether, for that gene cluster, the best single marker coincides with one of the inputs of the CAL network. (C) Marker/probe plot for the top CAL networks showing both the eQTLs (blue crosses) and ceQTLs (sets of colored symbols of various shapes). The colors and shapes of the markers refer to the network topologies listed in (B). Horizontal gray lines connect the inputs and the output of the CAL network. Because probes were clustered, a ceQTL may map to multiple probes when these probes were part of the same cluster. The numeric labels near the colored symbols correspond to the input of the network. Notably, some probes appear to be predicted by more ceQTLs than the reported CAL network has inputs. This occurs when multiple combinations of markers show the same association with the gene expression level of the network output, and can be explained by similarity among markers. The cis-band (diagonal) is clearly visible and on one occasion contains a ceQTL. Overlap among ceQTLs from different networks is marked by red dashed lines, and overlap between ceQTLs and eQTLs by black dashed lines.
Fig. 6.
Input regions of the CAL network for Lilrb4. The line graphs give the f-score for association between the output gene and the individual markers (blue) and the network output (red). The latter was computed by taking the maximum f-score of the network using the marker under evaluation for one input and any of the other markers for the second input of the network. Where possible, the IDs of the genetic markers are given, but some were omitted for readability. The dot plots give the expression values separated by network output (right) and by the best markers in the inputs (left). Finally, for one particular combination of markers, the genotype for all strains is depicted as a Boolean heat map. In these diagrams, the NOT gates have already been incorporated.