Increasing the power to detect causal associations by combining genotypic and expression data in segregating populations

Jun Zhu¹, Matthew C Wiener, Chunsheng Zhang, Arthur Fridman, Eric Minch, Pek Y Lum, Jeffrey R Sachs, Eric E Schadt

PMID: 17432931
PMCID: PMC1851982
DOI: 10.1371/journal.pcbi.0030069

Jun Zhu et al. PLoS Comput Biol. 2007.

PLoS Comput Biol. 2007 Apr 13;3(4):e69.

doi: 10.1371/journal.pcbi.0030069. Epub 2007 Feb 27.

Affiliation

¹ Rosetta Inpharmatics, Seattle, Washington, United States of America.

Abstract

To dissect common human diseases such as obesity and diabetes, a systematic approach is needed to study how genes interact with one another, and with genetic and environmental factors, to determine clinical end points or disease phenotypes. Bayesian networks provide a convenient framework for extracting relationships from noisy data and are frequently applied to large-scale data to derive causal relationships among variables of interest. Given the complexity of molecular networks underlying common human disease traits, and the fact that biological networks can change depending on environmental conditions and genetic factors, large datasets, generally involving multiple perturbations (experiments), are required to reconstruct and reliably extract information from these networks. With limited resources, the balance of coverage of multiple perturbations and multiple subjects in a single perturbation needs to be considered in the experimental design. Increasing the number of experiments, or the number of subjects in an experiment, is an expensive and time-consuming way to improve network reconstruction. Integrating multiple types of data from existing subjects might be more efficient. For example, it has recently been demonstrated that combining genotypic and gene expression data in a segregating population leads to improved network reconstruction, which in turn may lead to better predictions of the effects of experimental perturbations on any given gene. Here we simulate data based on networks reconstructed from biological data collected in a segregating mouse population and quantify the improvement in network reconstruction achieved using genotypic and gene expression data, compared with reconstruction using gene expression data alone. 
We demonstrate that networks reconstructed using the combined genotypic and gene expression data achieve a level of reconstruction accuracy that exceeds networks reconstructed from expression data alone, and that fewer subjects may be required to achieve this superior reconstruction accuracy. We conclude that this integrative genomics approach to reconstructing networks not only leads to more predictive network models, but also may save time and money by decreasing the amount of data that must be generated under any given condition of interest to construct predictive network models.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

**Figure 1. The Data Simulation Scheme with Genetic and Network Constraints**
(A) A segregating population (an F2 intercross in this case) is simulated using the QTL Cartographer software suite (Rqtl, Rcross, and Zmapqtl). The QTL model for a trait is defined using the Rqtl program, and the heritability of the QTL is defined using the Rcross program. (B) The traits simulated by Rcross are used as the head nodes in the simulated network. The remaining traits are simulated based on the values of the head nodes according to the DAG structure and the set of conditional probability density functions associated with this structure. (C) After traits for all nodes in the network are simulated, they are scanned for QTLs using the Zmapqtl program. The traits and the associated QTL are then input into the network reconstruction program.
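The scheme in Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it does not call QTL Cartographer, and the function name `simulate_network`, the single marker per head node, the 1:2:1 F2 genotype ratio, and the linear-Gaussian conditional densities with a fixed `weight` are all assumptions made for illustration.

```python
import math
import random


def simulate_network(n_subjects, head_nodes, edges, order,
                     heritability=0.5, weight=0.8, seed=0):
    """Simulate traits on a DAG whose head nodes are driven by a QTL.

    head_nodes: list of nodes without parents, each given one marker.
    edges: list of (parent, child) pairs defining the DAG.
    order: all nodes listed in topological order.
    """
    rng = random.Random(seed)
    parents = {node: [] for node in order}
    for parent, child in edges:
        parents[child].append(parent)

    # Assumption: one F2 marker per head node, genotypes 0/1/2 in a 1:2:1 ratio.
    genotypes = {h: [rng.choice((0, 1, 1, 2)) for _ in range(n_subjects)]
                 for h in head_nodes}

    traits = {}
    for node in order:
        values = []
        for i in range(n_subjects):
            noise = rng.gauss(0.0, 1.0)
            if node in genotypes:
                # Centered genotype has variance 0.5; rescale it to unit
                # variance so `heritability` is the genetic variance fraction.
                g = (genotypes[node][i] - 1) / math.sqrt(0.5)
                value = (math.sqrt(heritability) * g
                         + math.sqrt(1.0 - heritability) * noise)
            else:
                # Assumption: linear-Gaussian conditional density given parents.
                value = sum(weight * traits[p][i] for p in parents[node]) + noise
            values.append(value)
        traits[node] = values
    return traits, genotypes


traits, genotypes = simulate_network(
    n_subjects=100,
    head_nodes=["A"],
    edges=[("A", "B"), ("B", "C")],
    order=["A", "B", "C"],
)
```

In this sketch the QTL scan of step (C) is omitted; the returned `genotypes` stand in for the genetic information that would be passed to a reconstruction program alongside `traits`.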

**Figure 2. Reconstruction Accuracy Based on 100-Sample Datasets Generated Using Parameters Similar to BXD Data**
All accuracies are based on directed graphs unless indicated otherwise. (A) Accuracy of reconstructions with and without genetic information used as prior information. (B) Accuracy of reconstructions for the top-layer subnetwork, as defined in the text.

**Figure 3. The Accuracy of Reconstruction of the Synthetic Network, Reconstructed with and without Genetic Information**
The genetic information not only helps to infer the direction of the relationships between nodes (solid lines) but also increases the power to detect relationships when direction is ignored, as in the association networks (dashed lines).

**Figure 4. Reconstruction accuracy with the genetic (dotted and solid lines) and without the genetic (dashed lines) information, using varying numbers of samples, and based on an overall genetic signal similar to that found in the BXD network, but with weaker interactions (see text for details).**
(A) Reconstruction accuracy for the entire network. (B) Reconstruction accuracy for the subnetwork comprising only the top layer of the network. The dotted lines reflect reconstructions that utilized cis QTL information as the only source of genetic information, whereas the solid lines reflect reconstructions that utilized all available genetic information.

**Figure 5. Reconstruction accuracy with the genetic (dotted and solid lines, as described in the Figure 4 legend) and without the genetic (dashed lines) information, using varying numbers of samples, and based on reduced heritability and a weak overall correlation structure compared with what we observed in the BXD network (see text for details).**
(A) Accuracies of networks reconstructed with and without genetic information. (B) Accuracies of subnetworks consisting only of those nodes in the top layer of the network. (C) Accuracies of networks in which a true edge was counted as correct if the corresponding nodes were connected either directly or by a path involving two edges in the reconstructed network. It is clear the genetic data significantly enhance reconstruction accuracy.
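The figure legends distinguish three ways of scoring an edge: as a directed edge, as an undirected association, and under the relaxed rule of Figure 5C (connected directly or by a path of two edges). The sketch below shows one way these could be computed. The function names are hypothetical, scoring by the fraction of true edges recovered is an assumption (the legends do not define the accuracy formula), and the relaxed rule here ignores edge direction, which is also an assumption.

```python
def edge_recall(true_edges, recon_edges, directed=True):
    """Fraction of true edges present in the reconstructed network."""
    if directed:
        recon = set(recon_edges)
        hits = sum(1 for edge in true_edges if edge in recon)
    else:
        # Association network: an edge matches regardless of orientation.
        recon = {frozenset(edge) for edge in recon_edges}
        hits = sum(1 for edge in true_edges if frozenset(edge) in recon)
    return hits / len(true_edges)


def relaxed_recall(true_edges, recon_edges):
    """Fraction of true edges whose endpoints are linked directly or through
    one intermediate node (a two-edge path) in the reconstructed network."""
    neighbors = {}
    for a, b in recon_edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    hits = 0
    for a, b in true_edges:
        near_a = neighbors.get(a, set())
        near_b = neighbors.get(b, set())
        if b in near_a or (near_a & near_b):
            hits += 1
    return hits / len(true_edges)


true_edges = [("A", "B"), ("B", "C"), ("A", "D")]
recon_edges = [("B", "A"), ("B", "C"), ("A", "E"), ("E", "D")]

directed_score = edge_recall(true_edges, recon_edges, directed=True)
association_score = edge_recall(true_edges, recon_edges, directed=False)
relaxed_score = relaxed_recall(true_edges, recon_edges)
```

In this toy example only B→C is recovered with the correct direction, A–B is additionally recovered once direction is ignored, and A–D counts under the relaxed rule because both nodes are linked through E.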
