Review
Mamm Genome. 2007 Jul;18(6-7):389-401.
doi: 10.1007/s00335-007-9040-6. Epub 2007 Jul 26.

Moving toward a system genetics view of disease

Solveig K Sieberts et al. Mamm Genome. 2007 Jul.

Abstract

Testing hundreds of thousands of DNA markers in human, mouse, and other species for association to complex traits like disease is now a reality. However, information on how variations in DNA impact complex physiologic processes flows through transcriptional and other molecular networks. In other words, DNA variations impact complex diseases through the perturbations they cause to transcriptional and other biological networks, and these molecular phenotypes are intermediate to clinically defined disease. Because it is also now possible to monitor transcript levels in a comprehensive fashion, integrating DNA variation, transcription, and phenotypic data has the potential to enhance identification of the associations between DNA variation and diseases like obesity and diabetes, as well as characterize those parts of the molecular networks that drive these diseases. Toward that end, we review methods for integrating expression quantitative trait loci (eQTLs), gene expression, and clinical data to infer causal relationships among gene expression traits and between expression and clinical traits. We further describe methods to integrate these data in a more comprehensive manner by constructing coexpression gene networks that leverage pairwise gene interaction data to represent more general relationships. To infer gene networks that capture causal information, we describe a Bayesian algorithm that further integrates eQTLs, expression, and clinical phenotype data to reconstruct whole-gene networks capable of representing causal relationships among genes and traits in the network. These emerging network approaches, aimed at processing high-dimensional biological data by integrating data from multiple sources, represent some of the first steps in statistical genetics to identify multiple genetic perturbations that alter the states of molecular networks and that in turn push systems into disease states. 
Evolving statistical procedures that operate on networks will be critical to extracting information related to complex phenotypes like disease, as research goes beyond a single-gene focus. The early successes achieved with the methods described herein suggest that these more integrative genomics approaches to dissecting disease traits will significantly enhance the identification of key drivers of disease beyond what could be achieved by genetic association studies alone.
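The abstract describes integrating eQTL genotypes, expression, and clinical data to infer causal relationships. As a rough illustration only, and not the authors' exact procedure, the sketch below compares three linear-Gaussian models for a locus L, an expression trait R, and a clinical trait C: causal (L → R → C), reactive (L → C → R), and independent (L → R and L → C). The additive genotype coding, the Gaussian likelihoods, the AIC comparison, and the names `classify_relationship` and `simulate` are all assumptions made for this example.

```python
import math
import random


def _loglik(y, x):
    """Maximized log-likelihood of y = a + b*x + Gaussian noise (least-squares fit)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    var = max(rss / n, 1e-12)  # MLE variance, floored to avoid log(0)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def classify_relationship(locus, r, c):
    """Return the hypothetical model ('causal', 'reactive', 'independent') with lowest AIC.

    locus: genotypes coded additively (0, 1, 2); r, c: quantitative traits.
    P(L) is shared by all three models, so it is omitted from the likelihoods.
    """
    loglik = {
        "causal": _loglik(r, locus) + _loglik(c, r),           # L -> R -> C
        "reactive": _loglik(c, locus) + _loglik(r, c),         # L -> C -> R
        "independent": _loglik(r, locus) + _loglik(c, locus),  # L -> R, L -> C
    }
    k = 6  # each model: two intercepts, two slopes, two variances
    aic = {name: 2 * k - 2 * ll for name, ll in loglik.items()}
    return min(aic, key=aic.get)


def simulate(n, mode, effect=2.0, seed=0):
    """Simulate an F2-like cross (genotype ratio 1:2:1) under a chosen model."""
    if mode not in ("causal", "independent"):
        raise ValueError("mode must be 'causal' or 'independent'")
    rng = random.Random(seed)
    locus, r, c = [], [], []
    for _ in range(n):
        g = rng.choice([0, 1, 1, 2])
        ri = effect * g + rng.gauss(0, 1)
        if mode == "causal":
            ci = ri + rng.gauss(0, 1)         # C depends on L only through R
        else:
            ci = effect * g + rng.gauss(0, 1)  # R and C respond to L separately
        locus.append(g)
        r.append(ri)
        c.append(ci)
    return locus, r, c


L, R, C = simulate(2000, "causal", seed=1)
print(classify_relationship(L, R, C))
print(classify_relationship(L, C, R))  # same data with the trait roles swapped
```

With data simulated under the causal model, the causal model should be preferred, and swapping the roles of R and C should favor the reactive model instead. Because all three models have the same number of parameters here, the AIC ranking is equivalent to ranking by likelihood; the penalty term only matters if richer models are added.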


Figures

Fig. 1
High-level view of the flow of information in biological systems through a hierarchy of networks. Each panel highlights a different set of networks at play in a biological system. Genomics networks represent interactions among DNA sequences that may give rise to longer-range as well as more local chromosome structures that modulate gene activity, in addition to inducing synergistic effects on higher-order phenotypes. Genomics networks drive molecular networks composed of RNA, protein, metabolites, and other molecules in the system. Molecular networks are components of cellular networks in which the complex web of interactions among these networks gives rise to the complex phenotypes that define living systems. Tissue networks comprise cellular networks that are clearly influenced by the molecular and genomics networks, and organism networks comprise tissue networks that are clearly defined by the component cellular and molecular networks. Complex phenotypes like disease emerge from this complex web of interacting networks, given genetic and environmental perturbations to the system.
Fig. 2
Possible relationships between phenotypes with and without genetic information. Edges between nodes in each of the graphs represent an association between the nodes. A directed edge indicates a causal association between the nodes. A A subset of the possible relationships among three variables. When one of the three nodes in the network is a DNA locus (red nodes), many of the graphs are no longer possible, because directed edges from an expression trait to a DNA locus are not allowed. The red Xs highlight edges that would not be allowed if the red node were a DNA locus. B The first three graphs represent the set of possible relationships between two traits and a controlling genetic locus when feedback mechanisms are ignored. The final two graphs represent more complicated scenarios in which multiple genetic loci control a given trait that in turn drives a second trait, or a single genetic locus drives multiple traits that collectively drive another trait.
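Panel A's point, that treating one node as a DNA locus rules out many graphs, can be checked by brute force. The sketch below uses hypothetical helpers (`is_acyclic`, `enumerate_dags`) that are not from the paper: it enumerates every directed acyclic graph on three labeled nodes, then discards graphs with any edge pointing into the locus node. It considers only directed edges and ignores feedback loops, in line with the simplifying assumption stated for panel B.

```python
from itertools import combinations, product


def is_acyclic(nodes, edges):
    """Kahn's algorithm: a graph is acyclic iff all nodes can be removed in topological order."""
    indegree = {v: 0 for v in nodes}
    for _, v in edges:
        indegree[v] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for a, b in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return removed == len(nodes)


def enumerate_dags(nodes, locus=None):
    """All DAGs on `nodes`; if `locus` is given, no edge may point into it."""
    pairs = list(combinations(nodes, 2))
    dags = []
    # Each unordered pair has no edge (0), a -> b (1), or b -> a (2).
    for states in product((0, 1, 2), repeat=len(pairs)):
        edges = []
        for (a, b), s in zip(pairs, states):
            if s == 1:
                edges.append((a, b))
            elif s == 2:
                edges.append((b, a))
        if locus is not None and any(v == locus for _, v in edges):
            continue  # a trait cannot change the DNA sequence at the locus
        if is_acyclic(nodes, edges):
            dags.append(edges)
    return dags


all_graphs = enumerate_dags(["L", "T1", "T2"])
allowed = enumerate_dags(["L", "T1", "T2"], locus="L")
print(len(all_graphs), len(allowed))  # 25 12
```

Of the 25 DAGs on three labeled nodes, 12 remain once the locus is required to be a source node: any subset of the two edges out of L, combined with no edge, T1 → T2, or T2 → T1.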
Fig. 3
Mapping proximal and distal eQTLs for gene expression traits. The white rectangles represent genes that are controlled by transcriptional control units. The ellipses represent the transcriptional control units, which could be transcription regulatory sites, other genes that control the expression of the indicated gene, and so on. A Cis-acting control unit regulating a gene. DNA variations in this control unit that affect the gene’s expression would lead to a cis-acting (proximal) eQTL. B Cis and trans control units regulating the indicated gene. DNA variations in these control units that affect the gene’s expression would lead to proximal and distal eQTLs. C Cis control unit and multiple trans control units regulating the indicated genes. DNA variations in these control units would lead to a complex eQTL signature for the gene. D A single control unit regulating multiple genes. DNA variations in this single control unit could lead to a cluster of distal eQTLs (an eQTL hot spot).
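The proximal/distal distinction in panels A–D is commonly operationalized by comparing an eQTL's position with the physical location of the gene whose expression it controls. The sketch below assumes a hypothetical 10 Mb window for calling an eQTL proximal (cis), and flags a candidate hot spot (panel D) when distal eQTLs for at least three distinct genes fall in the same 5 Mb bin. The thresholds, the `EQTL` record layout, and the example genes are illustrative assumptions, not values from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EQTL:
    """One gene-locus association; positions in megabases (hypothetical layout)."""
    gene: str
    gene_chrom: str
    gene_pos_mb: float
    qtl_chrom: str
    qtl_pos_mb: float


def classify(e, window_mb=10.0):
    """Proximal (cis) if the eQTL lies within window_mb of its gene, else distal (trans)."""
    same_chrom = e.gene_chrom == e.qtl_chrom
    if same_chrom and abs(e.gene_pos_mb - e.qtl_pos_mb) <= window_mb:
        return "cis"
    return "trans"


def hot_spots(eqtls, bin_mb=5.0, min_genes=3, window_mb=10.0):
    """Bins where distal eQTLs for at least min_genes distinct genes co-localize."""
    genes_in_bin = defaultdict(set)
    for e in eqtls:
        if classify(e, window_mb) == "trans":
            genes_in_bin[(e.qtl_chrom, int(e.qtl_pos_mb // bin_mb))].add(e.gene)
    return {b: len(g) for b, g in genes_in_bin.items() if len(g) >= min_genes}


records = [
    EQTL("GeneA", "1", 50.0, "1", 52.0),  # cis: same chromosome, 2 Mb away
    EQTL("GeneB", "2", 10.0, "7", 31.0),  # trans
    EQTL("GeneC", "3", 80.0, "7", 32.0),  # trans, same 5 Mb bin as GeneB
    EQTL("GeneD", "5", 20.0, "7", 33.5),  # trans, same bin again
    EQTL("GeneE", "1", 10.0, "1", 80.0),  # trans: same chromosome but 70 Mb away
]
print([classify(e) for e in records])  # ['cis', 'trans', 'trans', 'trans', 'trans']
print(hot_spots(records))              # {('7', 6): 3}
```

Counting distinct genes per bin, rather than raw associations, keeps a single gene with several nearby distal peaks from masquerading as a hot spot.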
Fig. 4
Genes comprising simple linearly ordered pathways operate in a network context. A The classic view of TGF-β signaling (Alberts 2002) involves Tgfbr2 as a key component. Tgfbr2 was recently identified and validated as an obesity susceptibility gene. B The genes comprising the TGF-β signaling pathway are correlated with hundreds of other genes in the liver network (Schadt et al. 2005), so components of this pathway affect and are affected by many different parts of the network.
Fig. 5
Coexpression and Bayesian networks from adipose expression data collected in a murine F2 intercross population. The upper-left panel is a topological overlap map view of the adipose coexpression network. All pairs of correlations among the 5000 most highly connected genes in the adipose data are plotted in the color matrix display (red indicates positive correlation, blue indicates negative correlation, and white indicates correlation not significant at the p < 10⁻²⁰ level). The genes are ordered along the x and y axes using an agglomerative hierarchical clustering algorithm. Tightly correlated groups of genes (modules) clearly emerge from this plot. Modules are identified as described in the text. The upper-right panel is the Bayesian network corresponding to genes in module 2 highlighted in the topological overlap map. The lower-left panel represents a 36-gene subnetwork that includes Lpl and Lactb, two genes recently validated as causal for obesity (E. E. Schadt et al., unpublished). More generally, module 2 highlighted in the topological overlap map contains a number of genes validated as causal for obesity (lower-right panel), indicating that disease-causing genes may cluster into functionally coherent sets in the network.
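The caption outlines a pipeline: build a coexpression network, compute a topological overlap map, and cluster genes agglomeratively so that modules emerge. A minimal sketch of that pipeline follows. It assumes a hard |r| threshold in place of the p-value cutoff, the commonly used topological overlap measure TOM(i, j) = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij counts shared neighbors and k_i is node degree, average linkage on 1 - TOM, and a fixed cut height. The function names and parameter values are hypothetical, and the module-detection procedure described in the paper's text may differ.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def adjacency(expr, r_min):
    """Unweighted network: connect genes whose |correlation| >= r_min."""
    genes = sorted(expr)
    n = len(genes)
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(expr[genes[i]], expr[genes[j]])) >= r_min:
                a[i][j] = a[j][i] = 1
    return genes, a


def topological_overlap(a):
    """TOM(i, j) = (shared neighbors + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(a)
    k = [sum(row) for row in a]
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(a[i][u] * a[u][j] for u in range(n))
            tom[i][j] = tom[j][i] = (shared + a[i][j]) / (min(k[i], k[j]) + 1 - a[i][j])
    return tom


def average_linkage(dist, cut):
    """Agglomerative clustering; stop when the closest clusters are farther apart than cut."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
                d /= len(clusters[p]) * len(clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > cut:
            break
        clusters[p] += clusters[q]
        del clusters[q]
    return clusters


def modules(expr, r_min=0.9, cut=0.5):
    genes, a = adjacency(expr, r_min)
    dist = [[1.0 - t for t in row] for row in topological_overlap(a)]
    return [[genes[i] for i in c] for c in average_linkage(dist, cut)]


x = [1, 2, 3, 4, 5, 6]
y = [1, -1, 0, 0, -1, 1]  # uncorrelated with x
expr = {"a1": x, "a2": [2 * v for v in x], "a3": [v + 10 for v in x],
        "b1": y, "b2": [3 * v for v in y], "b3": [v - 5 for v in y]}
print(modules(expr))  # [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']]
```

Topological overlap rewards pairs of genes that share many neighbors, not just pairs that are directly correlated, which is one reason tightly interconnected modules stand out as blocks in a topological overlap map.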


References

    1. Alberts B. Molecular biology of the cell. New York: Garland Science; 2002. p xxxiv.
    2. Alberts R, Terpstra P, Bystrykh LV, de Haan G, Jansen RC. A statistical multiprobe model for analyzing cis and trans genes in genetical genomics experiments with short-oligonucleotide arrays. Genetics. 2005;171:1437–1439.
    3. Barabasi AL, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512.
    4. Barabasi AL, Oltvai ZN. Network biology: understanding the cell’s functional organization. Nat Rev Genet. 2004;5:101–113.
    5. Brem RB, Kruglyak L. The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proc Natl Acad Sci U S A. 2005;102:1572–1577.