Cell. 2019 Mar 21;177(1):38-44.
doi: 10.1016/j.cell.2019.03.004.

Gene-Environment Interaction in the Era of Precision Medicine

Jingjing Li et al. Cell. 2019.

Abstract

Innovative analytical frameworks are required to capture the complex gene-environment interactions. We investigate the insufficiency of commonly used models for disease genome analysis and suggest considering genetic interactions in complex diseases. For non-genetic factors, we study the emerging wearable technologies that have enabled quantification of physiological and environmental factors at an unprecedented breadth and depth. We propose a Bayesian framework to hierarchically model personalized gene-environmental interaction to enable precision health and medicine.

Figures

Figure 1. Frameworks for Disease Genome Analysis
(A) A simulation study, using a deep neural network, to identify a potential source of missing heritability. 10,000 polymorphic sites from 1,000 individuals were randomly extracted from a published GWAS (Hong et al., 2017) and fed into a pre-configured deep neural network (with a fixed weight on each edge) of varying depth (from N = 1 to N = 10 layers) to model the genotype-phenotype mapping. In particular, N = 1 is a linear regression model that takes genotype input across all polymorphic sites (only the input layer, with no intermediate layer); their linear combination generates the phenotype value, thereby representing the conventional additive genetic model. When multiple intermediate layers are involved (N > 1), the aggregated signal is non-linearly transformed multiple times, so N = 10 is a deep neural network representing a highly non-linear mapping from genotypes to a phenotypic trait. The network output is the phenotype value for each person, deterministically derived from that person’s genotypes after the deep neural network transformation. We configured 5,000 nodes in each intermediate layer with a sigmoid transfer function. For each N, the simulation was replicated 100 times by setting network weights based on a standard normal distribution.

Details of the simulation experiment: we input the genotypes of each individual into a feedforward neural network defined as

\begin{align*}
f_1 &= \sigma(W_1 x + b_1) \tag{i} \\
f_2 &= \sigma(W_2 f_1 + b_2) \tag{ii} \\
&\;\;\vdots \\
f_i &= \sigma(W_i f_{i-1} + b_i) \tag{iii} \\
&\;\;\vdots \\
f_{N-1} &= \sigma(W_{N-1} f_{N-2} + b_{N-1}) \tag{iv} \\
y &= W_N f_{N-1} + b_N \tag{v}
\end{align*}

where N > 1 in Equations (i)-(v), and $f_i$ is the output of the i-th layer of the network. When N = 1, the above equations degenerate to $y = W_1 x + b$. Here $x \in \mathbb{R}^{10000}$ is the genotype vector, $\sigma$ is the sigmoid function, and $W_i$ and $b_i$ ($i = 1, \ldots, N$) are the weight matrix and bias term for each layer, respectively. Specifically, we defined $W_1 \in \mathbb{R}^{5000 \times 10000}$ for the input layer, $W_N \in \mathbb{R}^{1 \times 5000}$ for the output layer, and $W_i \in \mathbb{R}^{5000 \times 5000}$ for the intermediate layers. In the simulation experiment, we varied the number of layers from N = 1 to N = 10 to incrementally increase model complexity, i.e., the degree of non-linearity. For each N, we replicated the simulation 100 times. In each replicate, we kept the network structure unchanged and configured only a new set of weights and bias terms, sampled independently from a standard normal distribution, i.e., N(0,1).
(B) Missed heritability, estimated with the commonly used linear mixture model (implemented in the GCTA toolkit), is proportional to system complexity.
(C) An example of the PI3K/AKT/mTOR pathway, where most of its member proteins are associated with neurodevelopmental diseases.
(D) A systems biology strategy using multi-omics data to identify pathways significantly affected by disease-associated pathways. Biological pathways are first constructed using multi-omics profiling techniques in disease-related tissue/cell types, seeded with known disease-associated genes. Genomic mutations are then mapped onto the experimentally derived network to identify a compact sub-network most enriched for patient-specific mutations, which reveals disease-associated pathways.
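The layer-by-layer definition in panel (A) can be illustrated with a small sketch. This is not the authors' code: it is a scaled-down, standard-library-only illustration in which the dimensions (20 sites, 8 hidden nodes, 5 individuals), the random 0/1/2 genotypes, and all function names are assumptions made for readability. The caption's setup uses 10,000 sites, 5,000 nodes per intermediate layer, and genotypes drawn from a published GWAS.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W, v, b):
    # Computes W v + b for a list-of-rows matrix W.
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def random_layer(rng, n_out, n_in):
    # Weights and biases sampled independently from N(0, 1), as in the caption.
    W = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
    return W, b


def build_network(rng, n_layers, n_sites, n_hidden):
    # Layer sizes: input -> (n_layers - 1) hidden layers -> scalar output.
    # With n_layers == 1 this is a single linear map (the additive model).
    sizes = [n_sites] + [n_hidden] * (n_layers - 1) + [1]
    return [random_layer(rng, sizes[i + 1], sizes[i]) for i in range(n_layers)]


def phenotype(layers, x):
    # Equations (i)-(iv): sigmoid after every layer except the last.
    f = x
    for W, b in layers[:-1]:
        f = [sigmoid(z) for z in affine(W, f, b)]
    # Equation (v): linear output layer.
    W, b = layers[-1]
    return affine(W, f, b)[0]


# Hypothetical toy dimensions; placeholder genotypes coded as 0/1/2.
N_SITES, N_HIDDEN, N_PEOPLE = 20, 8, 5
rng = random.Random(0)
genotypes = [[rng.choice([0, 1, 2]) for _ in range(N_SITES)]
             for _ in range(N_PEOPLE)]

phenotypes_by_depth = {}
for n_layers in (1, 3, 5):
    net = build_network(rng, n_layers, N_SITES, N_HIDDEN)
    phenotypes_by_depth[n_layers] = [phenotype(net, g) for g in genotypes]
```

Each replicate in the described experiment would correspond to calling `build_network` again with fresh weights while keeping the layer sizes fixed. The heritability estimation step (GCTA) is outside the scope of this sketch.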
Figure 2. Wearable Sensors for Health Management and Physiological Data Acquisition
(A) An overview of existing wearable sensors that acquire human physiological and environmental data in real time. (B) Early detection of Lyme disease using a smartwatch. The normalized heart rate (HR) peaked during the Lyme infection event (in red), with a C-reactive protein (CRP) value of 108 mg/L (day 0 to day 4). The infection was confirmed by negative and positive Lyme antibody tests before (day −3) and after (day 17, when the formal diagnosis was made) the infection event, respectively.
Figure 3. A Bayesian View of Gene-Environment Interaction
(A) A simplified example of BADGE illustrating the conditional dependency between the genetic and environmental factors that determine the joint probability of disease outcome. Following Equations 1-5 in the main text, assume that the risk allele frequency is Pr(G = 1) = 0.15 and that the chance of exposure to the environmental toxicant is Pr(E = 1) = 0.24. With some statistical inference procedures, one can then readily determine the disease prevalence, Pr(D = 1) = 0.18, and the disease risk with and without environmental exposure, Pr(D = 1|E = 1) = 0.48 and Pr(D = 1|E = 0) = 0.08, respectively. Averaging out environmental factors, the risk from personal genomes is Pr(D = 1|G = 1) = 0.60 and Pr(D = 1|G = 0) = 0.10 for individuals carrying and not carrying risk alleles, respectively. In the same vein, the genetic coefficient for this disease is 0.79, as defined in Equation 5. (B) Graphical representation of the genetic coefficient for a given disease, which is defined by the distributional difference between the case genomes (red dots) and the control genomes (blue dots), quantified by Jensen–Shannon (JS) divergence.
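The numbers quoted in panel (A) can be cross-checked with the law of total probability, and the JS divergence named in panel (B) has a standard definition for discrete distributions. The sketch below covers both. The function names are hypothetical. Equation 5 is not reproduced on this page, so the code does not attempt to recover the genetic coefficient of 0.79; the JS helper only shows the general quantity (here in base 2, which bounds it between 0 and 1).

```python
import math


def marginal_risk(p_factor, risk_with, risk_without):
    # Law of total probability over a binary factor F:
    # Pr(D=1) = Pr(F=1) Pr(D=1|F=1) + Pr(F=0) Pr(D=1|F=0)
    return p_factor * risk_with + (1.0 - p_factor) * risk_without


def kl_divergence(p, q):
    # Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    # Jensen-Shannon divergence: average KL divergence to the midpoint mixture.
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Prevalence recovered by marginalizing over genotype (values from the caption).
prevalence_from_g = marginal_risk(0.15, 0.60, 0.10)  # approximately 0.175

# Prevalence recovered by marginalizing over exposure (values from the caption).
prevalence_from_e = marginal_risk(0.24, 0.48, 0.08)  # approximately 0.176
```

Both marginalizations land within 0.005 of the reported prevalence Pr(D = 1) = 0.18, so the caption's conditional risks are mutually consistent at two-decimal precision.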

References

    1. Borrie SC, Brems H, Legius E, and Bagni C (2017). Cognitive Dysfunctions in Intellectual Disabilities: The Contributions of the Ras-MAPK and PI3K-AKT-mTOR Pathways. Annu. Rev. Genomics Hum. Genet. 18, 115–142.
    2. Boyle EA, Li YI, and Pritchard JK (2017). An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell 169, 1177–1186.
    3. Gilman SR, Iossifov I, Levy D, Ronemus M, Wigler M, and Vitkup D (2011). Rare de novo variants associated with autism implicate a large functional network of genes involved in formation and function of synapses. Neuron 70, 898–907.
    4. Hall H, Perelman D, Breschi A, Limcaoco P, Kellogg R, McLaughlin T, and Snyder M (2018). Glucotypes reveal new patterns of glucose dysregulation. PLoS Biol. 16, e2005143.
    5. Hong X, Hao K, Ji H, Peng S, Sherwood B, Di Narzo A, Tsai HJ, Liu X, Burd I, Wang G, et al. (2017). Genome-wide approach identifies a novel gene-maternal pre-pregnancy BMI interaction on preterm birth. Nat. Commun. 8, 15608.
