Biostatistics. 2024 Apr 15;25(2):468-485.
doi: 10.1093/biostatistics/kxac051.

DeLIVR: a deep learning approach to IV regression for testing nonlinear causal effects in transcriptome-wide association studies

Ruoyu He et al. Biostatistics. 2024.

Abstract

Transcriptome-wide association studies (TWAS) have been increasingly applied to identify (putative) causal genes for complex traits and diseases. TWAS can be regarded as a two-sample two-stage least squares method for instrumental variable (IV) regression for causal inference. The standard TWAS (called TWAS-L) considers only a linear relationship between a gene's expression and a trait in stage 2, which may lose statistical power when the true relationship is nonlinear. A recent extension of TWAS (called TWAS-LQ) considers both the linear and quadratic effects of a gene on a trait; however, being parametric, it is not flexible enough and may be underpowered for nonquadratic nonlinear effects. On the other hand, a deep learning (DL) approach, called DeepIV, has been proposed to nonparametrically model a nonlinear effect in IV regression. However, it is both slow and unstable due to the ill-posed inverse problem of solving an integral equation with Monte Carlo approximations. Furthermore, the original DeepIV approach did not address statistical inference, that is, hypothesis testing. Here, we propose a novel DL approach, called DeLIVR, that overcomes the major drawbacks of DeepIV by estimating a related but different target function and by including a hypothesis testing framework. We show through simulations that DeLIVR is both faster and more stable than DeepIV. We applied both parametric and DL approaches to the GTEx and UK Biobank data: DeLIVR detected 8 and 7 additional genes nonlinearly associated with high-density lipoprotein (HDL) cholesterol and low-density lipoprotein (LDL) cholesterol, respectively, all of which were missed by TWAS-L, TWAS-LQ, and DeepIV. These genes include BUD13 (associated with HDL) and SLC44A2 and GMIP (associated with LDL), all of which are supported by previous studies.
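
The abstract frames TWAS as two-sample two-stage least squares: stage 1 learns SNP weights for a gene's expression in a reference (eQTL) sample, and stage 2 tests the genetically predicted expression against the trait in a separate GWAS sample, linearly for TWAS-L and with an added quadratic term for TWAS-LQ. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `ols_rss` and `twas_test` are hypothetical, stage 1 uses plain OLS (published TWAS pipelines typically use penalized regression such as elastic net), and stage 2 uses a large-sample likelihood-ratio chi-square test rather than an exact F-test.

```python
import math

import numpy as np


def ols_rss(design, y):
    """Ordinary least-squares coefficients and residual sum of squares."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return coef, float(resid @ resid)


def twas_test(z_ref, x_ref, z_gwas, y_gwas, quadratic=False):
    """Hypothetical two-sample 2SLS-style gene-trait test.

    quadratic=False mimics a TWAS-L-style linear stage 2;
    quadratic=True adds a squared term in the spirit of TWAS-LQ.
    Returns (likelihood-ratio statistic, large-sample chi-square p-value).
    """
    # Stage 1 (reference sample): regress expression on SNPs with an intercept.
    w, _ = ols_rss(np.column_stack([np.ones(len(x_ref)), z_ref]), x_ref)
    # Impute genetically predicted expression in the independent GWAS sample.
    n = len(y_gwas)
    x_hat = np.column_stack([np.ones(n), z_gwas]) @ w
    # Stage 2: compare an intercept-only null with the linear (+ quadratic) model.
    terms = [x_hat, x_hat ** 2] if quadratic else [x_hat]
    _, rss0 = ols_rss(np.ones((n, 1)), y_gwas)
    _, rss1 = ols_rss(np.column_stack([np.ones(n), *terms]), y_gwas)
    # Gaussian-model LR statistic; clamp tiny negative rounding error to zero.
    stat = max(n * math.log(rss0 / rss1), 0.0)
    df = len(terms)
    # Closed-form chi-square survival functions for df = 1 and df = 2.
    p = math.erfc(math.sqrt(stat / 2.0)) if df == 1 else math.exp(-stat / 2.0)
    return stat, p
```

Under this sketch, calling `twas_test(..., quadratic=True)` on simulated data with a purely quadratic expression-trait effect should give a small p-value, while a linear-only stage 2 can lose power, which is the limitation of TWAS-L that the abstract describes. The closed-form chi-square tails keep the example dependent on NumPy alone.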

Keywords: 2SLS; DeepIV; Instrumental variable (IV); Nonparametric IV regression; SNP; TWAS.

Figures

Figure 1
Diagrams of the true causal model (A) with all three IV assumptions (relevance, exclusion and exchangeability) satisfied and (B) with the exclusion and exchangeability assumptions violated.
Figure 2
Estimated functions from DeLIVR (left) and DeepIV (right): the dash-dotted line is the true function; the dashed line is the average of the estimates over 100 runs; the shaded area is the empirical pointwise confidence interval (CI) of the estimates.
Figure 3
Venn diagrams of the numbers of significant genes (left) and Q–Q plots (right) for HDL (top) and LDL (bottom); for each gene, the DeLIVR result was obtained by combining 21 repeated runs with the Cauchy Combination Test.
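
The Figure 3 caption states that DeLIVR's per-gene results combine 21 repeated runs with the Cauchy Combination Test. The sketch below shows the standard equal-weight form of that test: each p-value p_k is mapped to tan{(0.5 − p_k)π}, the K transformed values are averaged into T, and the combined p-value is 1/2 − arctan(T)/π, the upper tail of a standard Cauchy distribution at T. It assumes equal weights and p-values strictly between 0 and 1; the function name `cauchy_combination` is hypothetical, and the caption does not specify which weights DeLIVR uses.

```python
import math
from typing import Sequence


def cauchy_combination(pvalues: Sequence[float]) -> float:
    """Equal-weight Cauchy combination of possibly dependent p-values."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    total = 0.0
    for p in pvalues:
        if not 0.0 < p < 1.0:
            raise ValueError(f"p-values must lie strictly in (0, 1), got {p}")
        # tan((0.5 - p) * pi) equals cot(p * pi); computing 1 / tan(p * pi)
        # for p < 0.5 avoids precision loss when p is very small.
        if p < 0.5:
            total += 1.0 / math.tan(p * math.pi)
        else:
            total += math.tan((0.5 - p) * math.pi)
    t = total / len(pvalues)
    # Upper tail of a standard Cauchy at t; for t > 0 use the equivalent,
    # numerically stable form arctan(1 / t) / pi.
    if t > 0:
        return math.atan(1.0 / t) / math.pi
    return 0.5 - math.atan(t) / math.pi
```

Combining identical p-values returns that same p-value, and a single very small p-value dominates the average, so the combined result is driven mainly by the strongest runs. Because the Cauchy tail approximation holds under arbitrary dependence among the inputs, this makes it a natural fit for pooling repeated runs of the same test on the same gene.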
