PLoS Comput Biol. 2022 Jul 15;18(7):e1010328.

doi: 10.1371/journal.pcbi.1010328. eCollection 2022 Jul.

Explainable deep transfer learning model for disease risk prediction using high-dimensional genomic data

Long Liu¹, Qingyu Meng¹, Cherry Weng², Qing Lu³, Tong Wang¹, Yalu Wen¹,²

Affiliations

¹ Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi, China.
² Department of Statistics, University of Auckland, Auckland, New Zealand.
³ Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America.

PMID: 35839250
PMCID: PMC9328574
DOI: 10.1371/journal.pcbi.1010328

Long Liu et al. PLoS Comput Biol. 2022.

Abstract

Building an accurate disease risk prediction model is an essential step in the modern quest for precision medicine. While high-dimensional genomic data provide valuable resources for investigating disease risk, their large amount of noise and the complex relationships between predictors and outcomes pose tremendous analytical challenges. Deep learning models are state-of-the-art methods for many prediction tasks and offer a promising framework for the analysis of genomic data. However, deep learning models generally suffer from the curse of dimensionality and a lack of biological interpretability, both of which have greatly limited their applications. In this work, we have developed a deep neural network (DNN)-based prediction modeling framework. We first proposed a group-wise feature importance score for feature selection, which efficiently detects genes harboring genetic variants with both linear and non-linear effects. We then designed an explainable transfer-learning-based DNN method that directly incorporates information from feature selection and accurately captures complex predictive effects. The proposed DNN framework is biologically interpretable, as it is built on the selected predictive genes. It is also computationally efficient and can be applied to genome-wide data. Through extensive simulations and real data analyses, we have demonstrated that, compared with many existing methods, our proposed method not only efficiently detects predictive features but also accurately predicts disease risk.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. An illustrative figure of the architecture of the proposed transfer-learning-based deep network.**
**The blue box**: DNN models obtained from feature screening, whose parameters are fixed. **The green box**: the background node ($h_0'$), which captures the infinitesimal effects, and the newly added hidden layers designed to model the joint effects of the selected genes. The parameters associated with the background node and the newly added hidden layers are estimated.
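
The caption separates frozen parameters (blue box) from trained ones (green box). The following is a minimal plain-Python sketch of that split under stated assumptions, not the authors' implementation: the class names `FrozenGeneNet` and `TransferHead`, the single tanh unit per gene, the single ReLU hidden layer, and modelling the background node $h_0'$ as a trainable linear score of hypothetical "background" variants are all illustrative choices. Only the head's parameters change during the squared-error SGD steps.

```python
import math
import random


class FrozenGeneNet:
    """Hypothetical per-gene sub-network from the screening stage (blue box).

    Maps one gene's genotype vector to a single gene-level value; its
    parameters are fixed and never updated during transfer learning.
    """

    def __init__(self, weights, bias):
        self.weights = tuple(weights)  # tuple signals "frozen"
        self.bias = bias

    def __call__(self, genotypes):
        return math.tanh(sum(w * g for w, g in zip(self.weights, genotypes)) + self.bias)


class TransferHead:
    """Trainable part (green box): background node h0' plus one new hidden layer.

    Assumption: h0' is a trainable linear score of background variants.
    """

    def __init__(self, n_genes, n_background, n_hidden, seed=0):
        rng = random.Random(seed)
        self.u = [rng.gauss(0, 0.1) for _ in range(n_background)]  # h0' weights
        self.W = [[rng.gauss(0, 0.1) for _ in range(n_genes + 1)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.v = [rng.gauss(0, 0.1) for _ in range(n_hidden)]  # output weights
        self.c = 0.0  # output bias

    def forward(self, gene_outputs, background):
        h0 = sum(u * x for u, x in zip(self.u, background))
        z = list(gene_outputs) + [h0]  # selected-gene outputs + background node
        pre = [sum(w * zk for w, zk in zip(row, z)) + bj for row, bj in zip(self.W, self.b)]
        hid = [max(0.0, p) for p in pre]  # ReLU
        y_hat = sum(v * h for v, h in zip(self.v, hid)) + self.c
        return y_hat, z, pre, hid

    def sgd_step(self, gene_outputs, background, y, lr=0.01):
        """One SGD step on 0.5 * (y_hat - y)^2; returns the pre-update loss."""
        y_hat, z, pre, hid = self.forward(gene_outputs, background)
        d_out = y_hat - y
        # All gradients use the current (pre-update) parameters.
        d_pre = [d_out * v * (1.0 if p > 0 else 0.0) for v, p in zip(self.v, pre)]
        d_h0 = sum(dp * row[-1] for dp, row in zip(d_pre, self.W))
        for j, h in enumerate(hid):
            self.v[j] -= lr * d_out * h
        self.c -= lr * d_out
        for j, dp in enumerate(d_pre):
            for k, zk in enumerate(z):
                self.W[j][k] -= lr * dp * zk
            self.b[j] -= lr * dp
        for k, x in enumerate(background):
            self.u[k] -= lr * d_h0 * x
        return 0.5 * d_out ** 2


# Toy usage with made-up genotypes (0/1/2 allele counts) for two selected genes.
genes = [FrozenGeneNet([0.5, -0.3], 0.1), FrozenGeneNet([0.2, 0.4, -0.1], 0.0)]
head = TransferHead(n_genes=len(genes), n_background=3, n_hidden=4)
x_genes = [[1, 0], [2, 1, 0]]
x_background = [0, 1, 2]
gene_outputs = [net(x) for net, x in zip(genes, x_genes)]  # frozen features
losses = [head.sgd_step(gene_outputs, x_background, 1.0, lr=0.05) for _ in range(200)]
```

Because the gene-level networks are only evaluated, never differentiated, their outputs can be computed once and cached, which is one reason a frozen-backbone design of this kind is cheap to train.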

**Fig 2. An illustrative figure of the architecture of the proposed transfer-learning-based deep network, where no interaction between genes is assumed.**
**The blue box**: DNN models obtained from feature screening, whose parameters are fixed. **The green box**: the newly added hidden layers, a background node, and their associated parameters, which need to be estimated.

**Fig 3. The comparisons of power at the 5% significance level based on 5000 Monte Carlo simulations.**
Linear (90%): 90% of genetic variants on the causal gene are predictive. Linear (10%): 10% of genetic variants on the causal gene are predictive. Interaction: pairwise interaction effects. Non-linear (cos): genetic variants on the causal gene affect the outcome through a cosine function.
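
Under the usual definition, power in a figure like this is the proportion of simulated datasets in which a test rejects the null hypothesis at the 5% level. A minimal Python sketch of that calculation follows; the function names are hypothetical and the authors' simulation code is not reproduced. It also computes the binomial Monte Carlo standard error, which for 5000 replicates is at most about 0.007 (reached at a rejection rate of 0.5).

```python
import math
import random


def empirical_power(p_values, alpha=0.05):
    """Fraction of Monte Carlo replicates whose test rejects H0 at level alpha."""
    if not p_values:
        raise ValueError("need at least one replicate")
    return sum(p < alpha for p in p_values) / len(p_values)


def mc_standard_error(rate, n_replicates):
    """Binomial Monte Carlo standard error of an estimated rejection rate."""
    return math.sqrt(rate * (1.0 - rate) / n_replicates)


# Sanity check under the null: p-values are Uniform(0, 1), so the
# rejection rate estimates the type I error and should be close to alpha.
rng = random.Random(2022)
null_p = [rng.random() for _ in range(5000)]
type1 = empirical_power(null_p, alpha=0.05)
```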

**Fig 4. The comparisons of prediction accuracy for continuous outcomes.**
Genes with p-values less than 0.001 are considered significant.

**Fig 5. The comparisons of prediction accuracy for binary outcomes.**
Genes with p-values less than 0.001 are considered significant.

**Fig 6. The Manhattan plot for AV45 and FDG using the DNN-screen method.**

**Fig 7. The Pearson correlations between the predicted and observed values for AV45 and FDG.**
Genes are pre-selected under the p-value threshold of 0.001 for DNN-transfer, SKAT-linear, SKAT-optimal and ACAT.
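
The caption combines two simple steps: genes are retained when their screening p-value falls below 0.001, and prediction accuracy is summarized by the Pearson correlation between predicted and observed values. A minimal stdlib-only Python sketch of both steps is shown below; the gene names and numbers are made up, and the screening p-values are assumed to come from the respective method (DNN-screen, SKAT-linear, SKAT-optimal or ACAT), none of which is reproduced here.

```python
import math


def select_genes(gene_p_values, threshold=0.001):
    """Keep genes whose screening p-value is below the threshold (sorted by name)."""
    return sorted(g for g, p in gene_p_values.items() if p < threshold)


def pearson(predicted, observed):
    """Pearson correlation between predicted and observed outcome values."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mp = sum(predicted) / n
    mo = sum(observed) / n
    sxy = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    sxx = sum((p - mp) ** 2 for p in predicted)
    syy = sum((o - mo) ** 2 for o in observed)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical screening results: only GENE_A and GENE_C pass p < 0.001.
kept = select_genes({"GENE_A": 0.0004, "GENE_B": 0.02, "GENE_C": 0.0009})
```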

Cited by

Designing interpretable deep learning applications for functional genomics: a quantitative analysis.
van Hilten A, Katz S, Saccenti E, Niessen WJ, Roshchupkin GV. Brief Bioinform. 2024 Jul 25;25(5):bbae449. doi: 10.1093/bib/bbae449. PMID: 39293804. Free PMC article. Review.
Detecting genetic interactions with visible neural networks.
van Hilten A, Melograna F, Fan B, Niessen W, van Steen K, Roshchupkin G. Commun Biol. 2025 Jun 5;8(1):874. doi: 10.1038/s42003-025-08157-x. PMID: 40473911. Free PMC article.
Functional Neural Networks for High-Dimensional Genetic Data Analysis.
Zhang S, Zhou Y, Geng P, Lu Q. IEEE/ACM Trans Comput Biol Bioinform. 2024 May-Jun;21(3):383-393. doi: 10.1109/TCBB.2024.3364614. Epub 2024 Jun 5. PMID: 38507390. Free PMC article.
TrG2P: A transfer-learning-based tool integrating multi-trait data for accurate prediction of crop yield.
Li J, Zhang D, Yang F, Zhang Q, Pan S, Zhao X, Zhang Q, Han Y, Yang J, Wang K, Zhao C. Plant Commun. 2024 Jul 8;5(7):100975. doi: 10.1016/j.xplc.2024.100975. Epub 2024 May 15. PMID: 38751121. Free PMC article.
Deep learning captures the effect of epistasis in multifactorial diseases.
Perelygin V, Kamelin A, Syzrantsev N, Shaheen L, Kim A, Plotnikov N, Ilinskaya A, Ilinsky V, Rakitko A, Poptsova M. Front Med (Lausanne). 2025 Jan 7;11:1479717. doi: 10.3389/fmed.2024.1479717. eCollection 2024. PMID: 39839630. Free PMC article.
