PLoS Comput Biol. 2022 Jul 15;18(7):e1010328.
doi: 10.1371/journal.pcbi.1010328. eCollection 2022 Jul.

Explainable deep transfer learning model for disease risk prediction using high-dimensional genomic data


Long Liu et al. PLoS Comput Biol. 2022.

Abstract

Building an accurate disease risk prediction model is an essential step in the modern quest for precision medicine. While high-dimensional genomic data provide valuable resources for investigating disease risk, their large amount of noise and the complex relationships between predictors and outcomes pose tremendous analytical challenges. Deep learning models are state-of-the-art methods for many prediction tasks and offer a promising framework for the analysis of genomic data. However, deep learning models generally suffer from the curse of dimensionality and a lack of biological interpretability, both of which have greatly limited their application. In this work, we developed a deep neural network (DNN) based prediction modeling framework. We first proposed a group-wise feature importance score for feature selection, which efficiently detects genes harboring genetic variants with both linear and non-linear effects. We then designed an explainable transfer-learning-based DNN method, which directly incorporates information from feature selection and accurately captures complex predictive effects. The proposed DNN framework is biologically interpretable because it is built on the selected predictive genes. It is also computationally efficient and can be applied to genome-wide data. Through extensive simulations and real data analyses, we demonstrated that, compared with many existing methods, the proposed method not only efficiently detects predictive features but also accurately predicts disease risk.
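The abstract does not define the group-wise feature importance score itself. To illustrate what scoring a *group* of variants (a gene) rather than a single variant can look like, the sketch below swaps in a generic permutation-based importance; it is not the authors' score. All variant columns of one gene are permuted together with a single row permutation, and the gene's score is the mean increase in prediction error. The `predict` function, the gene-to-column mapping, and the toy data are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted outcomes."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def group_permutation_importance(
    predict: Callable[[Matrix], List[float]],
    X: Matrix,
    y: Sequence[float],
    groups: Dict[str, List[int]],
    n_repeats: int = 10,
    seed: int = 0,
) -> Dict[str, float]:
    """Mean increase in MSE when all columns of one gene are permuted jointly.

    Using one row permutation for all of a gene's columns keeps the
    correlation among its variants but breaks their link to the outcome.
    Generic permutation importance, NOT the paper's group-wise score.
    """
    rng = random.Random(seed)
    baseline = mse(y, predict(X))
    scores: Dict[str, float] = {}
    for gene, cols in groups.items():
        increases = []
        for _ in range(n_repeats):
            perm = list(range(len(X)))
            rng.shuffle(perm)
            X_perm = [row[:] for row in X]  # copy so the original X is untouched
            for i, j in enumerate(perm):
                for c in cols:
                    X_perm[i][c] = X[j][c]
            increases.append(mse(y, predict(X_perm)) - baseline)
        scores[gene] = sum(increases) / n_repeats
    return scores


# Toy example (hypothetical): geneA has linear and non-linear effects, geneB is noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(300)]


def predict(M: Matrix) -> List[float]:
    # Stands in for a fitted model; it only reads geneA's columns 0 and 1.
    return [math.cos(r[0]) + 0.5 * r[1] for r in M]


y = predict(X)
scores = group_permutation_importance(predict, X, y, {"geneA": [0, 1], "geneB": [2, 3]})
print(scores)
```

Because the toy model ignores geneB's columns, permuting them leaves every prediction unchanged and geneB scores exactly zero, while geneA, which carries both a cosine and a linear effect, scores well above zero.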


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. An illustrative figure of the architecture of the proposed transfer-learning-based deep network.
Blue box: DNN models obtained from feature screening; their parameters are fixed. Green box: the background node (h0), which captures infinitesimal effects, and the newly added hidden layers, which model the joint effects of the selected genes. The parameters associated with the background node and the newly added hidden layers are estimated.
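A minimal forward-pass sketch of the layout in Fig 1, under stated assumptions: each frozen gene module is reduced to a single tanh unit (a real screening DNN would be deeper), the background node `h0` is supplied as a precomputed per-individual value that reaches the output through its own trainable weight, and one new hidden layer stands in for "the newly added hidden layers". The class names, parameter names, and numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def tanh_unit(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """One tanh neuron: tanh(w . x + b)."""
    return math.tanh(sum(w * v for w, v in zip(weights, x)) + bias)


class FrozenGeneModule:
    """Hypothetical stand-in for a gene-level DNN from the screening step (blue box)."""

    def __init__(self, weights: Sequence[float], bias: float) -> None:
        self.weights: Tuple[float, ...] = tuple(weights)  # immutable: never updated
        self.bias = bias

    def __call__(self, x_gene: Sequence[float]) -> float:
        return tanh_unit(self.weights, self.bias, x_gene)


class TransferNet:
    """Frozen gene modules feeding trainable layers (green box)."""

    def __init__(self, gene_modules, gene_cols, hidden_w, hidden_b, out_w, out_b, bg_w):
        self.gene_modules: List[FrozenGeneModule] = gene_modules
        self.gene_cols: List[List[int]] = gene_cols  # variant columns of each gene
        # Trainable parameters: the only ones a training loop would update.
        self.hidden_w = hidden_w  # one weight row per new hidden unit
        self.hidden_b = hidden_b
        self.out_w = out_w
        self.out_b = out_b
        self.bg_w = bg_w  # weight on background node h0 (assumed direct to output)

    def forward(self, x: Sequence[float], h0: float) -> float:
        # 1) Frozen gene-level summaries, one per selected gene.
        gene_out = [m([x[c] for c in cols]) for m, cols in zip(self.gene_modules, self.gene_cols)]
        # 2) New hidden layer: can model joint (interacting) effects across genes.
        hidden = [tanh_unit(w, b, gene_out) for w, b in zip(self.hidden_w, self.hidden_b)]
        # 3) Linear output plus the background node's contribution.
        return sum(w * h for w, h in zip(self.out_w, hidden)) + self.out_b + self.bg_w * h0


net = TransferNet(
    gene_modules=[FrozenGeneModule([1.0, -1.0], 0.0), FrozenGeneModule([0.5], 0.0)],
    gene_cols=[[0, 1], [2]],
    hidden_w=[[1.0, 1.0]],
    hidden_b=[0.0],
    out_w=[1.0],
    out_b=0.25,
    bg_w=2.0,
)
print(net.forward([0.3, 0.3, 0.0], h0=0.5))  # gene and hidden outputs are 0 here -> 1.25
```

Keeping the gene modules' weights in tuples is one simple way to make the "fixed parameters" of the blue box explicit; in a deep learning library the same idea is usually expressed by excluding those parameters from the optimizer.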
Fig 2. An illustrative figure of the architecture of the proposed transfer-learning-based deep network when no interaction between genes is assumed.
Blue box: DNN models obtained from feature screening; their parameters are fixed. Green box: the newly added hidden layers, a background node, and their associated parameters, which need to be estimated.
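When genes are assumed not to interact, one simple reading is that the trainable part combines the frozen gene outputs additively, so each gene's contribution does not depend on any other gene. The sketch below fits such an additive head, y approx b + sum_g beta_g * f_g(x_g) + gamma * h0, by full-batch gradient descent on squared error while the gene functions f_g stay fixed. The additive form, the fixed `tanh` gene functions, the learning rate, and all names are assumptions for illustration; the green-box layers in the paper may be structured differently.

```python
import math
import random
from typing import List, Sequence, Tuple


def fit_additive_head(
    gene_outputs: Sequence[Sequence[float]],  # row i: frozen outputs f_g(x_ig), one per gene
    h0: Sequence[float],                       # background-node value per individual
    y: Sequence[float],
    lr: float = 0.1,
    epochs: int = 1500,
) -> Tuple[List[float], float, float]:
    """Gradient descent on MSE for y ~ b + sum_g beta_g * f_g + gamma * h0.

    Only beta, gamma and b are updated; the gene functions that produced
    gene_outputs are never touched (transfer learning with frozen modules).
    """
    n, n_genes = len(y), len(gene_outputs[0])
    beta, gamma, b = [0.0] * n_genes, 0.0, 0.0
    for _ in range(epochs):
        g_beta, g_gamma, g_b = [0.0] * n_genes, 0.0, 0.0
        for row, h, t in zip(gene_outputs, h0, y):
            resid = b + gamma * h + sum(bg * f for bg, f in zip(beta, row)) - t
            for g in range(n_genes):
                g_beta[g] += 2.0 * resid * row[g] / n
            g_gamma += 2.0 * resid * h / n
            g_b += 2.0 * resid / n
        beta = [bg - lr * gb for bg, gb in zip(beta, g_beta)]
        gamma -= lr * g_gamma
        b -= lr * g_b
    return beta, gamma, b


# Toy data (hypothetical): two genes summarised by frozen tanh functions, no noise.
rng = random.Random(0)
raw = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(150)]
gene_outputs = [[math.tanh(u), math.tanh(v)] for u, v in raw]  # frozen f_1, f_2
h0 = [rng.gauss(0, 1) for _ in range(150)]
y = [1.0 + 2.0 * f1 - 1.0 * f2 + 0.5 * h for (f1, f2), h in zip(gene_outputs, h0)]

beta, gamma, b = fit_additive_head(gene_outputs, h0, y)
print([round(v, 3) for v in beta], round(gamma, 3), round(b, 3))
```

Because the toy outcome is exactly additive and noise-free, the fitted coefficients recover the generating values (2, -1, 0.5 and intercept 1) up to numerical tolerance.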
Fig 3. Comparison of power at the 5% significance level, based on 5000 Monte Carlo simulations.
Linear (90%): 90% of the genetic variants on the causal gene are predictive. Linear (10%): 10% of the genetic variants on the causal gene are predictive. Interaction: pairwise interaction effects. Non-linear (cos): genetic variants on the causal gene affect the outcome through a cosine function.
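Power in Fig 3 is estimated as the fraction of Monte Carlo replicates in which a test rejects at the 5% level. The sketch below shows that estimator for a deliberately simple stand-in scenario: one standardized predictor with a linear effect, tested with a Fisher z-transformed Pearson correlation. The test, sample size, effect sizes, and function names are assumptions; the gene-level tests compared in Fig 3 are more involved. The default of 5000 replicates mirrors the caption, and the usage lines use fewer to stay fast.

```python
import math
import random
from statistics import NormalDist
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_pvalue(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided p-value for H0: rho = 0 via Fisher's z (normal approximation)."""
    z = math.atanh(pearson_r(x, y)) * math.sqrt(len(x) - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def estimate_power(effect: float, n: int = 200, n_sims: int = 5000,
                   alpha: float = 0.05, seed: int = 0) -> float:
    """Fraction of simulated data sets in which H0 is rejected at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]            # standardized predictor
        y = [effect * xi + rng.gauss(0, 1) for xi in x]    # outcome with Gaussian noise
        if correlation_pvalue(x, y) < alpha:
            rejections += 1
    return rejections / n_sims


print(estimate_power(0.0, n_sims=1000))  # no effect: estimates the type I error rate
print(estimate_power(0.3, n_sims=1000))  # linear effect: estimates power
```

With `effect=0.0` the estimate should sit near the nominal 0.05, which is a useful sanity check on the test before reading its power under the alternatives.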
Fig 4. Comparison of prediction accuracy for continuous outcomes.
Genes with p-values less than 0.001 are considered significant.
Fig 5. Comparison of prediction accuracy for binary outcomes.
Genes with p-values less than 0.001 are considered significant.
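The captions do not name the accuracy metric used for binary outcomes. If it is summarized by the area under the ROC curve (AUC), a common choice but an assumption here, the AUC equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. A minimal pairwise sketch (the function name and inputs are hypothetical):

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the Mann-Whitney probability P(score_case > score_control).

    O(n_case * n_control) pairs: fine for illustration, slow for large cohorts.
    """
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # a tie counts as half a correctly ordered pair
    return wins / (len(cases) * len(controls))


print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 3 of 4 case-control pairs ordered correctly -> 0.75
```

For large samples, a rank-based formula gives the same value in O(n log n) time.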
Fig 6. Manhattan plot for AV45 and FDG using the DNN-screen method.
Fig 7. Pearson correlations between the predicted and observed values for AV45 and FDG.
Genes are pre-selected at a p-value threshold of 0.001 for DNN-transfer, SKAT-linear, SKAT-optimal, and ACAT.
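The evaluation behind Fig 7 involves two steps: keep only genes whose p-value is below 0.001, then score the model's predictions by their Pearson correlation with the observed values. A small sketch of both steps follows; the function names, gene names, p-values, and predictions are made-up placeholders, not results from the paper. The strict inequality matches "less than 0.001".

```python
import math
from typing import Dict, List, Sequence


def preselect_genes(pvalues: Dict[str, float], threshold: float = 0.001) -> List[str]:
    """Genes with p < threshold, most significant first."""
    kept = [g for g, p in pvalues.items() if p < threshold]
    return sorted(kept, key=lambda g: pvalues[g])


def pearson_r(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation between predicted and observed values."""
    n = len(pred)
    if n != len(obs) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


# Placeholder inputs (hypothetical gene names and values).
pvals = {"GENE_A": 2e-6, "GENE_B": 0.03, "GENE_C": 4e-4, "GENE_D": 0.001}
print(preselect_genes(pvals))  # ['GENE_A', 'GENE_C']; GENE_D sits on the cutoff and is excluded
print(pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

Pearson correlation measures linear agreement only, so predictions that are systematically shifted or rescaled relative to the observed values can still score 1.0.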
