Alzheimers Dement. 2024 Dec;20(12):8700-8714.
doi: 10.1002/alz.14319. Epub 2024 Nov 7.

Assessing polyomic risk to predict Alzheimer's disease using a machine learning model

Tiffany Ngai et al. Alzheimers Dement. 2024 Dec.

Abstract

Introduction: Alzheimer's disease (AD) is the most common form of dementia in the elderly. Given that AD neuropathology begins decades before symptoms, there is a dire need for effective screening tools for early detection of AD to facilitate early intervention.

Methods: Here, we used tree-based and deep learning methods to train polyomic prediction models for AD affection status and age at onset, employing genomic, proteomic, metabolomic, and drug use data from UK Biobank. We used SHAP (SHapley Additive exPlanations) to determine feature importance.
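As a minimal sketch of the idea behind SHAP-style feature importance, the code below computes exact Shapley values by brute force for a small toy model. It is not the authors' pipeline: the `toy_risk` function, the all-zero baseline, and the three-feature inputs are hypothetical, and the SHAP library relies on much faster algorithms for tree models. The last step assumes that global importance is summarized as the mean absolute value across individuals, which is the usual convention for SHAP bar plots.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one individual by enumerating feature subsets.

    Features outside a subset are set to their baseline value. The cost grows
    as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i given this subset.
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk(v):
    # Hypothetical score: two additive terms plus one interaction term.
    return 0.8 * v[0] + 0.3 * v[1] + 0.5 * v[0] * v[2]


baseline = [0.0, 0.0, 0.0]
individuals = [[1.0, 1.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 0.0]]
per_person = [shapley_values(toy_risk, x, baseline) for x in individuals]

# Global importance per feature: mean absolute Shapley value across individuals.
global_importance = [
    sum(abs(phi[i]) for phi in per_person) / len(per_person) for i in range(3)
]
print(per_person)
print(global_importance)
```

For each individual, the Shapley values sum to the difference between that individual's score and the baseline score, which is the property that lets per-feature contributions be read as a decomposition of a single prediction.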

Results: Our best-performing polyomic model achieved an area under the receiver operating characteristic curve (AUROC) of 0.87. We identified GFAP and CXCL17 proteins to be the strongest predictors of AD, besides apolipoprotein E (APOE) alleles. Increasing the number of cases by including "AD-by-proxy" cases did not improve AD prediction.
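For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition; the labels and scores are made-up illustration values, not data from the study.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of case scores against control scores."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0   # case ranked above control
            elif c == d:
                wins += 0.5   # tie counts as half
    return wins / (len(cases) * len(controls))


# Hypothetical labels (1 = case, 0 = control) and model scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(auroc(labels, scores))
```

The pairwise loop is quadratic in sample size; it is written this way for clarity, and a rank-based computation would be preferable on biobank-scale data.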

Discussion: Among the four modalities, genomics and proteomics were the most informative based on AUROC. Our data suggest that two blood-based biomarkers (glial fibrillary acidic protein [GFAP] and CXCL17) may be effective for early presymptomatic prediction of AD.

Highlights: We developed a polyomic model to predict AD and age at onset (AAO) using omics and medication use data from electronic health records (EHR). We identified GFAP and CXCL17 proteins to be the strongest predictors of AD, besides APOE alleles. "AD-by-proxy" cases, if used in training, do not improve AD prediction. Proteomics was the most informative modality overall for affection status and AAO prediction.

Keywords: Alzheimer's disease; machine learning; omics; polyomic model; prediction.

Conflict of interest statement

All authors declare that they have no potential conflicts of interest related to this work. Author disclosures are available in the Supporting Information.

Figures

FIGURE 1
Workflow. Genomic, proteomic, metabolomic, and medication use data were extracted from the UKB and preprocessed. Next, we trained various models to find the best‐performing model trained on the intersection dataset (see also Figure 2). The best machine learning models were used for feature importance evaluation using SHAP. UKB, UK Biobank.
FIGURE 2
Data distribution for different modalities using the ICD10 phenotype definition following data preprocessing. The intersection dataset refers to the participants with data in all four single modalities (genomic, proteomic, metabolomic, and EHR/drug). Single-modality datasets refer to the entire dataset available for each modality (i.e., the entire circle). EHR, electronic health record; ICD10, International Classification of Diseases, 10th revision.
FIGURE 3
Age‐at‐onset prediction results for the intersection dataset using the ICD10 phenotype definition. G, genomics modality; P, proteomics modality; M, metabolomics modality; D, EHR drug modality. A combination of these letters indicates that multiple modalities were used. The prefix “E‐” denotes early fusion with the corresponding model; the prefix “L‐” denotes late fusion with the corresponding meta classifier. The best single-modality models are used for late fusion (i.e., genomic, LR; metabolomic, CatBoost; proteomic, LGBM; drug, LGBM). Additional training information and visualization can be found in Table S6 and Figure S5. EHR, electronic health record; ICD10, International Classification of Diseases, 10th revision.
FIGURE 4
The feature importance plots for the best early fusion affection status prediction model that fuses all four modalities (LGBM). The left bar plot shows the relative importance (mean SHAP value) of the most important features, and the right beeswarm plot shows the direction in which the feature value is correlated to the prediction. Each dot represents the importance of corresponding features in a positive or negative direction for an individual prediction. The color represents the feature value for that individual.
FIGURE 5
The feature importance plots for the best early fusion age at onset prediction model that fuses all four modalities (CatBoost). The left bar plot shows the relative importance of the most important features, and the right beeswarm plot shows the direction in which the feature value is correlated to the prediction. Each dot represents the importance of corresponding features in a positive or negative direction for an individual prediction. The color represents the feature value for that individual.
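
The intersection dataset and the early and late fusion schemes named in the captions above can be sketched roughly as follows. This is an illustrative outline, not the authors' code: the participant IDs, the feature values, and the per-modality scorers are hypothetical. A plain average stands in for the trained meta classifier, and a feature mean stands in for each trained single-modality model.

```python
def intersection_ids(modalities):
    """IDs of participants that appear in every modality table."""
    id_sets = [set(table) for table in modalities.values()]
    return sorted(set.intersection(*id_sets))


def early_fusion(modalities, ids, order):
    """Concatenate each participant's feature vectors into one input row."""
    return {pid: [v for name in order for v in modalities[name][pid]] for pid in ids}


def late_fusion(modalities, ids, order, base_models, meta_model):
    """Score each modality separately, then combine the scores with a meta-model."""
    fused = {}
    for pid in ids:
        scores = [base_models[name](modalities[name][pid]) for name in order]
        fused[pid] = meta_model(scores)
    return fused


# Hypothetical toy tables: modality -> participant ID -> feature vector.
modalities = {
    "G": {"p1": [1.0], "p2": [0.0], "p3": [2.0]},
    "P": {"p1": [0.2, 0.9], "p2": [0.1, 0.3]},
    "M": {"p1": [5.0], "p2": [4.0], "p4": [3.0]},
    "D": {"p1": [0.0], "p2": [1.0]},
}
order = ["G", "P", "M", "D"]


def feature_mean(features):
    # Stand-in for a trained single-modality model.
    return sum(features) / len(features)


def average(scores):
    # Stand-in for a trained meta classifier.
    return sum(scores) / len(scores)


ids = intersection_ids(modalities)
early = early_fusion(modalities, ids, order)
late = late_fusion(modalities, ids, order, {m: feature_mean for m in order}, average)
print(ids)
print(early["p1"])
print(late["p1"])
```

In early fusion a single model sees all features at once and can learn cross-modality interactions, whereas late fusion only sees one score per modality, which is simpler to train when modalities differ greatly in dimensionality.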

