Sci Rep. 2024 Dec 5;14(1):30294.
doi: 10.1038/s41598-024-82208-x.

AlzGenPred - CatBoost-based gene classifier for predicting Alzheimer's disease using high-throughput sequencing data

Rohit Shukla et al. Sci Rep. 2024.

Abstract

Alzheimer's disease (AD) is a progressive neurodegenerative disorder characterized by memory loss. Advances in next-generation sequencing have made an enormous amount of AD-associated genomics data available, but how these genes are involved in AD remains an open research question. AlzGenPred was therefore developed to identify AD-associated genes using machine learning. A total of 13,504 features derived from eight sequence-encoding schemes were generated and evaluated using 16 machine learning algorithms. Network-based features significantly outperformed sequence-based features and effectively distinguished AD-associated genes, whereas sequence-based features failed to classify accurately. To improve the performance of the sequence-based features, we generated 24 fused features (6020 D) from the sequence-based encodings and applied a two-step LightGBM-based recursive feature selection method, which increased accuracy by 5-7%. However, accuracy remained below 70% even after hyperparameter tuning. Network-based features were therefore used to build the CatBoost-based ML method AlzGenPred, which achieved 96.55% accuracy and 98.99% AUROC. The method was tested on the AlzGene dataset, where it showed 96.43% accuracy, and the model was then validated using transcriptomics data. AlzGenPred provides a reliable and user-friendly tool for identifying potential AD biomarkers, accelerating biomarker discovery, and advancing our understanding of AD. It is available at https://www.bioinfoindia.org/alzgenpred/ and https://github.com/shuklarohit815/AlzGenPred .
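
The abstract reports performance as accuracy and AUROC. As a minimal sketch of how these two metrics are commonly computed for a binary gene classifier, the example below uses plain Python; it is not the authors' evaluation code. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and do not come from the paper.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of genes whose thresholded score matches the true label.

    The 0.5 threshold is an assumption for this sketch.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Pairwise (Mann-Whitney) formulation; tied scores count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = AD-associated gene, 0 = non-associated gene.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(f"accuracy = {accuracy(labels, scores):.4f}")
print(f"AUROC    = {auroc(labels, scores):.4f}")
```

Accuracy depends on the chosen threshold, while AUROC summarizes ranking quality across all thresholds, which is why the two figures are reported separately.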

Keywords: Alzheimer’s disease; CatBoost; Machine learning; Network features; Neurofibrillary tangles.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
The comprehensive methodology of the work with all steps and their respective details.
Fig. 2
Distribution of the features using t-SNE. (A) AAC, (B) CKSAAP, (C) DPC, (D) DDE, (E) TPC, (F) CKSAAGP, (G) GAAC, (H) GDPC, (I) GTPC, (J) Geary, (K) Moran, (L) NMBroto, (M) CTDC, (N) CTDD, (O) CTDT, (P) CTriad, (Q) KSCTriad, (R) QSOrder, (S) SOCNumber, (T) PAAC, (U) APAAC, (V) Network features.
Fig. 3
Accuracy of 22 features from 16 ML algorithms. (A) Training accuracy, (B) Test accuracy.
Fig. 4
Performance of network-based features using tree-based and ensemble classifier methods.
Fig. 5
The accuracy (in %) of the 4D feature set from 16 ML methods.
Fig. 6
Accuracy of the fused features obtained from 16 ML methods. (A) Training data, (B) Test data.
Fig. 7
Accuracy (%) of the fused features obtained after hyperparameter tuning. (A) Training set, (B) Test set.
Fig. 8
Accuracy comparison between tuned and non-tuned models. (A) Training, (B) Test.
Fig. 9
ROC curves for the three final methods selected after rigorous feature selection, hyperparameter tuning, and model generation.
Fig. 10
Validation of the AlzGenPred tool on different independent datasets. (A) AlzGene dataset, (B) GSE113437, (C) GSE67333, and (D) GSE162873.
Fig. 11
The GUI of the AlzGenPred tool with detailed help.
