AlzGenPred - CatBoost-based gene classifier for predicting Alzheimer's disease using high-throughput sequencing data
- PMID: 39639110
- PMCID: PMC11621786
- DOI: 10.1038/s41598-024-82208-x
Abstract
Alzheimer's disease (AD) is a progressive neurodegenerative disorder characterized by memory loss. Advances in next-generation sequencing have made an enormous amount of AD-associated genomics data available, yet how the implicated genes contribute to AD remains an open research question. AlzGenPred was therefore developed to identify AD-associated genes using machine learning. A total of 13,504 features derived from eight sequence-encoding schemes were generated and evaluated using 16 machine learning algorithms. Network-based features significantly outperformed sequence-based features and effectively distinguished AD-associated genes, whereas sequence-based features failed to classify them accurately. To improve performance, we generated 24 fused features (6020 D) from the sequence-based encodings and applied a two-step LightGBM-based recursive feature selection method, which increased accuracy by 5-7%. However, accuracy remained below 70% even after hyperparameter tuning. Network-based features were therefore used to build the CatBoost-based ML method AlzGenPred, which achieved 96.55% accuracy and 98.99% AUROC. The method was tested on the AlzGene dataset, where it showed 96.43% accuracy, and was then validated using a transcriptomics dataset. AlzGenPred provides a reliable and user-friendly tool for identifying potential AD biomarkers, accelerating biomarker discovery, and advancing our understanding of AD. It is available at https://www.bioinfoindia.org/alzgenpred/ and https://github.com/shuklarohit815/AlzGenPred .
Keywords: Alzheimer’s disease; CatBoost; Machine learning; Network features; Neurofibrillary tangles.
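The abstract reports performance as accuracy and AUROC. As a minimal sketch of how these two metrics are commonly computed for a binary gene classifier, the Python below uses only the standard library. The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions; they are not taken from AlzGenPred or its datasets.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label.

    The 0.5 threshold is an assumption for illustration only.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for t, p in zip(y_true, preds) if t == p)
    return correct / len(y_true)


def auroc(y_true, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive sample receives
    a higher score than a randomly chosen negative one; ties count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = AD-associated gene, 0 = non-associated gene.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

toy_accuracy = accuracy(toy_labels, toy_scores)
toy_auroc = auroc(toy_labels, toy_scores)
print(f"accuracy = {toy_accuracy:.4f}, AUROC = {toy_auroc:.4f}")
```

Accuracy depends on the chosen decision threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason the two numbers are usually reported together.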
© 2024. The Author(s).
Conflict of interest statement
Declarations. Competing interests: The authors declare no competing interests.
Similar articles
- Deciphering the role of lipid metabolism-related genes in Alzheimer's disease: a machine learning approach integrating Traditional Chinese Medicine. Front Endocrinol (Lausanne). 2024 Oct 23;15:1448119. doi: 10.3389/fendo.2024.1448119. eCollection 2024. PMID: 39507054. Free PMC article.
- AITeQ: a machine learning framework for Alzheimer's prediction using a distinctive five-gene signature. Brief Bioinform. 2024 May 23;25(4):bbae291. doi: 10.1093/bib/bbae291. PMID: 38877887. Free PMC article.
- Integrating network, sequence and functional features using machine learning approaches towards identification of novel Alzheimer genes. BMC Genomics. 2016 Oct 18;17(1):807. doi: 10.1186/s12864-016-3108-1. PMID: 27756223. Free PMC article.
- Machine learning approach to gene essentiality prediction: a review. Brief Bioinform. 2021 Sep 2;22(5):bbab128. doi: 10.1093/bib/bbab128. PMID: 33842944. Review.
- Omics-based biomarkers discovery for Alzheimer's disease. Cell Mol Life Sci. 2022 Nov 8;79(12):585. doi: 10.1007/s00018-022-04614-6. PMID: 36348101. Free PMC article. Review.