Targeting neurodegeneration: three machine learning methods for G9a inhibitors discovery using PubChem and scikit-learn
- PMID: 40767991
- DOI: 10.1007/s10822-025-00642-z
Abstract
In light of the increasing interest in G9a's role in neuroscience, three time-efficient and cost-effective machine learning (ML) models were developed to support researchers in this area. The models are built on data provided by PubChem and use algorithms accessed through the scikit-learn Python ML library.
The first ML model aimed to predict the efficacy magnitude of active G9a inhibitors. The ML models were trained on 3112 samples and tested on 778. The Gradient Boosting Regressor performed best, achieving a mean relative error of 17.81%, a mean absolute error of 21.48%, a root mean squared error of 27.39% and a coefficient of determination (R²) error of 0.02.
The second ML model, called the CID_SID ML model, utilised PubChem identifiers to predict G9a inhibition by small biomolecules originally designed for other purposes. The ML models were trained on 58,552 samples and tested on 14,000. The most suitable classifier for this case study was the Extreme Gradient Boosting Classifier, which obtained 79.7% accuracy, 83.2% precision, 67.7% recall, a 74.7% F1-score and 78.4% ROC. To date, this methodology has been applied in seven studies, achieving a mean accuracy of 82.75%, precision of 90.71%, recall of 73.01%, F1-score of 80.79% and ROC of 80.63% across all case studies.
The third ML model utilised IUPAC names. It was based on the Random Forest Classifier algorithm, trained on 19,455 samples and tested on 14,100, and its predictions achieved 68.2% accuracy. Its feature importance list was then reordered by the proportion of active cases among all cases in which each fragment participates. On this basis, "iodide" was identified as the fragment with the highest proportion of active cases relative to all cases in which it participated. In addition, "iodo" was identified as the most desirable fragment and "phenylcarbamate" as the least desirable, because they participated only in active or only in inactive cases, respectively.
This computational approach was initially developed and demonstrated in a case study on tyrosyl-DNA phosphodiesterase 1 (TDP1) inhibition.
Keywords: CID_SID ML model; G9a inhibitor efficacy; IUPAC-based ML model.
© 2025. The Author(s), under exclusive licence to Springer Nature Switzerland AG.
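The evaluation of the first (efficacy regression) model can be sketched as follows. This is a minimal illustration, not the authors' code: the data are synthetic random numbers standing in for the PubChem G9a set, the 16 descriptor columns are hypothetical, and the target is assumed to be an efficacy value in percent, which would explain why the abstract reports MAE and RMSE as percentages. The abstract does not define mean relative error, so it is assumed here to be mean(|y - y_hat| / |y|). Only the estimator (`GradientBoostingRegressor`) and the 3112/778 train/test sizes come from the source.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the PubChem G9a set): 3890 compounds,
# 16 hypothetical descriptor columns, and an efficacy target in percent.
rng = np.random.default_rng(0)
X = rng.normal(size=(3890, 16))
y = 50 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=5, size=3890)

# An integer test_size of 778 leaves 3112 training samples, as in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=778, random_state=0
)

model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
# Mean relative error, assumed to be mean(|y - y_hat| / |y|), reported in percent.
mre = 100 * np.mean(np.abs(y_test - y_pred) / np.abs(y_test))

print(f"MRE {mre:.2f}%  MAE {mae:.2f}  RMSE {rmse:.2f}  R2 {r2:.2f}")
```

Taking the square root of `mean_squared_error` avoids relying on the `squared` keyword or on `root_mean_squared_error`, whose availability differs between scikit-learn versions.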
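The fragment re-ranking applied to the third (IUPAC-name) model can be sketched in plain Python. The abstract does not say how IUPAC names were split into fragments, so the sketch starts from hypothetical, already-fragmented compounds; the fragment names, labels and importance values are invented toy data and say nothing about G9a. It also assumes one reading of the abstract: fragments seen in both active and inactive compounds are reordered by their active proportion (as with "iodide"), while fragments seen only in active or only in inactive compounds are listed separately as desirable or undesirable (as with "iodo" and "phenylcarbamate").

```python
from collections import Counter


def fragment_counts(samples):
    """Map each fragment to (n_active, n_total) over (fragments, is_active) pairs."""
    n_total, n_active = Counter(), Counter()
    for fragments, is_active in samples:
        for frag in set(fragments):  # count a fragment at most once per compound
            n_total[frag] += 1
            n_active[frag] += int(is_active)
    return {frag: (n_active[frag], n_total[frag]) for frag in n_total}


def reorder_by_active_proportion(importances, counts):
    """Reorder an importance list by active proportion (assumed reading of the abstract)."""
    ratio = {frag: act / tot for frag, (act, tot) in counts.items()}
    # Fragments present in both classes: highest active proportion first,
    # ties broken by the original model importance.
    mixed = [f for f in importances if 0.0 < ratio[f] < 1.0]
    ranked = sorted(mixed, key=lambda f: (-ratio[f], -importances[f]))
    # Fragments present in only one class are reported separately.
    only_active = sorted(f for f in importances if ratio[f] == 1.0)
    only_inactive = sorted(f for f in importances if ratio[f] == 0.0)
    return ranked, only_active, only_inactive


# Invented toy data: (fragments of a hypothetical compound, is_active).
samples = [
    (["bromo", "methoxy", "amino"], True),
    (["bromo", "amino"], True),
    (["methoxy", "nitro"], False),
    (["bromo", "nitro"], False),
    (["amino", "methoxy"], True),
    (["nitro", "methoxy"], False),
]
# Hypothetical model importances (e.g. from a random forest).
importances = {"methoxy": 0.4, "nitro": 0.3, "bromo": 0.2, "amino": 0.1}

counts = fragment_counts(samples)
ranked, only_active, only_inactive = reorder_by_active_proportion(importances, counts)
print(ranked)         # ['bromo', 'methoxy']
print(only_active)    # ['amino']
print(only_inactive)  # ['nitro']
```

In a scikit-learn workflow, the `importances` mapping could be built by pairing `RandomForestClassifier.feature_importances_` with the fragment vectoriser's feature names; note how the reordering moves "methoxy", the most important feature in the toy model, below "bromo", which has the higher active proportion.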
Conflict of interest statement
Declarations. Conflict of interest: The authors declare no conflict of interest.
Similar articles
- Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone. Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23. PMID: 39051924
- Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19. Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3. PMID: 35593186. Free PMC article.
- Does the Presence of Missing Data Affect the Performance of the SORG Machine-learning Algorithm for Patients With Spinal Metastasis? Development of an Internet Application Algorithm. Clin Orthop Relat Res. 2024 Jan 1;482(1):143-157. doi: 10.1097/CORR.0000000000002706. Epub 2023 Jun 12. PMID: 37306629. Free PMC article.
- Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights. Comput Methods Programs Biomed. 2025 Sep;269:108899. doi: 10.1016/j.cmpb.2025.108899. Epub 2025 Jun 21. PMID: 40570739
- The Black Book of Psychotropic Dosing and Monitoring. Psychopharmacol Bull. 2024 Jul 8;54(3):8-59. PMID: 38993656. Free PMC article. Review.