This is a preprint.
Undersampling techniques for non-linear chemical space visualization
- PMID: 40672189
- PMCID: PMC12265540
- DOI: 10.1101/2025.07.03.663077
Abstract
The visualization of high-dimensional chemical space is a critical tool for understanding molecular diversity and structure-property relationships and for guiding compound selection. However, the performance of non-linear dimensionality reduction (DR) techniques such as t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and Generative Topographic Mapping (GTM) is often sensitive to the choice of hyperparameters, and training these methods on large datasets is computationally expensive. In this study, we investigated the effect of undersampling methods on hyperparameter selection for these non-linear DR methods. Our results demonstrate that selecting small representative subsets of chemical data not only reduces the computational cost of hyperparameter optimization but also provides a new way to train non-linear DR methods, leading to projections that better preserve the local structure of chemical space.
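The general idea of scoring a projection on an undersampled subset can be sketched as follows. This is a minimal illustration, not the authors' protocol: the abstract does not name the undersampling methods or the quality metric, so uniform random sampling, a k-nearest-neighbor overlap score, Euclidean distances, and synthetic descriptors are all assumptions made here. A linear PCA projection stands in for t-SNE, UMAP, or GTM so that the example depends only on NumPy, and every function name is hypothetical.

```python
import numpy as np


def random_undersample(X, n_keep, seed=0):
    """Return sorted row indices of a uniform random subset (assumed strategy)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n_keep, replace=False)
    return np.sort(idx)


def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row (Euclidean, self excluded)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def neighborhood_preservation(X_high, X_low, k=10):
    """Mean fraction of each point's k nearest neighbors shared by both spaces."""
    nn_high = knn_indices(X_high, k)
    nn_low = knn_indices(X_low, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return float(np.mean(shared)) / k


def pca_2d(X):
    """Linear 2D projection via SVD; a stand-in for a non-linear DR method."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T


# Synthetic stand-in for molecular descriptors (500 compounds, 64 features).
rng = np.random.default_rng(42)
X = rng.random((500, 64))

# Undersample, project the subset only, and score local-structure preservation.
subset_idx = random_undersample(X, n_keep=100)
X_sub = X[subset_idx]
embedding = pca_2d(X_sub)
score = neighborhood_preservation(X_sub, embedding, k=10)
print(f"kNN preservation on subset: {score:.3f}")
```

In a hyperparameter search, the projection step would be repeated for each candidate setting of the chosen DR method and the setting with the best score on the subset retained; because the pairwise-distance step scales quadratically with the number of compounds, scoring on a subset is where the cost saving would come from.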