Boosting data interpretation with GIBOOST to enhance visualization of complex high-dimensional data
- PMID: 40843971
- PMCID: PMC12371410
- DOI: 10.1093/bib/bbaf415
Abstract
High-dimensional single-cell data analysis is crucial for understanding complex biological interactions, yet conventional dimensionality reduction methods (DRMs) often fail to preserve both global and local structures. Existing DRMs, such as t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), Principal Component Analysis (PCA), and Potential of Heat-diffusion for Affinity-based Transition Embedding (PHATE), optimize different visualization objectives, resulting in trade-offs between cluster separability, spatial organization, and temporal coherence. To overcome these limitations, we introduce GIBOOST, an AI-driven framework that integrates outputs from multiple DRMs using a Bayesian framework and an optimized autoencoder. GIBOOST systematically selects and integrates the two most informative DRMs by evaluating key visualization features, including separability, spatial continuity, uniformity, cellular dynamics, and cluster sensitivity. Rather than prioritizing a single DRM, it identifies the optimal combination that maximizes clustering sensitivity (GI) while preserving biologically relevant spatial and temporal structures. This integration is further refined through a GI-optimized autoencoder, which optimizes the joint distribution of GI, neuron count, and batch size effects to improve visualization quality. We demonstrate GIBOOST's efficacy across multiple dynamic biological processes, including epithelial-mesenchymal transition, CiPSC reprogramming, spermatogenesis, and placental development. Compared to nine individual DRMs, GIBOOST enhances clustering sensitivity and biological relevance by ~30%, enabling more accurate interpretation of differentiation trajectories and cell-cell interactions. 
When applied to a large single-cell RNA-seq dataset (~400 000 cells, 28 cell types, seven placental regions), GIBOOST uncovers novel immune-placenta interactions, providing deeper insights into cross-tissue communication during pregnancy. By improving both the visualization and interpretability of high-dimensional data, GIBOOST serves as a powerful tool for computational systems biology, enabling a more accurate exploration of complex cellular systems.
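The pair-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define GI, so a generic between/within-cluster scatter ratio stands in for it here, and the function names `separability`, `standardize`, and `best_pair` are hypothetical. The Bayesian framework and the GI-optimized autoencoder are omitted; the sketch only shows the idea of scoring every pair of precomputed embeddings and keeping the best-scoring combination.

```python
from itertools import combinations

import numpy as np


def separability(embedding, labels):
    """Stand-in score (NOT the paper's GI): between-cluster scatter
    divided by within-cluster scatter. Higher means better separated."""
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    overall_mean = embedding.mean(axis=0)
    between, within = 0.0, 0.0
    for lab in np.unique(labels):
        pts = embedding[labels == lab]
        centroid = pts.mean(axis=0)
        between += len(pts) * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((pts - centroid) ** 2)
    return between / (within + 1e-12)


def standardize(embedding):
    """Scale each axis to zero mean and unit variance so that embeddings
    from different methods are comparable before concatenation."""
    embedding = np.asarray(embedding, dtype=float)
    return (embedding - embedding.mean(axis=0)) / (embedding.std(axis=0) + 1e-12)


def best_pair(embeddings, labels):
    """Score every pair of embeddings (dict: method name -> array of
    shape [n_cells, n_dims]) and return (name_a, name_b, score)."""
    best = None
    for a, b in combinations(sorted(embeddings), 2):
        # Assumption: a pair is "integrated" by simple concatenation here;
        # the paper uses an autoencoder for this step instead.
        joint = np.hstack([standardize(embeddings[a]), standardize(embeddings[b])])
        score = separability(joint, labels)
        if best is None or score > best[2]:
            best = (a, b, score)
    return best
```

In practice the `embeddings` dictionary would hold the outputs of methods such as t-SNE, UMAP, PCA, or PHATE computed on the same cells, and `labels` would be the known or inferred cell-type assignments. The concatenated pair returned by this sketch would then be the input to a learned integration step.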
Keywords: AI-driven data integration; cell–cell communication; data visualization; dimensionality reduction; immune-placental interactions; single-cell analysis.
Published by Oxford University Press 2025.
