Leveraging Vision-Language Embeddings for Zero-Shot Learning in Histopathology Images
- PMID: 40601464
- DOI: 10.1109/JBHI.2025.3584802
Abstract
Zero-shot learning (ZSL) offers tremendous potential for histopathology image analysis, enabling models to generalize to unseen classes without extensive labeled data. Recent advances in vision-language models (VLMs) have expanded ZSL capabilities, allowing models to perform tasks without task-specific fine-tuning. However, applying VLMs to histopathology presents considerable challenges due to the complexity of histopathological imagery and the nuanced nature of diagnostic tasks. We propose Multi-Resolution Prompt-guided Hybrid Embedding (MR-PHE), a novel framework for zero-shot histopathology image classification. MR-PHE mimics the pathologist's workflow through multi-resolution patch extraction to capture key cellular and tissue features. It introduces a hybrid embedding strategy that integrates global image embeddings with weighted patch embeddings, combining local and global contextual information. Additionally, we develop a comprehensive prompt generation and selection framework that enriches class descriptions with domain-specific synonyms and clinically relevant features to enhance semantic understanding. A similarity-based patch weighting mechanism assigns attention-like weights to patches based on their relevance to class embeddings, emphasizing diagnostically important regions during classification. Experimental results demonstrate that MR-PHE significantly improves zero-shot classification performance on histopathology datasets, often surpassing fully supervised models, showing its effectiveness and potential to advance computational pathology.
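The abstract does not give formulas for the hybrid embedding or the patch weighting, so the following is only a minimal sketch of how such a scheme could look, not the authors' implementation. It assumes that image, patch, and class-prompt embeddings are already available as plain vectors from some VLM encoder, that a patch's relevance is its highest cosine similarity to any class embedding, that weights come from a temperature-scaled softmax, and that the global and pooled patch embeddings are mixed linearly. The function names and the parameters `alpha` and `temperature` are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def patch_weights(patch_embs, class_embs, temperature=0.1):
    """Assumed weighting rule: a patch's relevance is its best cosine
    similarity to any class embedding, turned into weights by softmax."""
    relevance = [max(dot(p, c) for c in class_embs) for p in patch_embs]
    return softmax(relevance, temperature)


def hybrid_embedding(global_emb, patch_embs, class_embs,
                     alpha=0.5, temperature=0.1):
    """Mix the global embedding with the weighted sum of patch embeddings.
    alpha (hypothetical) sets the share given to the global embedding."""
    g = l2_normalize(global_emb)
    patches = [l2_normalize(p) for p in patch_embs]
    classes = [l2_normalize(c) for c in class_embs]
    weights = patch_weights(patches, classes, temperature)
    pooled = [sum(w * p[d] for w, p in zip(weights, patches))
              for d in range(len(g))]
    mixed = [alpha * gd + (1.0 - alpha) * pd for gd, pd in zip(g, pooled)]
    return l2_normalize(mixed), weights


def classify(hybrid, class_embs):
    """Zero-shot prediction: index of the most similar class embedding."""
    classes = [l2_normalize(c) for c in class_embs]
    scores = [dot(hybrid, c) for c in classes]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy 3-dimensional vectors standing in for real encoder outputs.
class_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # two class prompts
global_emb = [0.6, 0.5, 0.6]                       # whole-image embedding
patch_embs = [[0.9, 0.1, 0.0],                     # patch close to class 0
              [0.1, 0.2, 0.9],                     # background-like patches
              [0.2, 0.3, 0.8]]

hybrid, weights = hybrid_embedding(global_emb, patch_embs, class_embs)
predicted, scores = classify(hybrid, class_embs)
print("patch weights:", weights)
print("predicted class index:", predicted)
```

In this toy input the first patch aligns closely with one class prompt, so it receives most of the weight and pulls the hybrid embedding toward that class, while the two background-like patches contribute little. A lower `temperature` sharpens this effect, and a higher `alpha` lets the global embedding dominate.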
