Leveraging Vision-Language Embeddings for Zero-Shot Learning in Histopathology Images
- PMID: 40601464
- DOI: 10.1109/JBHI.2025.3584802
Abstract
Zero-shot learning (ZSL) offers substantial potential for histopathology image analysis because it enables models to generalize to unseen classes without extensive labeled data. Recent advances in vision-language models (VLMs) have expanded ZSL capabilities, allowing models to perform tasks without task-specific fine-tuning. However, applying VLMs to histopathology presents considerable challenges due to the complexity of histopathological imagery and the nuanced nature of diagnostic tasks. We propose Multi-Resolution Prompt-guided Hybrid Embedding (MR-PHE), a novel framework for zero-shot histopathology image classification. MR-PHE mimics the pathologist's workflow through multi-resolution patch extraction to capture key cellular and tissue features. It introduces a hybrid embedding strategy that integrates global image embeddings with weighted patch embeddings, combining local and global contextual information. Additionally, we develop a comprehensive prompt generation and selection framework that enriches class descriptions with domain-specific synonyms and clinically relevant features to enhance semantic understanding. A similarity-based patch weighting mechanism assigns attention-like weights to patches based on their relevance to class embeddings, emphasizing diagnostically important regions during classification. Experimental results demonstrate that MR-PHE significantly improves zero-shot classification performance on histopathology datasets, often surpassing fully supervised models, which shows its effectiveness and its potential to advance computational pathology.
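The hybrid embedding and similarity-based patch weighting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so the function name `hybrid_zero_shot`, the mixing weight `alpha`, the softmax `temperature`, and the choice of "maximum cosine similarity to any class embedding" as the patch relevance score are all assumptions made for illustration. The sketch also assumes that the global, patch, and class (prompt) embeddings have already been produced by some VLM encoder, with patches drawn from several resolutions.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def hybrid_zero_shot(global_emb, patch_embs, class_embs,
                     alpha=0.5, temperature=0.07):
    """Classify one image from precomputed embeddings (illustrative only).

    global_emb: (d,)   embedding of the whole image
    patch_embs: (n, d) embeddings of patches from all resolutions
    class_embs: (k, d) one text embedding per class
    alpha, temperature: hypothetical hyperparameters, not from the paper
    """
    g = l2_normalize(np.asarray(global_emb, dtype=float))
    p = l2_normalize(np.asarray(patch_embs, dtype=float))
    c = l2_normalize(np.asarray(class_embs, dtype=float))

    # Assumed relevance score: best cosine similarity to any class embedding.
    relevance = (p @ c.T).max(axis=1)            # shape (n,)

    # Attention-like weights: relevant patches receive more mass.
    weights = softmax(relevance / temperature)   # shape (n,), sums to 1

    # Weighted patch summary, then mix with the global embedding.
    patch_summary = l2_normalize(weights @ p)    # shape (d,)
    hybrid = l2_normalize(alpha * g + (1.0 - alpha) * patch_summary)

    # Zero-shot prediction: class whose embedding is closest to the hybrid.
    scores = c @ hybrid                          # shape (k,)
    return int(np.argmax(scores)), scores, weights


# Toy usage with made-up 3-dimensional embeddings.
toy_classes = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
toy_patches = np.array([[0.0, 1.0, 0.0], [0.0, 0.9, 0.1], [0.1, 0.1, 1.0]])
toy_global = np.array([0.3, 0.5, 0.8])
pred, scores, weights = hybrid_zero_shot(toy_global, toy_patches, toy_classes)
print(pred, scores, weights)
```

In this sketch a patch that resembles no class contributes little to the hybrid embedding, which is one plausible reading of "emphasizing diagnostically important regions"; the published method may compute relevance and combine embeddings differently.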