Front Artif Intell. 2025 Aug 12;8:1527980.
doi: 10.3389/frai.2025.1527980. eCollection 2025.

MedAlmighty: enhancing disease diagnosis with large vision model distillation



Yajing Ren et al. Front Artif Intell.

Abstract

Introduction: Accurate disease diagnosis is critical in the medical field, yet it remains a challenging task due to the limited, heterogeneous, and complex nature of medical data. These challenges are particularly pronounced in multimodal tasks requiring the integration of diverse data sources. While lightweight models offer computational efficiency, they often lack the comprehensive understanding necessary for reliable clinical predictions. Conversely, large vision models, trained on extensive general-domain datasets, provide strong generalization but fall short in specialized medical applications due to domain mismatch and limited medical data availability.

Methods: To bridge the gap between general and specialized performance, we propose MedAlmighty, a knowledge distillation-based framework that synergizes the strengths of both large and small models. In this approach, we use DINOv2, a pre-trained large vision model, as a frozen teacher, and a lightweight convolutional neural network (CNN) as the trainable student. The student model is trained using both hard labels from the ground truth and soft targets generated by the teacher model. We adopt a hybrid loss function that combines cross-entropy loss (for classification accuracy) and Kullback-Leibler divergence (for distillation), enabling the student model to capture rich semantic features while remaining efficient and domain-aware.
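The hybrid teacher-student objective described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the exact weighting scheme, (1 - alpha) * CE + alpha * t^2 * KL, follows the standard Hinton-style distillation formulation, and the default values t = 2 and alpha = 0.2 are borrowed from the ablation settings shown in Figure 4; the paper only specifies that cross-entropy and KL divergence are combined.

```python
import numpy as np

def softmax(z, t=1.0):
    # Temperature-scaled softmax along the last axis (numerically stable).
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, t=2.0, alpha=0.2):
    """Hybrid loss: (1 - alpha) * CE(hard labels) + alpha * t^2 * KL(teacher || student).

    The weighting scheme and the t^2 rescaling (which keeps the KL gradient
    magnitude comparable to the CE term as temperature grows) are assumptions
    based on standard knowledge-distillation practice, not taken verbatim
    from the MedAlmighty paper.
    """
    n = len(labels)
    # Hard-label cross-entropy at temperature 1.
    p_student = softmax(student_logits)
    ce = -np.mean(np.log(p_student[np.arange(n), labels] + 1e-12))
    # Soft targets from the frozen teacher, both sides softened by t.
    p_teacher = softmax(teacher_logits, t)
    p_student_t = softmax(student_logits, t)
    kl = np.mean(np.sum(p_teacher * np.log((p_teacher + 1e-12) / (p_student_t + 1e-12)), axis=-1))
    return (1 - alpha) * ce + alpha * (t ** 2) * kl
```

In training, the teacher's logits would come from a frozen forward pass of DINOv2 (no gradient), while the student CNN's logits receive gradients from both terms of this loss.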

Results: Experimental evaluations reveal that MedAlmighty significantly improves disease diagnosis performance across datasets characterized by sparse and diverse medical data. The proposed model outperforms baselines by effectively integrating the generalizable representations of large models with the specialized knowledge from smaller models. The results confirm improved robustness and accuracy in complex diagnostic scenarios.

Discussion: The MedAlmighty framework demonstrates that incorporating general-domain representations via frozen large vision models, when guided by task-specific distillation strategies, can enhance the performance of lightweight medical models. This approach offers a promising solution to data scarcity and domain gap issues in medical imaging. Future work may explore extending this distillation strategy to other medical modalities and incorporating multimodal alignment for even richer representation learning.

Keywords: disease diagnosis; domain generalization; knowledge distillation; large vision model; model capacity.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Comparison of generalization and training efficiency between CNNs and DINOv2. (a) Generalization performance: CNNs struggle with robustness and accuracy on unseen data, while DINOv2 generalizes more strongly across diverse tasks thanks to self-supervised pre-training. (b) Training efficiency: DINOv2 requires significantly more computational resources and training time, limiting its practicality. (c) Synergy potential: combining the efficiency of CNNs with the generalization of DINOv2 motivates integrating both in a unified framework.
Figure 2
Comparing AUC values of DINOv2-ViTs14 with ResNet18, DINOv2-ViTb14 with ResNet50, and DINOv2-ViTl14 with ResNet50 on 12 MedMNIST datasets. Results are based on experiments using MedMNISTV2, where all models were evaluated on 224 × 224 images.
Figure 3
Comparing ACC values of DINOv2-ViTs14 with ResNet18, DINOv2-ViTb14 with ResNet50, and DINOv2-ViTl14 with ResNet50 on 12 MedMNIST datasets. Results are based on experiments using MedMNISTV2, where all models were evaluated on 224 × 224 images.
Figure 4
Performance evaluation on RetinaMNIST across (t, α) settings: AUC and ACC when varying α with temperature t = 2, and when varying t with α = 0.2.
Figure 5
t-SNE visualization of features from ResNet50, DINOv2-ViTb14, and MedAlmighty.
Figure 6
Input images (top) and heatmaps (bottom). Color intensity reflects the relative importance of image regions for the model's classification.
