Health Care Sci. 2025 Apr 6;4(2):110-143.
doi: 10.1002/hcs2.70009. eCollection 2025 Apr.

Rethinking Domain-Specific Pretraining by Supervised or Self-Supervised Learning for Chest Radiograph Classification: A Comparative Study Against ImageNet Counterparts in Cold-Start Active Learning

Han Yuan et al.

Abstract

Objective: Deep learning (DL) has become the prevailing method in chest radiograph analysis, yet its performance depends heavily on large quantities of annotated images. To mitigate the annotation cost, cold-start active learning (AL), comprising an initialization stage followed by subsequent learning, selects a small subset of informative data points for labeling. Recent pretrained models, built by supervised or self-supervised learning tailored to chest radiographs, have shown broad applicability to diverse downstream tasks. However, their potential in cold-start AL remains unexplored.
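As a rough illustration of how such pretrained models feed downstream tasks, the sketch below turns chest radiographs into fixed-length embeddings with a frozen backbone. An ImageNet-pretrained ResNet-50 from torchvision stands in for the feature extractor because the loading code for TXRV and REMEDIS is not given here; the input shape and preprocessing are assumptions.

```python
import torch
import torchvision

# ImageNet-pretrained ResNet-50 as a stand-in feature extractor; a domain-specific
# backbone (TXRV or REMEDIS) would be plugged in the same way once loaded.
backbone = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classification head, keep the 2048-d pooled features
backbone.eval()

@torch.no_grad()
def embed(images: torch.Tensor) -> torch.Tensor:
    """images: (N, 3, 224, 224) normalized radiographs -> (N, 2048) embeddings."""
    return backbone(images)
```

These embeddings can either replace the raw pixels as classifier inputs (the option shown in Figure 1) or drive the sample-selection strategies described in the Methods.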

Methods: To validate the efficacy of domain-specific pretraining, we compared two foundation models, the supervised TXRV and the self-supervised REMEDIS, with their general-domain counterparts pretrained on ImageNet. Model performance was evaluated at both the initialization and subsequent learning stages on two diagnostic tasks: pediatric pneumonia and COVID-19. For initialization, we assessed the integration of the pretrained models with three strategies: diversity, uncertainty, and hybrid sampling. For subsequent learning, we focused on uncertainty sampling powered by different pretrained models. We also conducted statistical tests to compare the foundation models with their ImageNet counterparts, investigate the relationship between initialization and subsequent learning, examine the performance of one-shot initialization against the full AL process, and assess how the class balance of initialization samples influences both stages.
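The three initialization strategies can be sketched as follows, assuming `embeddings` come from one of the pretrained backbones and `probs` are class probabilities from a proxy classifier; the selection rules shown are illustrative, not the study's exact implementation.

```python
import numpy as np
from sklearn.cluster import KMeans


def diversity_sampling(embeddings, budget, seed=0):
    """Pick the sample closest to each k-means centroid, one per budget slot."""
    km = KMeans(n_clusters=budget, n_init=10, random_state=seed).fit(embeddings)
    picks = []
    for c in range(budget):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        picks.append(members[np.argmin(dists)])
    return np.array(picks)


def uncertainty_sampling(probs, budget):
    """Pick the samples with the highest predictive entropy."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-budget:]


def hybrid_sampling(embeddings, probs, budget):
    """Cluster for diversity, then take the most uncertain sample in each cluster."""
    km = KMeans(n_clusters=budget, n_init=10, random_state=0).fit(embeddings)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    picks = []
    for c in range(budget):
        members = np.where(km.labels_ == c)[0]
        picks.append(members[np.argmax(entropy[members])])
    return np.array(picks)
```

Random sampling, the cold-start default referenced in the Results, simply draws the same budget of indices uniformly without replacement.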

Results: First, domain-specific foundation models failed to outperform ImageNet counterparts in six out of eight experiments on informative sample selection. Both domain-specific and general pretrained models were unable to generate representations that could substitute for the original images as model inputs in seven of the eight scenarios. However, pretrained model-based initialization surpassed random sampling, the default approach in cold-start AL. Second, initialization performance was positively correlated with subsequent learning performance, highlighting the importance of initialization strategies. Third, one-shot initialization performed comparably to the full AL process, demonstrating the potential of reducing experts' repeated waiting during AL iterations. Last, a U-shaped correlation was observed between the class balance of initialization samples and model performance, suggesting that the class balance is more strongly associated with performance at middle budget levels than at low or high budgets.
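To make the one-shot versus full-AL comparison concrete, the sketch below contrasts a single initialization-and-train pass with an iterative loop that repeatedly retrains and queries the most uncertain samples. The logistic-regression classifier, the annotation oracle, and the budget and step sizes are placeholder assumptions rather than the study's configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def annotate(indices, y_hidden):
    """Stand-in for expert annotation: reveal hidden labels for the queried samples."""
    return y_hidden[np.asarray(indices)]


def one_shot_al(X, y_hidden, init_idx):
    """Spend the whole labeling budget at initialization and train once."""
    return LogisticRegression(max_iter=1000).fit(X[init_idx], annotate(init_idx, y_hidden))


def iterative_al(X, y_hidden, init_idx, total_budget, step):
    """Start from the initialization pool, then repeatedly add the most uncertain samples."""
    labeled = list(init_idx)
    while len(labeled) < total_budget:
        clf = LogisticRegression(max_iter=1000).fit(X[labeled], annotate(labeled, y_hidden))
        p = clf.predict_proba(X)[:, 1]                       # binary-task probability
        entropy = -(p * np.log(p + 1e-12) + (1 - p) * np.log(1 - p + 1e-12))
        entropy[labeled] = -np.inf                           # never re-query labeled samples
        k = min(step, total_budget - len(labeled))
        labeled.extend(np.argsort(entropy)[::-1][:k].tolist())
    return LogisticRegression(max_iter=1000).fit(X[labeled], annotate(labeled, y_hidden))
```

In the study's terms, one-shot initialization performing comparably to the full process means the extra querying rounds in the loop add little, which spares experts the repeated waiting between iterations.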

Conclusions: In this study, we highlighted the limitations of medical pretraining compared to general pretraining in the context of cold-start AL. We also identified promising outcomes related to cold-start AL, including initialization based on pretrained models, the positive influence of initialization on subsequent learning, the potential for one-shot initialization, and the influence of class balance on middle-budget AL. Researchers are encouraged to improve medical pretraining for versatile DL foundations and explore novel AL methods.

Keywords: COVID‐19; chest radiograph analysis; cold‐start active learning; pediatric pneumonia; radiology foundation model.


Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Cold‐start AL workflow based on domain‐specific foundation models. (a) The initialization stage. (b) The subsequent learning stage. After sample selection and expert annotation, supervised model training can be conducted using either the original image pixels or embeddings derived from foundation models as classifier inputs.
Figure 2
One‐shot initialization performance on the Guangzhou data set. Subgraphs (a–p) are organized as follows: the first, second, third, and last columns present the mean AUROC, the standard deviation of AUROC, the mean AUPRC, and the standard deviation of AUPRC, respectively. The first row displays the results for training on all annotated samples (the upper bound) and for random sampling; the second, third, and final rows show the outcomes for diversity sampling, uncertainty sampling, and hybrid sampling, respectively.
Figure 3
One‐shot initialization performance on the Pakistan data set. Subgraphs (a–p) are organized as follows: the first, second, third, and last columns present the mean AUROC, the standard deviation of AUROC, the mean AUPRC, and the standard deviation of AUPRC, respectively. The first row displays the results for training on all annotated samples (the upper bound) and for random sampling; the second, third, and final rows show the outcomes for diversity sampling, uncertainty sampling, and hybrid sampling, respectively.
Figure 4
Subsequent learning performance on the Guangzhou data set. Subgraphs (a–p) are organized as follows: the first, second, third, and last columns present the mean AUROC, the standard deviation of AUROC, the mean AUPRC, and the standard deviation of AUPRC, respectively. The first row displays the results for training on all annotated samples (the upper bound) and for random sampling; the second, third, and final rows show the outcomes for diversity sampling, uncertainty sampling, and hybrid sampling, respectively.
Figure 5
Subsequent learning performance on the Pakistan data set. Subgraphs (a–p) are organized as follows: the first, second, third, and last columns present the mean AUROC, the standard deviation of AUROC, the mean AUPRC, and the standard deviation of AUPRC, respectively. The first row displays the results for training on all annotated samples (the upper bound) and for random sampling; the second, third, and final rows show the outcomes for diversity sampling, uncertainty sampling, and hybrid sampling, respectively.
