J Digit Imaging. 2023 Feb;36(1):164-177.
doi: 10.1007/s10278-022-00714-8. Epub 2022 Nov 2.

Improved Fine-Tuning of In-Domain Transformer Model for Inferring COVID-19 Presence in Multi-Institutional Radiology Reports


Pierre Chambon et al. J Digit Imaging. 2023 Feb.

Abstract

Building a document-level classifier for COVID-19 on radiology reports could assist providers in their daily clinical routine, as well as create large numbers of labels for computer vision models. We have developed such a classifier by fine-tuning a BERT-like model initialized from RadBERT, a model obtained by continued pre-training on radiology reports that can be used for all radiology-related tasks. RadBERT outperforms all biomedical pre-trainings on this COVID-19 task (P<0.01) and helps our fine-tuned model achieve an 88.9 macro-averaged F1-score, when evaluated on both X-ray and CT reports. To build this model, we rely on a multi-institutional dataset re-sampled and enriched with concurrent lung diseases, helping the model resist distribution shifts. In addition, we explore a variety of fine-tuning and hyperparameter optimization techniques that accelerate fine-tuning convergence, stabilize performance, and improve accuracy, especially when data or computational resources are limited. Finally, we provide a set of visualization tools and explainability methods to better understand the performance of the model and support its practical use in the clinical setting. Our approach offers a ready-to-use COVID-19 classifier and can be applied similarly to other radiology report classification tasks.
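As a concrete illustration of this setup, here is a minimal sketch of fine-tuning an in-domain BERT-like checkpoint into a three-label report classifier with the Hugging Face Trainer and scoring it with a macro-averaged F1-score. The checkpoint path, the two toy reports, and the hyperparameters are placeholders, not the authors' released artifacts.

```python
# Minimal sketch: fine-tune an in-domain BERT-like encoder into a
# three-label report classifier. CHECKPOINT is a hypothetical path,
# not the authors' released RadBERT weights.
import numpy as np
from datasets import Dataset
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

CHECKPOINT = "path/to/in-domain-radbert-like-checkpoint"  # placeholder
# Label mapping: 0 = no COVID-19, 1 = uncertain COVID-19, 2 = COVID-19
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=3)

# Two toy reports standing in for the multi-institutional dataset.
ds = Dataset.from_dict({
    "text": ["Bilateral peripheral ground-glass opacities, consistent with COVID-19 pneumonia.",
             "Lungs are clear. No acute cardiopulmonary abnormality."],
    "label": [2, 0],
}).map(lambda b: tokenizer(b["text"], truncation=True, max_length=512), batched=True)

def macro_f1(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Macro-averaging weights the three classes equally despite label imbalance.
    return {"macro_f1": f1_score(labels, preds, average="macro")}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           learning_rate=2e-5, per_device_train_batch_size=8),
    train_dataset=ds,
    eval_dataset=ds,  # use a held-out split in practice
    compute_metrics=macro_f1,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
print(trainer.evaluate())
```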

Keywords: BERT; COVID-19; Classification; Natural language processing (NLP); Radiology; Transformer.


Conflict of interest statement

Personal financial interests: Board of directors and shareholder, Bunkerhill Health; Option holder, whiterabbit.ai; Advisor and option holder, GalileoCDS; Advisor and option holder, Sirona Medical; Advisor and option holder, Adra; Advisor and option holder, Kheiron; Advisor, Sixth Street; Chair, SIIM Board of Directors; Member at Large, Board of Directors of the Pennsylvania Radiological Society; Member at Large, Board of Directors of the Philadelphia Roentgen Ray Society; Member at Large, Board of Directors of the Association of University Radiologists (term just ended in June); Honoraria, Sectra (webinars); Honoraria, British Journal of Radiology (section editor); Speaker honorarium, Icahn School of Medicine (conference speaker); Speaker honorarium, MGH (conference speaker). Recent grant and gift support paid to academic institutions involved: Carestream; Clairity; GE Healthcare; Google Cloud; IBM; IDEXX; Hospital Israelita Albert Einstein; Kheiron; Lambda; Lunit; Microsoft; Nightingale Open Science; Nines; Philips; Subtle Medical; VinBrain; Whiterabbit.ai; Lowenstein Foundation; Gordon and Betty Moore Foundation; Paustenbach Fund. Grant funding: NIH; Independence Blue Cross; RSNA.

Figures

Fig. 1
Our classification task consists of taking the text of a radiology report as input and generating one of three labels: COVID-19, uncertain COVID-19, and no COVID-19
Fig. 2
Our fine-tuning dataset includes radiology reports from 6 sites within the same academic health system, Penn Medicine. The left graph shows the number of reports provided by each site, dominated by three sites. The right graph shows that the label imbalance remains stable across these three main sites and is less consistent across the three remaining sites
Fig. 3
The Tree-structured Parzen Estimator builds empirical distributions on the hyperparameter space and suggests points that are highly likely under the distribution of good trials while being highly unlikely under the distribution of bad trials
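This acquisition strategy is what Optuna's TPESampler implements. Below is a hedged sketch of such a search, where fine_tune_and_eval is a hypothetical helper that trains the classifier with the sampled hyperparameters and returns its validation macro-F1; the search space is illustrative.

```python
# Sketch of TPE-driven hyperparameter search with Optuna, whose TPESampler
# implements the Tree-structured Parzen Estimator described in Fig. 3.
import optuna

def fine_tune_and_eval(learning_rate, batch_size, warmup_ratio):
    # Hypothetical helper: fine-tune the classifier with these settings
    # and return the validation macro-averaged F1-score.
    raise NotImplementedError

def objective(trial):
    # TPE fits one density to the hyperparameters of good trials and one to
    # the bad trials, then proposes points where their ratio is largest.
    lr = trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True)
    batch_size = trial.suggest_categorical("batch_size", [8, 16, 32])
    warmup_ratio = trial.suggest_float("warmup_ratio", 0.0, 0.2)
    return fine_tune_and_eval(lr, batch_size, warmup_ratio)

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```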
Fig. 4
Five hundred transformers, all initialized from our pre-training on radiology reports and fine-tuned for the COVID-19 classification task. In the vast majority of cases, the yellow points, which use our fine-tuning approach, perform better than the blue points, which use a standard fine-tuning procedure. This visualization was obtained using the Weights & Biases platform [39]
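A scatter like this can be produced by logging each fine-tuning run to Weights & Biases, tagged by procedure. The project name, tags, and scores below are illustrative placeholders.

```python
# Illustrative logging of fine-tuning runs to Weights & Biases so the two
# procedures can be compared as in Fig. 4. Names and values are made up.
import wandb

def log_run(procedure, hyperparams, macro_f1):
    run = wandb.init(project="covid19-report-classifier",
                     tags=[procedure], config=hyperparams, reinit=True)
    wandb.log({"macro_f1": macro_f1})
    run.finish()

log_run("improved-finetuning", {"learning_rate": 2e-5}, 0.889)  # placeholder scores
log_run("standard-finetuning", {"learning_rate": 2e-5}, 0.850)
```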
Fig. 5
The red, yellow, and blue lines reflect the prevalence of COVID-19 detected by our model in radiology reports, all from the same academic health system. The green line represents the number of positive cases in the same county (data from the CDC COVID Tracker [40])
Fig. 6
For each report and model output, integrated gradients underline in green the words that contributed positively to the model's decision and in red those that contributed negatively
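One common way to compute such attributions is Captum's LayerIntegratedGradients; in this sketch, model and tokenizer refer to the hypothetical classifier from the earlier fine-tuning sketch, and the baseline simply replaces every token with [PAD].

```python
# Sketch of token-level attributions with integrated gradients via Captum.
# `model` and `tokenizer` are the hypothetical classifier from the earlier
# fine-tuning sketch; positive scores map to green words, negative to red.
import torch
from captum.attr import LayerIntegratedGradients

report = "Bilateral peripheral ground-glass opacities, concerning for COVID-19."
enc = tokenizer(report, return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]

def forward_logits(input_ids, attention_mask):
    return model(input_ids=input_ids, attention_mask=attention_mask).logits

model.eval()
pred = forward_logits(input_ids, attention_mask).argmax(dim=-1).item()

# Baseline input: the same sequence with every token replaced by [PAD].
baseline_ids = torch.full_like(input_ids, tokenizer.pad_token_id)

lig = LayerIntegratedGradients(forward_logits, model.get_input_embeddings())
attributions = lig.attribute(inputs=input_ids, baselines=baseline_ids,
                             additional_forward_args=(attention_mask,),
                             target=pred)
scores = attributions.sum(dim=-1).squeeze(0)  # one score per token
for token, score in zip(tokenizer.convert_ids_to_tokens(input_ids[0].tolist()), scores):
    print(f"{token:>15s} {score.item():+.3f}")
```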

References

    1. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need; 2017.
    2. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding; 2019.
    3. Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online, October 2020. Association for Computational Linguistics. 10.18653/v1/2020.emnlp-demos.6. https://aclanthology.org/2020.emnlp-demos.6.
    4. Emily Alsentzer, John Murphy, William Boag, Wei-Hung Weng, Di Jindi, Tristan Naumann, and Matthew McDermott. Publicly available clinical BERT embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, pages 72–78, Minneapolis, Minnesota, USA, June 2019. Association for Computational Linguistics. 10.18653/v1/W19-1909. https://aclanthology.org/W19-1909.
    5. Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, Sep 2019. ISSN 1460-2059. 10.1093/bioinformatics/btz682.
