RadBERT: Adapting Transformer-based Language Models to Radiology
- PMID: 35923376
- PMCID: PMC9344353
- DOI: 10.1148/ryai.210258
Abstract
Purpose: To investigate if tailoring a transformer-based language model to radiology is beneficial for radiology natural language processing (NLP) applications.
Materials and methods: This retrospective study presents a family of bidirectional encoder representations from transformers (BERT)-based language models adapted for radiology, named RadBERT. Transformers were pretrained with either 2.16 or 4.42 million radiology reports from U.S. Department of Veterans Affairs health care systems nationwide on top of four different initializations (BERT-base, Clinical-BERT, robustly optimized BERT pretraining approach [RoBERTa], and BioMed-RoBERTa) to create six variants of RadBERT. Each variant was fine-tuned for three representative NLP tasks in radiology: (a) abnormal sentence classification: models classified sentences in radiology reports as reporting abnormal or normal findings; (b) report coding: models assigned a diagnostic code to a given radiology report for five coding systems; and (c) report summarization: given the findings section of a radiology report, models selected key sentences that summarized the findings. Model performance was compared by bootstrap resampling with five intensively studied transformer language models as baselines: BERT-base, BioBERT, Clinical-BERT, BlueBERT, and BioMed-RoBERTa.
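As an illustration of the fine-tuning step described above, the following is a minimal sketch (not the authors' code) of adapting a BERT-style checkpoint for binary abnormal-versus-normal sentence classification with the Hugging Face Transformers library; the checkpoint name and the two example sentences are placeholders, and a domain-adapted RadBERT checkpoint would be substituted in practice.

```python
# Hedged sketch: fine-tuning a BERT-style encoder for abnormal/normal
# sentence classification. Checkpoint and data are placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "bert-base-uncased"  # a radiology-adapted checkpoint would go here
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Hypothetical annotated sentences: 1 = abnormal finding, 0 = normal.
data = Dataset.from_dict({
    "text": ["There is a 6 mm nodule in the right upper lobe.",
             "The lungs are clear without focal consolidation."],
    "label": [1, 0],
})

def tokenize(batch):
    # Convert raw sentences into fixed-length token IDs for the encoder.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=data,
)
trainer.train()
```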
Results: For abnormal sentence classification, all models performed well (accuracies above 97.5 and F1 scores above 95.0). RadBERT variants achieved significantly higher scores than corresponding baselines when given only 10% or less of 12 458 annotated training sentences. For report coding, all variants outperformed baselines significantly for all five coding systems. The variant RadBERT-BioMed-RoBERTa performed the best among all models for report summarization, achieving a Recall-Oriented Understudy for Gisting Evaluation-1 score of 16.18 compared with 15.27 by the corresponding baseline (BioMed-RoBERTa, P < .004).
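The Recall-Oriented Understudy for Gisting Evaluation (ROUGE)-1 score reported above measures unigram overlap between a generated summary and a reference (here, the impression section). Below is a minimal sketch of the metric, assuming simple whitespace tokenization; the published scores were produced with a standard ROUGE implementation, so this serves only to illustrate the idea.

```python
# Hedged sketch of ROUGE-1 (unigram overlap) between a candidate summary
# and a reference impression. Tokenization is simplified for illustration.
from collections import Counter

def rouge1(reference: str, candidate: str) -> dict:
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())          # unigrams shared by both texts
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical example: model-selected key sentences vs. the impression section.
print(rouge1("no acute cardiopulmonary abnormality",
             "no acute abnormality is identified"))
```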
Conclusion: Transformer-based language models tailored to radiology improved performance on radiology NLP tasks compared with baseline transformer language models. Supplemental material is available for this article. © RSNA, 2022. See also the commentary by Wiggins and Tejani in this issue.
Keywords: Informatics; Neural Networks; Transfer Learning; Translation; Unsupervised Learning.
© 2022 by the Radiological Society of North America, Inc.
Conflict of interest statement
Disclosures of conflicts of interest: A.Y. No relevant relationships. J.M. No relevant relationships. X.L. No relevant relationships. J.D. No relevant relationships. E.Y.C. No relevant relationships. A.G. Department of Defense grant paid to UCSD (covers a small percentage of the author's salary). C.N.H. No relevant relationships.
Figures

![Figure 1](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9343/9344353/d1737b98f1ad/ryai.210258.fig1.gif)

Overview of our study design, which includes pretraining and fine-tuning of RadBERT. (A) In pretraining, different weight initializations were considered to create variants of RadBERT. (B) The variants were fine-tuned for three important radiology natural language processing (NLP) tasks: abnormal sentence classification, report coding, and report summarization. The performance of RadBERT variants for these tasks was compared with a set of intensively studied transformer-based language models as baselines. (C) Examples of each task and how performance was measured. In the abnormality identification task, a sentence in a radiology report was considered “abnormal” if it reported an abnormal finding and “normal” otherwise. A human-annotated abnormality was considered ground truth to evaluate the performance of an NLP model. In the code classification task, models were expected to output diagnostic codes (eg, abdominal aortic aneurysm, Breast Imaging Reporting and Data System [BI-RADS], and Lung Imaging Reporting and Data System [Lung-RADS]) that match the codes given by human providers as the ground truth for a given radiology report. During report summarization, the models generated a short summary given the findings in a radiology report. Summary quality was measured by how similar it was to the impression section of the input report. AAA = abdominal aortic aneurysm, BERT = bidirectional encoder representations from transformers, RadBERT = BERT-based language model adapted for radiology, RoBERTa = robustly optimized BERT pretraining approach.

![Figure 3](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9343/9344353/62b0861b67c0/ryai.210258.fig3.gif)

Confusion matrices for report coding with two language models (BERT-base and RadBERT-RoBERTa) fine-tuned to assign diagnostic codes in two coding systems (Lung Imaging Reporting and Data System [Lung-RADS] and abnormal) (see Appendix E4 [supplement]). (A, B) The Lung-RADS dataset consisted of six categories: “incomplete,” “benign nodule appearance or behavior,” “probably benign nodule,” “suspicious nodule-a,” “suspicious nodule-b,” and “prior lung cancer,” denoted as numbers 1 to 6 in the figure. (C, D) The abnormal dataset also consisted of six categories: “major abnormality, no attn needed,” “major abnormality, physician aware,” “minor abnormality,” “possible malignancy,” “significant abnormality, attn needed,” and “normal.” The figures show that RadBERT-RoBERTa improved on BERT-base by better distinguishing code numbers 5 and 6 for Lung-RADS and making fewer errors for code number 1 of the abnormal dataset. BERT = bidirectional encoder representations from transformers, RadBERT = BERT-based language model adapted for radiology, RoBERTa = robustly optimized BERT pretraining approach.
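As a complement to the figure, here is a minimal sketch of how a confusion matrix for a six-category report-coding task could be computed with scikit-learn; the labels and predictions below are made-up placeholders, not data from the study.

```python
# Hedged sketch: confusion matrix for six-category report coding.
# y_true / y_pred are hypothetical examples, not study data.
from sklearn.metrics import confusion_matrix

labels = [1, 2, 3, 4, 5, 6]           # e.g., Lung-RADS categories 1-6
y_true = [1, 2, 2, 3, 5, 6, 6, 4]     # codes assigned by human providers (ground truth)
y_pred = [1, 2, 3, 3, 6, 6, 5, 4]     # codes predicted by the fine-tuned model
print(confusion_matrix(y_true, y_pred, labels=labels))
```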
