A comparative study of pretrained language models for long clinical text
- PMID: 36451266
- PMCID: PMC9846675
- DOI: 10.1093/jamia/ocac225
Abstract
Objective: Clinical knowledge-enriched transformer models (eg, ClinicalBERT) have achieved state-of-the-art results on clinical natural language processing (NLP) tasks. A core limitation of these transformer models is the substantial memory consumption of their full self-attention mechanism, which leads to performance degradation on long clinical texts. To overcome this, we propose to leverage long-sequence transformer models (eg, Longformer and BigBird), which extend the maximum input sequence length from 512 to 4096 tokens, to enhance the ability to model long-term dependencies in long clinical texts.
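To make the memory argument concrete, the sketch below compares the number of attention scores computed per head under full self-attention (quadratic in sequence length) and under a Longformer-style sliding-window pattern (roughly linear); the window size of 512 tokens is an illustrative assumption, not a value taken from this abstract.

```python
# Back-of-the-envelope comparison of attention-score counts per head.
# Full self-attention scores every token pair; a sliding-window pattern
# scores only tokens within a fixed local window.

def full_attention_scores(seq_len: int) -> int:
    """Pairwise attention scores with full self-attention: O(n^2)."""
    return seq_len * seq_len

def sliding_window_scores(seq_len: int, window: int = 512) -> int:
    """Approximate score count with a local sliding window: O(n * w)."""
    return seq_len * min(window, seq_len)

for n in (512, 4096):
    print(f"n={n}: full={full_attention_scores(n):,}  "
          f"windowed~{sliding_window_scores(n):,}")
# n=512:  full=262,144      windowed~262,144
# n=4096: full=16,777,216   windowed~2,097,152
```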
Materials and methods: Inspired by the success of long-sequence transformer models and the fact that clinical notes are mostly long, we introduce 2 domain-enriched language models, Clinical-Longformer and Clinical-BigBird, which are pretrained on a large-scale clinical corpus. We evaluate both language models on 10 baseline tasks spanning named entity recognition, question answering, natural language inference, and document classification.
Results: The results demonstrate that Clinical-Longformer and Clinical-BigBird consistently and significantly outperform ClinicalBERT and other short-sequence transformers in all 10 downstream tasks and achieve new state-of-the-art results.
Discussion: Our pretrained language models provide the bedrock for clinical NLP using long texts. We have made our source code available at https://github.com/luoyuanlab/Clinical-Longformer and the pretrained models available for public download at https://huggingface.co/yikuan8/Clinical-Longformer.
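The released checkpoint can be loaded through the Hugging Face transformers library; the sketch below is a minimal example under that assumption, and the clinical note text is hypothetical.

```python
from transformers import AutoTokenizer, AutoModel

# Load the public checkpoint referenced above from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("yikuan8/Clinical-Longformer")
model = AutoModel.from_pretrained("yikuan8/Clinical-Longformer")

# Encode a (hypothetical) long clinical note; the model accepts inputs of
# up to 4096 tokens rather than the usual 512.
note = "The patient was admitted with shortness of breath ..."
inputs = tokenizer(note, truncation=True, max_length=4096, return_tensors="pt")
embeddings = model(**inputs).last_hidden_state  # contextual token embeddings
```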
Conclusion: This study demonstrates that clinical knowledge-enriched long-sequence transformers are able to learn long-term dependencies in long clinical text. Our methods can also inspire the development of other domain-enriched long-sequence transformers.
Keywords: clinical natural language processing; named entity recognition; natural language inference; question answering; text classification.
© The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
