Context-Sensitive Spelling Correction of Clinical Text via Conditional Independence
- PMID: 38665367
- PMCID: PMC11044887
Abstract
Spelling correction is a particularly important problem in clinical natural language processing because misspellings are abundant in medical records. However, the scarcity of labeled clinical datasets makes it hard to build a machine learning system for clinical spelling correction. In this work, we present a probabilistic model for correcting misspellings based on a simple conditional independence assumption, which leads to a modular decomposition into a language model and a corruption model. With a deep character-level language model trained on a large clinical corpus and a simple edit-based corruption model, we can build a spelling correction model with little or no real data. Experimental results show that our model significantly outperforms baselines on two healthcare spelling correction datasets.
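The decomposition described above can be illustrated with a minimal sketch. It assumes one plausible reading of the conditional independence assumption: given the intended word, the observed typo does not depend on the surrounding context, so a candidate's score is log P(candidate | context) + log P(typo | candidate). Everything in the block is a hypothetical stand-in rather than the authors' implementation: `BigramLM` is a toy word-bigram model with add-one smoothing in place of the deep character-level language model, `log_corruption` is an unnormalized edit-distance penalty in place of the paper's corruption model, and the corpus, the `penalty` value, and all names are invented for illustration.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def log_corruption(typo: str, word: str, penalty: float = 3.0) -> float:
    """Assumed corruption model: unnormalized log P(typo | word),
    decreasing linearly with edit distance (hypothetical penalty)."""
    return -penalty * edit_distance(typo, word)


class BigramLM:
    """Toy stand-in for the language model: word bigrams, add-one smoothing."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split()
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab = sorted(w for w in self.unigrams if w != "<s>")

    def log_prob(self, prev: str, word: str) -> float:
        """Smoothed log P(word | prev)."""
        numerator = self.bigrams[(prev, word)] + 1
        denominator = self.unigrams[prev] + len(self.vocab)
        return math.log(numerator / denominator)


def correct(typo: str, prev: str, lm: BigramLM) -> str:
    """Pick the vocabulary word maximizing LM score plus corruption score."""
    return max(lm.vocab,
               key=lambda w: lm.log_prob(prev, w) + log_corruption(typo, w))


# Invented mini-corpus, for illustration only.
corpus = [
    "patient reports chest pain",
    "chest pain resolved",
    "history of lead paint exposure",
    "lead paint exposure suspected",
]
lm = BigramLM(corpus)

# The same typo resolves differently depending on the preceding word.
print(correct("pait", "chest", lm))
print(correct("pait", "lead", lm))
```

In this sketch the typo "pait" is one edit away from both "pain" and "paint", so the corruption term alone cannot separate them and the context term decides. Because the two terms are scored independently, either component could be swapped for a stronger model without touching the other, which is the modularity the abstract points to.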