Context-Sensitive Spelling Correction of Clinical Text via Conditional Independence

Juyong Kim et al. Proc Mach Learn Res. 2022 Apr;174:234-247.

Abstract

Spelling correction is a particularly important problem in clinical natural language processing because of the abundance of misspellings in medical records. However, the scarcity of labeled datasets in the clinical domain makes it hard to build a machine learning system for this task. In this work, we present a probabilistic model for correcting misspellings based on a simple conditional independence assumption, which leads to a modular decomposition into a language model and a corruption model. With a deep character-level language model trained on a large clinical corpus and a simple edit-based corruption model, we can build a spelling correction model with little or no real data. Experimental results show that our model significantly outperforms baselines on two healthcare spelling correction datasets.
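
Spelled out, the decomposition described in the abstract takes a noisy-channel form. As a minimal sketch, write x for the observed typo, c for the observed context, and y for the unobserved correct word (symbols chosen here for illustration; the paper's own notation may differ). Assuming the typo is conditionally independent of the context given the correct word, P(x | y, c) = P(x | y), so

    \hat{y} \;=\; \arg\max_{y} P(y \mid x, c)
           \;=\; \arg\max_{y} \underbrace{P(y \mid c)}_{\text{language model}} \, \underbrace{P(x \mid y)}_{\text{corruption model}} .

The two factors can be built and trained separately, which is what makes the decomposition modular and is consistent with the abstract's claim that the system can be built with little or no real labeled data: the language model needs only unlabeled clinical text, and the corruption model is a simple edit-based term.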


Figures

Figure 1: Graphical model of our conditional independence model. The context and the typo are observed; the correct word is unobserved.

Figure 2: Beam search of CIM on Example 1 at time step t=3. The beam candidates are ranked by the sum of the language model score (LM) and the corruption model score (ED); a scoring sketch follows this figure list. The hyper-parameters of the corruption model are C=5.0 and n=1. The beam width is set to B=1 for clear visualization.

Figure 3: Beam search decoding examples. For each example, we display the top 10 beam candidates. The column next to each candidate (Score) shows its final beam score.

Figure 4: Beam search decoding results for several examples of the CSpell test set. For each example, we display the top 10 beam candidates. The column next to each candidate (Score) shows its final beam score.
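
To make the scoring described in the Figure 2 caption concrete, here is a minimal, hypothetical Python sketch that ranks candidate corrections by the sum of a language-model score (LM) and an edit-based corruption score (ED) with penalty weight C and edit limit n. It is a word-level simplification with a placeholder uniform character LM; the paper's actual method runs character-level beam search over a deep character-level language model trained on clinical text, and the function and variable names below are illustrative rather than taken from the paper.

    from math import log

    def edit_distance(a: str, b: str) -> int:
        """Standard Levenshtein edit distance (insertions, deletions, substitutions)."""
        dp = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            prev, dp[0] = dp[0], i
            for j, cb in enumerate(b, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1,            # delete ca from a
                                         dp[j - 1] + 1,        # insert cb into a
                                         prev + (ca != cb))    # substitute ca -> cb
        return dp[-1]

    def lm_logprob(word: str, context: str) -> float:
        """Placeholder LM: uniform per-character log-probability over 27 symbols.
        The paper instead uses a deep character-level LM trained on a clinical
        corpus and conditioned on the surrounding context."""
        return len(word) * log(1.0 / 27.0)

    def score_candidates(typo: str, context: str, vocab, C: float = 5.0, n: int = 1):
        """Rank candidates by LM log-probability minus C times the edit distance,
        keeping only candidates within n edits of the typo (cf. Figure 2: score = LM + ED)."""
        scored = []
        for cand in vocab:
            ed = edit_distance(typo, cand)
            if ed <= n:
                scored.append((cand, lm_logprob(cand, context) - C * ed))
        return sorted(scored, key=lambda item: item[1], reverse=True)

    if __name__ == "__main__":
        vocab = ["patient", "patent", "patience"]
        print(score_candidates("patiet", "the ___ was admitted to the icu", vocab))

In the paper's character-level beam search, an analogous combined LM + ED score is tracked for each beam candidate as decoding proceeds, which is roughly what Figure 2 visualizes at time step t=3.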
