Impact of De-Identification on Clinical Text Classification Using Traditional and Deep Learning Classifiers
- PMID: 31437930
- PMCID: PMC6779034
- DOI: 10.3233/SHTI190228
Abstract
Clinical text de-identification enables collaborative research while protecting patient privacy and confidentiality; however, concerns persist that de-identified text is less useful for information extraction and machine learning tasks. In the context of a deep learning experiment to detect altered mental status in emergency department provider notes, we tested several classifiers on clinical notes in their original form and on their automatically de-identified counterparts. We tested both traditional bag-of-words machine learning models and word-embedding-based deep learning models, evaluating them on 1,113 history of present illness notes. The de-identification process replaced a total of 1,795 protected health information tokens across all notes. The deep learning models performed best, with accuracies of 95% on both original and de-identified notes. No model showed a significant difference in performance between the original and the de-identified notes.
Keywords: Data Anonymization; Machine Learning; Natural Language Processing.
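The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the regular-expression patterns, the placeholder strings, the function names, and the choice of an exact McNemar test for paired predictions are all assumptions made for illustration, since the abstract does not name the de-identification tool or the significance test that was used.

```python
import re
from math import comb

# Hypothetical PHI patterns (assumption); real de-identification systems
# rely on much richer rules, dictionaries, or trained models.
PHI_PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.\s+[A-Z][a-z]+\b"), "[NAME]"),
]


def deidentify(text):
    """Replace pattern matches with placeholders.

    Returns the rewritten text and the number of replacements made.
    """
    replaced = 0
    for pattern, placeholder in PHI_PATTERNS:
        text, n = pattern.subn(placeholder, text)
        replaced += n
    return text, replaced


def mcnemar_exact(y_true, pred_original, pred_deid):
    """Two-sided exact McNemar test on paired predictions (assumed test).

    Only discordant pairs count: notes classified correctly from one
    version of the text and incorrectly from the other.
    """
    only_original = sum(
        1 for t, a, b in zip(y_true, pred_original, pred_deid)
        if a == t and b != t
    )
    only_deid = sum(
        1 for t, a, b in zip(y_true, pred_original, pred_deid)
        if a != t and b == t
    )
    n = only_original + only_deid
    if n == 0:
        return 1.0  # predictions never disagree in correctness
    k = min(only_original, only_deid)
    # Binomial tail under the null hypothesis p = 0.5, doubled for two sides.
    p_value = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p_value)
```

In use, the same classifier would be trained and evaluated once on the original notes and once on the output of deidentify, and the two prediction lists for the same test notes would be passed to mcnemar_exact. A large p-value would be consistent with the reported finding of no significant difference.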
