Impact of De-Identification on Clinical Text Classification Using Traditional and Deep Learning Classifiers

Jihad S Obeid¹, Paul M Heider¹, Erin R Weeda², Andrew J Matuskowitz³, Christine M Carr³,¹, Kevin Gagnon⁴, Tami Crawford¹, Stephane M Meystre¹

Affiliations

¹ Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA.
² Department of Clinical Pharmacy and Outcome Sciences, Medical University of South Carolina, Charleston, SC, USA.
³ Department of Emergency Medicine, Medical University of South Carolina, Charleston, SC, USA.
⁴ Department of Computer Science, University of South Carolina, Columbia, SC, USA.

PMID: 31437930
PMCID: PMC6779034
DOI: 10.3233/SHTI190228

Jihad S Obeid et al. Stud Health Technol Inform. 2019 Aug 21;264:283-287. doi: 10.3233/SHTI190228.

Abstract

Clinical text de-identification enables collaborative research while protecting patient privacy and confidentiality; however, concerns persist about the reduction in the utility of the de-identified text for information extraction and machine learning tasks. In the context of a deep learning experiment to detect altered mental status in emergency department provider notes, we tested several classifiers on clinical notes in their original form and on their automatically de-identified counterparts. We tested both traditional bag-of-words based machine learning models and word-embedding based deep learning models. We evaluated the models on 1,113 history of present illness notes. A total of 1,795 protected health information tokens were replaced in the de-identification process across all notes. The deep learning models had the best performance, with accuracies of 95% on both original and de-identified notes. However, there was no significant difference in the performance of any of the models on the original vs. the de-identified notes.

Keywords: Data Anonymization; Machine Learning; Natural Language Processing.
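
As a rough illustration of why replacing protected health information (PHI) tokens may leave bag-of-words features largely intact, the sketch below applies a toy de-identification step to one invented note and compares token counts before and after. The regular expressions, the placeholder tags, the helper names `deidentify` and `bag_of_words`, and the sample note are all assumptions made for this example; the abstract does not describe the de-identification system or the feature extraction actually used.

```python
import re
from collections import Counter

# Hypothetical toy patterns; a real de-identification system covers far more PHI types.
PHI_PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\bDr\.\s+[A-Z][a-z]+\b"), "[PROVIDER]"),
]


def deidentify(text):
    """Replace every match of each toy PHI pattern; return (new_text, n_replaced)."""
    replaced = 0
    for pattern, placeholder in PHI_PATTERNS:
        text, n = pattern.subn(placeholder, text)
        replaced += n
    return text, replaced


def bag_of_words(text):
    """Count lowercased word tokens; a bracketed placeholder counts as one token."""
    return Counter(re.findall(r"\[[a-z]+\]|[a-z]+", text.lower()))


# Invented example note, not taken from the study data.
note = "Seen by Dr. Smith on 3/14/2018 for confusion and lethargy."
deid_note, n_replaced = deidentify(note)

bow_original = bag_of_words(note)
bow_deid = bag_of_words(deid_note)

# Tokens whose counts are identical in both versions of the note.
unchanged = {t for t in bow_original if bow_original[t] == bow_deid.get(t, 0)}

print(deid_note)
print("PHI spans replaced:", n_replaced)
print("tokens unchanged by de-identification:", sorted(unchanged))
```

In this toy case only the name and date tokens change, while symptom words such as "confusion" and "lethargy" keep the same counts, which is one plausible reason a classifier could perform similarly on both versions of a note.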

Figures

**Figure 1.**
AUC values and 95% confidence intervals for all the models for both original and de-identified (Deid) data.
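
For readers unfamiliar with the quantities in this caption, the following is a minimal sketch of how an AUC and a 95% confidence interval could be computed for one model. It assumes a percentile bootstrap over notes; the abstract does not state how the study's intervals were derived, and the function names, parameters, and toy scores here are hypothetical.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # A positive scored above a negative counts 1; a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap the AUC")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # this resample lacks one class, so its AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy labels and scores, invented for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
point_estimate = auc(labels, scores)
ci_low, ci_high = bootstrap_ci(labels, scores)
print(point_estimate, ci_low, ci_high)
```

Running the same procedure on the predictions for the original and the de-identified notes would give two intervals per model; heavily overlapping intervals are consistent with, but do not by themselves prove, the absence of a significant difference.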
