Benchmarking Modern Named Entity Recognition Techniques for Free-text Health Record Deidentification
- PMID: 34457124
- PMCID: PMC8378656
Abstract
Electronic Health Records (EHRs) have become the primary form of medical record-keeping across the United States. Federal law restricts the sharing of any EHR data that contains protected health information (PHI). De-identification, the process of identifying and removing all PHI, is crucial for making EHR data publicly available for scientific research. This project explores several deep learning-based named entity recognition (NER) methods to determine which perform best on the de-identification task. We trained and tested our models on the i2b2 training dataset and qualitatively assessed their performance using EHR data collected from a local hospital. We found that 1) Bi-LSTM-CRF is the best-performing encoder/decoder combination, 2) character embeddings tend to improve precision at the cost of recall, and 3) transformers alone underperform as context encoders. Future work focused on structuring medical text may improve the extraction of semantic and syntactic information for EHR de-identification.
©2021 AMIA - All rights reserved.
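To make the abstract's terms concrete, the following minimal sketch shows how NER tags could drive de-identification and how the precision/recall trade-off might be measured. It is an illustration under stated assumptions, not the paper's pipeline or evaluation protocol: the BIO tag names (`B-NAME`, `I-DATE`), the functions `redact` and `token_precision_recall`, the `[PHI]` placeholder, and the example sentence are all hypothetical, and scoring here is binary (PHI vs. non-PHI) at the token level.

```python
from typing import List, Tuple


def redact(tokens: List[str], tags: List[str], placeholder: str = "[PHI]") -> List[str]:
    """Replace each contiguous tagged span with a single placeholder (assumed BIO scheme)."""
    redacted = []
    inside_span = False
    for token, tag in zip(tokens, tags):
        if tag == "O":
            redacted.append(token)
            inside_span = False
        else:
            # A B- tag always opens a new span; an I- tag only opens one
            # if the previous token was not already inside a span.
            if tag.startswith("B-") or not inside_span:
                redacted.append(placeholder)
            inside_span = True
    return redacted


def token_precision_recall(gold: List[str], pred: List[str]) -> Tuple[float, float]:
    """Token-level precision and recall, treating any non-'O' tag as PHI."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g != "O" and p != "O")
    fp = sum(1 for g, p in pairs if g == "O" and p != "O")
    fn = sum(1 for g, p in pairs if g != "O" and p == "O")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example: the predicted tags miss the surname "Smith".
tokens = ["Mr.", "John", "Smith", "was", "seen", "on", "March", "3", "."]
gold = ["O", "B-NAME", "I-NAME", "O", "O", "O", "B-DATE", "I-DATE", "O"]
pred = ["O", "B-NAME", "O", "O", "O", "O", "B-DATE", "I-DATE", "O"]

print(" ".join(redact(tokens, pred)))
print(token_precision_recall(gold, pred))
```

In this example every predicted PHI token is correct, so precision is perfect, but the missed surname lowers recall and leaves "Smith" in the redacted text. For de-identification, that kind of recall loss is the costlier error, which is why a precision gain bought with recall deserves scrutiny.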