Comparative Study

De-identification of patient notes with recurrent neural networks

Franck Dernoncourt et al. J Am Med Inform Assoc. 2017 May 1;24(3):596-606. doi: 10.1093/jamia/ocw156.

Abstract

Objective: Patient notes in electronic health records (EHRs) may contain critical information for medical investigations. However, to protect patient confidentiality, the vast majority of medical investigators can access only de-identified notes. In the United States, the Health Insurance Portability and Accountability Act (HIPAA) defines 18 types of protected health information that must be removed to de-identify patient notes. Manual de-identification is impractical given the size of electronic health record databases, the limited number of researchers with access to non-de-identified notes, and the frequent mistakes of human annotators. A reliable automated de-identification system would consequently be of high value.

Materials and methods: We introduce the first de-identification system based on artificial neural networks (ANNs), which requires no handcrafted features or rules, unlike existing systems. We compare its performance with state-of-the-art systems on two datasets: the i2b2 2014 de-identification challenge dataset, the largest publicly available de-identification dataset, and the MIMIC de-identification dataset, which we assembled and which is twice as large as the i2b2 2014 dataset.

Results: Our ANN model outperforms the state-of-the-art systems. It yields an F1-score of 97.85 on the i2b2 2014 dataset, with a recall of 97.38 and a precision of 98.32, and an F1-score of 99.23 on the MIMIC de-identification dataset, with a recall of 99.25 and a precision of 99.21.
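The reported F1-scores are consistent with the precision and recall values, since F1 is the harmonic mean of the two. A quick consistency check in Python (illustrative only, using the numbers above):

```python
# F1 is the harmonic mean of precision and recall: F1 = 2 * P * R / (P + R).
# Values below are the ones reported above; this is only a consistency check.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1(98.32, 97.38), 2))  # i2b2 2014: 97.85
print(round(f1(99.21, 99.25), 2))  # MIMIC: 99.23
```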

Conclusion: Our findings support the use of ANNs for de-identification of patient notes, as they show better performance than previously published systems while requiring no manual feature engineering.

Keywords: de-identification; medical language processing; neural networks.

Figures

Figure 1.
Architecture of the artificial neural network (ANN) model. (RNN, recurrent neural network.) The type of RNN used in this model is long short-term memory (LSTM). n is the number of tokens, and x_i is the ith token. V_T is the mapping from tokens to token embeddings. ℓ(i) is the number of characters in the ith token, and x_{i,j} is the jth character in the ith token. V_C is the mapping from characters to character embeddings. e_i is the character-enhanced token embedding of the ith token. d_i is the output of the LSTM of the label prediction layer, a_i is the probability vector over labels, and y_i is the predicted label of the ith token.
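To make the data flow in this caption concrete, the PyTorch sketch below mirrors the same layer structure: a character-level bidirectional LSTM summarizes each token's characters, the result is concatenated with the token embedding to form e_i, a token-level bidirectional LSTM produces d_i, and a linear layer plus softmax yields the probability vector a_i and the predicted label y_i. This is an illustrative reconstruction, not the authors' code; the label sequence optimization layer is omitted, and all dimensions and names are assumptions.

```python
# Minimal sketch of the Figure 1 architecture in PyTorch (illustrative only;
# not the authors' implementation). The label sequence optimization layer is
# omitted, and every dimension below is an assumption.
import torch
import torch.nn as nn

class DeidTagger(nn.Module):
    def __init__(self, n_tokens, n_chars, n_labels,
                 tok_dim=100, char_dim=25, char_hidden=25, hidden=100):
        super().__init__()
        self.tok_emb = nn.Embedding(n_tokens, tok_dim)    # V_T: token -> token embedding
        self.char_emb = nn.Embedding(n_chars, char_dim)   # V_C: character -> character embedding
        # Character-level bidirectional LSTM; its final hidden states summarize a token's characters.
        self.char_lstm = nn.LSTM(char_dim, char_hidden, bidirectional=True, batch_first=True)
        # Token-level bidirectional LSTM over the character-enhanced token embeddings e_i.
        self.tok_lstm = nn.LSTM(tok_dim + 2 * char_hidden, hidden,
                                bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_labels)        # maps d_i to scores over labels

    def forward(self, token_ids, char_ids):
        # token_ids: (n,) token indices; char_ids: (n, max_token_length) character indices.
        _, (h, _) = self.char_lstm(self.char_emb(char_ids))          # h: (2, n, char_hidden)
        char_feat = torch.cat([h[0], h[1]], dim=-1)                  # (n, 2 * char_hidden)
        e = torch.cat([self.tok_emb(token_ids), char_feat], dim=-1)  # e_i for each token
        d, _ = self.tok_lstm(e.unsqueeze(0))                         # d: (1, n, 2 * hidden)
        a = torch.softmax(self.out(d.squeeze(0)), dim=-1)            # a_i: probabilities over labels
        return a.argmax(dim=-1)                                      # y_i: predicted label per token

# Example: tag a 4-token sentence whose tokens are padded to 6 characters each.
model = DeidTagger(n_tokens=5000, n_chars=80, n_labels=9)
print(model(torch.randint(0, 5000, (4,)), torch.randint(0, 80, (4, 6))))
```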
Figure 2.
Binary token-based F1-scores for each PHI category. The evaluation is based on PHI types that are defined by HIPAA as well as additional types specific to each dataset. Each PHI category and the corresponding types are defined in Table 1. The “All” category refers to the F1-score micro-averaged over all PHI categories. The PROFESSION category exists only in the i2b2 dataset and was plotted separately to avoid distorting the y-axis. For the same reason, the AGE category in MIMIC was drawn separately.
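For reference, the micro-averaged score in the "All" category is obtained by pooling token-level true positives, false positives, and false negatives across all PHI categories before computing precision and recall. A short illustrative sketch of that computation (not the official i2b2 evaluation script, and the counts are hypothetical):

```python
# Micro-averaged binary token-based F1: pool TP/FP/FN across PHI categories,
# then compute precision, recall, and F1 once (illustrative sketch only).
def micro_f1(counts):
    """counts: dict mapping PHI category -> (tp, fp, fn) token counts."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-category token counts.
print(micro_f1({"NAME": (950, 20, 30), "DATE": (1200, 10, 15)}))
```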
Figure 3.
Impact of the training set size on the binary HIPAA token-based F1-scores on the MIMIC dataset. The 100% training set size refers to using all of the dataset minus the test set, which amounts to 2 046 488 tokens and 42 531 PHI instances. As expected, both CRF and ANN models benefit from having more training samples.
Figure 4.
Impact of the number of labeled PHI instances in the training set on the model’s performance for each PHI type in the i2b2 dataset. Figure (A) presents all PHI types, and Figure (B) focuses on the most commonly occurring PHI types. Having more PHI instances in the training set helps increase F1-score, but some PHI types are harder to detect than others.
Figure 5.
Ablation test performance based on binary HIPAA token-based evaluation. ANN is the model based on artificial neural networks. "− seq opt" is the ANN model without the label sequence optimization layer. "− pre-train" is the ANN model where token embeddings are initialized with random values instead of pre-trained embeddings. "− token emb" is the ANN model using only character-based token embeddings, without token embeddings. "− character emb" is the ANN model using only token embeddings, without character-based token embeddings.
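One way to read the ablation variants is as the full model with a single component switched off. The mapping below uses hypothetical flag names (not from the paper's code) purely to make that correspondence explicit:

```python
# Hypothetical flags for the Figure 5 ablation variants; the full ANN model
# enables all four components, and each ablation disables exactly one.
ABLATIONS = {
    "ANN":             dict(seq_opt=True,  pretrained_emb=True,  token_emb=True,  char_emb=True),
    "- seq opt":       dict(seq_opt=False, pretrained_emb=True,  token_emb=True,  char_emb=True),
    "- pre-train":     dict(seq_opt=True,  pretrained_emb=False, token_emb=True,  char_emb=True),
    "- token emb":     dict(seq_opt=True,  pretrained_emb=True,  token_emb=False, char_emb=True),
    "- character emb": dict(seq_opt=True,  pretrained_emb=True,  token_emb=True,  char_emb=False),
}

for name, flags in ABLATIONS.items():
    print(name, flags)
```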
