Review. J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S11-S19.
doi: 10.1016/j.jbi.2015.06.007. Epub 2015 Jul 28.

Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1

Amber Stubbs et al. J Biomed Inform. 2015 Dec.

Abstract

The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured four tracks. The first of these was the de-identification track focused on identifying protected health information (PHI) in longitudinal clinical narratives. The longitudinal nature of clinical narratives calls particular attention to details of information that, while benign on their own in separate records, can lead to identification of patients in combination in longitudinal records. Accordingly, the 2014 de-identification track addressed a broader set of entities and PHI than covered by the Health Insurance Portability and Accountability Act - the focus of the de-identification shared task that was organized in 2006. Ten teams tackled the 2014 de-identification task and submitted 22 system outputs for evaluation. Each team was evaluated on their best performing system output. Three of the 10 systems achieved F1 scores over .90, and seven of the top 10 scored over .75. The most successful systems combined conditional random fields and hand-written rules. Our findings indicate that automated systems can be very effective for this task, but that de-identification is not yet a solved problem.

Keywords: Machine learning; Medical records; Natural language processing; Shared task.

Figures

Figure 1
Micro-averaged entity-based F1 measures by category: i2b2-PHI categories
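
The figure reports micro-averaged, entity-based F1 per PHI category. As a rough illustration of that kind of metric, the sketch below pools true positives, predictions, and gold entities across all documents before computing precision, recall, and F1. It is not the shared task's evaluation script: the exact-match criterion, the tuple layout `(doc_id, start, end, category)`, the function names, and the sample category labels are all assumptions made for this example.

```python
def micro_f1(gold, pred):
    """Micro-averaged entity-level F1 over pooled entities.

    gold, pred: sets of (doc_id, start, end, category) tuples
    (a hypothetical layout chosen for this sketch). An entity counts
    as correct only if all four fields match exactly.
    """
    tp = len(gold & pred)  # entities found with exact span and category
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_f1_by_category(gold, pred):
    """Micro-averaged F1 computed separately for each category."""
    categories = sorted({e[3] for e in gold | pred})
    return {
        c: micro_f1(
            {e for e in gold if e[3] == c},
            {e for e in pred if e[3] == c},
        )
        for c in categories
    }


# Toy data; the category labels here are illustrative only.
gold = {
    ("doc1", 0, 10, "NAME"),
    ("doc1", 25, 35, "DATE"),
    ("doc2", 5, 12, "NAME"),
    ("doc2", 40, 50, "DATE"),
}
pred = {
    ("doc1", 0, 10, "NAME"),
    ("doc1", 25, 35, "DATE"),
    ("doc2", 5, 12, "NAME"),
    ("doc2", 41, 50, "DATE"),  # boundary error: counted as a miss under exact match
}

overall = micro_f1(gold, pred)
per_category = micro_f1_by_category(gold, pred)
```

Because the counts are pooled before averaging, frequent categories weigh more heavily in the overall score than rare ones, which is one reason a per-category breakdown is informative alongside a single overall F1.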
