Review
BMC Med Inform Decis Mak. 2017 Apr 26;17(1):50.
doi: 10.1186/s12911-017-0437-1.

Clinical records anonymisation and text extraction (CRATE): an open-source software system

Rudolf N Cardinal. BMC Med Inform Decis Mak.

Abstract

Background: Electronic medical records contain information of value for research, but contain identifiable and often highly sensitive confidential information. Patient-identifiable information cannot in general be shared outside clinical care teams without explicit consent, but anonymisation/de-identification allows research uses of clinical data without explicit consent.

Results: This article presents CRATE (Clinical Records Anonymisation and Text Extraction), an open-source software system with separable functions: (1) it anonymises or de-identifies arbitrary relational databases, with sensitivity and precision similar to previous comparable systems; (2) it uses public secure cryptographic methods to map patient identifiers to research identifiers (pseudonyms); (3) it connects relational databases to external tools for natural language processing; (4) it provides a web front end for research and administrative functions; and (5) it supports a specific model through which patients may consent to be contacted about research.
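
Function (2) above, mapping patient identifiers to research pseudonyms with a keyed hash, can be illustrated with a minimal sketch. This is not CRATE's own code: the function name `research_id`, the choice of HMAC-SHA256, and the key handling are all assumptions made for illustration, using only the Python standard library.

```python
import hashlib
import hmac


def research_id(patient_id: str, secret_key: bytes) -> str:
    """Map a patient identifier to a hexadecimal pseudonym.

    Hypothetical helper: HMAC-SHA256 is an assumed choice of keyed hash,
    not necessarily the algorithm or configuration that CRATE uses.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

The same identifier and key always give the same pseudonym, so records for one patient remain linkable inside the research database, while anyone without the secret key cannot recompute the mapping from a known identifier.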

Conclusions: Creation and management of a research database from sensitive clinical records with secure pseudonym generation, full-text indexing, and a consent-to-contact process is possible and practical using entirely free and open-source software.

Keywords: Anonymisation; Clinical informatics; De-identification; Electronic medical records; Open-source software; Pseudonymisation; Psychiatry.

Figures

Fig. 1
Overview of the roles that CRATE can play in the creation of a research database. The figure shows a schematic of a full EMR containing sensitive and identifiable information, its processing into a pseudonymised research database, and methods through which researchers may use the research database to contact patients about research, while preserving anonymity for those who have not consented to be contacted. Key functions of CRATE are shown, as follows. (a) Anonymisation of source data in a relational database framework, using identifiers in the source data to “scrub” free text. In this example the date of birth has also been partially obscured. (b) Generation of cryptographically secure research IDs using hashed message authentication codes and one-time pads. An integer transient research ID is illustrated; full research IDs use longer hexadecimal digests. (c) Provision of a managed relational database interface to natural language processing tools such as GATE. (d) Provision of an optional web front end to a research database. (e) Management of a consent-to-contact process. The anonymisation, NLP, front end, and consent-to-contact components are modular and usable separately.
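
The scrubbing step in the caption, where identifiers held in structured fields are used to remove matching text from free-text notes, might look roughly like the following sketch. The names `scrub` and `truncate_dob`, the `[XXX]` replacement marker, whole-word case-insensitive matching, and truncation of the date of birth to the first of the month are all assumptions for illustration; they are not taken from CRATE itself.

```python
import re
from datetime import date


def scrub(text: str, identifiers: list[str], replacement: str = "[XXX]") -> str:
    """Replace whole-word, case-insensitive matches of known identifiers.

    Hypothetical helper: the matching rules here are assumed, not CRATE's.
    """
    # Try longer identifiers first so they win over any shorter overlap.
    terms = sorted({t for t in identifiers if t}, key=len, reverse=True)
    if not terms:
        return text
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(replacement, text)


def truncate_dob(dob: date) -> date:
    """Partially obscure a date of birth by keeping only year and month
    (an assumed policy for this sketch)."""
    return dob.replace(day=1)
```

For example, `scrub("Mr John Smith was seen today.", ["John", "Smith"])` replaces both names with the marker, while an unrelated word such as "Smithson" is left alone because matching is restricted to whole words.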
