Sci Data. 2020 Dec 10;7(1):435.
doi: 10.1038/s41597-020-00773-y.

Design and evaluation of a data anonymization pipeline to promote Open Science on COVID-19

Carolin E M Jakob et al. Sci Data. 2020.

Abstract

The Lean European Open Survey on SARS-CoV-2 Infected Patients (LEOSS) is a European registry for studying the epidemiology and clinical course of COVID-19. To support evidence generation at the rapid pace required in a pandemic, LEOSS follows an Open Science approach, making data available to the public in real time. To protect patient privacy, quantitative anonymization procedures guard the continuously published data stream, which consists of 16 variables on the course and therapy of COVID-19, against singling out, inference and linkage attacks. We investigated the bias introduced by this process and found that it has very little impact on the quality of the output data. Current laws do not specify requirements for the application of formal anonymization methods, guidelines with clear recommendations are lacking, and few real-world applications of quantitative anonymization procedures have been described in the literature. We therefore believe that our work can help others develop urgently needed anonymization pipelines for their projects.
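
To make the idea of a quantitative protection against singling out concrete, the following is a minimal sketch and not the LEOSS pipeline itself. It assumes a common risk model in which a record's re-identification risk is one divided by the number of records that share its quasi-identifier values, and it combines two generic techniques, generalization and suppression of small groups. The function names, the toy records, the 20-year age bands and the threshold k = 2 are all hypothetical choices for illustration; the abstract does not state which risk model or parameters the authors used.

```python
from collections import Counter


def reidentification_risks(records, quasi_identifiers):
    """Return (highest, average) risk, where a record's risk is
    1 / (number of records sharing its quasi-identifier values).
    This risk model is an assumption, not taken from the paper."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    per_record = [1.0 / class_sizes[key] for key in keys]
    return max(per_record), sum(per_record) / len(per_record)


def generalize_age(age, width=20):
    """Replace an exact age by a band such as '20-39' (hypothetical banding)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def suppress_small_classes(records, quasi_identifiers, k):
    """Keep only records whose quasi-identifier group has at least k members."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [r for r, key in zip(records, keys) if class_sizes[key] >= k]


# Toy data, invented for illustration only.
raw = [
    {"age": 34, "gender": "f"},
    {"age": 37, "gender": "f"},
    {"age": 52, "gender": "m"},
    {"age": 58, "gender": "m"},
    {"age": 71, "gender": "f"},
]
qis = ["age", "gender"]

# Step 1: generalize exact ages into bands.
generalized = [{**r, "age": generalize_age(r["age"])} for r in raw]

# Step 2: withhold records that remain unique (groups smaller than k = 2).
published = suppress_small_classes(generalized, qis, k=2)

print("raw:        ", reidentification_risks(raw, qis))
print("generalized:", reidentification_risks(generalized, qis))
print("published:  ", reidentification_risks(published, qis))
print("fraction published:", len(published) / len(raw))
```

In this toy setting, generalization alone lowers the average risk but leaves one record unique, so suppression is what caps the highest risk. The price is that some records are withheld, which is the kind of trade-off between risk and the fraction of cases published that an evaluation of such a pipeline would need to track.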

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Development of various properties of the LEOSS PUF over time, which is represented by the size of the primary dataset in LEOSS. (a) Fraction of cases published, (b) case fatality rate before and after anonymization.
Fig. 2. Comparison of demographic parameters before and after anonymization for the primary dataset with 2,200 records. (a) Age distribution in years, (b) gender distribution.
Fig. 3. Comparison of clinical parameters before and after anonymization for the primary dataset with 2,200 records. (a) Patients per phase, (b) outcome, (c) superinfections in uncomplicated phase, (d) superinfection in complicated phase, (e) superinfections in critical phase.
Fig. 4. Change of logit coefficients of univariate association of patients with age >45 years and death before and after anonymization over time, which is represented by the size of the primary dataset.
Fig. 5. Development of re-identification risks before and after anonymization.
Fig. 6. Overview of the workflow used to develop the anonymization pipeline.
Fig. 7. Semantic domain structuring for a sensitive variable.