Int J Popul Data Sci. 2022 May 23;7(1):1727.
doi: 10.23889/ijpds.v7i1.1727. eCollection 2022.

An overview of synthetic administrative data for research

Theodora Kokosi et al. Int J Popul Data Sci.

Abstract

Use of administrative data for research and for planning services has increased over recent decades due to the value of the large, rich information available. However, concerns about the release of sensitive or personal data and the associated disclosure risk can lead to lengthy approval processes and restricted data access. This can delay or prevent the production of timely evidence. A promising solution to facilitate more efficient data access is to create synthetic versions of the original datasets which are less likely to hold confidential information and can minimise disclosure risk. Such data may be used as an interim solution, allowing researchers to develop their analysis plans on non-disclosive data, whilst waiting for access to the real data. We aim to provide an overview of the background and uses of synthetic data and describe common methods used to generate synthetic data in the context of UK administrative research. We propose a simplified terminology for categories of synthetic data (univariate, multivariate, and complex modality synthetic data) as well as a more comprehensive description of the terminology used in the existing literature and illustrate challenges and future directions for research.

Keywords: administrative datasets; data confidentiality; data linkage; data utility; statistical disclosure control; synthetic data.

Conflict of interest statement

Declaration of conflicting interests: The authors declare that there is no conflict of interest.
