JAMIA Open. 2020 Dec 14;3(4):557-566.
doi: 10.1093/jamiaopen/ooaa060. eCollection 2020 Dec.

Spot the difference: comparing results of analyses from real patient data and synthetic derivatives

Randi E Foraker et al. JAMIA Open.

Abstract

Background: Synthetic data may provide a solution to researchers who wish to generate and share data in support of precision healthcare. Recent advances in data synthesis enable the creation and analysis of synthetic derivatives as if they were the original data; this process has significant advantages over data deidentification.

Objectives: To assess a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel) for its ability to produce data that can be used for research purposes while obviating privacy and confidentiality concerns.

Methods: We explored three use cases and tested the robustness of synthetic data by comparing the results of analyses on synthetic derivatives with those on the original data, using traditional statistics, machine learning approaches, and spatial representations of the data. We designed these use cases to conduct analyses at the observation level (Use Case 1), the patient-cohort level (Use Case 2), and the population level (Use Case 3).

Results: For each use case, the results of the analyses on the synthetic derivative and on the real data did not differ significantly (P > 0.05), and the same conclusions could be drawn from both.
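
One generic way to compare the same variable between a real dataset and its synthetic derivative is a two-sample test. The sketch below is a two-sided permutation test for a difference in means in plain Python; it is an illustrative assumption, not the statistical procedure used in the study, and the function name `permutation_test` and the sample values are hypothetical. A P value above 0.05 means the test found no evidence of a difference; on its own it does not establish that the two datasets are equivalent.

```python
import random


def permutation_test(real, synthetic, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (hypothetical helper).

    Returns (observed difference in means, real minus synthetic; P value).
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = sum(real) / len(real) - sum(synthetic) / len(synthetic)
    pooled = list(real) + list(synthetic)
    n_real = len(real)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the "real"/"synthetic" labels and recompute the difference.
        rng.shuffle(pooled)
        a, b = pooled[:n_real], pooled[n_real:]
        diff = sum(a) / len(a) - sum(b) / len(b)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated P value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical length-of-stay values for illustration only (not study data).
real_los = [2.1, 3.4, 1.8, 5.0, 2.7, 4.2, 3.3, 2.9]
synthetic_los = [2.4, 3.1, 2.0, 4.6, 2.8, 4.0, 3.5, 3.2]
diff, p = permutation_test(real_los, synthetic_los)
print(f"difference in means = {diff:.3f}, P = {p:.3f}")
```

A permutation test is used here because it needs no distributional assumptions and only the standard library; a t-test or Kolmogorov-Smirnov test would be equally reasonable substitutes.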

Discussion and conclusion: This article presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in clinical research for faster insights and improved data sharing in support of precision healthcare.

Keywords: data analysis; electronic health records and systems; precision health care; protected health information; synthetic data.


Figures

Figure 1. Data synthesis process.
Figure 2. Alarms/PICU length of stay by PRISM III score: real (A) and synthetic (B) data.
Figure 3. Chlamydia rates (per 100 000 persons) by zip code: real (left) versus synthetic (right) data, 2014. Darker color indicates a higher rate.
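
Figure 3 maps crude rates per 100 000 persons. As a minimal sketch, assuming each dataset supplies a case count and a population for every zip code, the rate is cases × 100 000 / population, and the real and synthetic maps can then be compared area by area. The function names and area keys below are hypothetical, and this is not the authors' code.

```python
from typing import Dict, Optional, Tuple


def rate_per_100k(cases: int, population: int) -> Optional[float]:
    """Crude rate per 100 000 persons; None if the population is not positive."""
    if population <= 0:
        return None
    # Multiply before dividing to limit floating-point rounding.
    return cases * 100_000 / population


def compare_rates(
    real: Dict[str, Tuple[int, int]],
    synthetic: Dict[str, Tuple[int, int]],
) -> Dict[str, float]:
    """Per-area difference (synthetic minus real) in rates per 100 000.

    Each dict maps a hypothetical area key (e.g. a zip code) to (cases, population).
    Only areas present in both dicts with a defined rate are compared.
    """
    diffs = {}
    for area in real.keys() & synthetic.keys():
        r = rate_per_100k(*real[area])
        s = rate_per_100k(*synthetic[area])
        if r is not None and s is not None:
            diffs[area] = s - r
    return diffs


# Hypothetical counts for illustration only (not study data).
real = {"area_1": (25, 50_000), "area_2": (12, 8_000)}
synthetic = {"area_1": (27, 50_000), "area_2": (11, 8_000)}
diffs = compare_rates(real, synthetic)
print(diffs)
```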
