PLOS Digit Health. 2023 Jan 6;2(1):e0000082.
doi: 10.1371/journal.pdig.0000082. eCollection 2023 Jan.

Synthetic data in health care: A narrative review

Aldren Gonzales et al. PLOS Digit Health.

Abstract

Data are central to research, public health, and the development of health information technology (IT) systems. Nevertheless, access to most data in health care is tightly controlled, which may limit innovation, development, and efficient implementation of new research, products, services, or systems. Synthetic data offer one of several innovative ways for organizations to share datasets with a broader range of users. However, only limited literature explores their potential and applications in health care. In this review paper, we examined existing literature to bridge this gap and highlight the utility of synthetic data in health care. We searched PubMed, Scopus, and Google Scholar to identify peer-reviewed articles, conference papers, reports, and theses/dissertations related to the generation and use of synthetic datasets in health care. The review identified seven use cases of synthetic data in health care: a) simulation and prediction research, b) hypothesis, methods, and algorithm testing, c) epidemiology/public health research, d) health IT development, e) education and training, f) public release of datasets, and g) linking data. The review also identified readily and publicly accessible health care datasets, databases, and sandboxes containing synthetic data with varying degrees of utility for research, education, and software development. The review provided evidence that synthetic data are helpful in different aspects of health care and research. While original real data remain the preferred choice, synthetic data hold promise for bridging data access gaps in research and evidence-based policymaking.

Conflict of interest statement

The authors have declared that no competing interests exist.
