Review
NPJ Digit Med. 2023 Oct 9;6(1):186.
doi: 10.1038/s41746-023-00927-3.

Harnessing the power of synthetic data in healthcare: innovation, application, and privacy

Mauro Giuffrè et al. NPJ Digit Med. 2023.

Abstract

Data-driven decision-making in modern healthcare underpins innovation and predictive analytics in public health and clinical research. Synthetic data has shown promise in finance and economics to improve risk assessment, portfolio optimization, and algorithmic trading. However, higher stakes, potential liabilities, and healthcare practitioner distrust make clinical use of synthetic data difficult. This paper explores the potential benefits and limitations of synthetic data in the healthcare analytics context. We begin with real-world healthcare applications of synthetic data that inform government policy, enhance data privacy, and augment datasets for predictive analytics. We then preview future applications of synthetic data in the emergent field of digital twin technology. We explore the issues of data quality and data bias in synthetic data, which can limit its applicability across clinical use cases, and privacy concerns stemming from data misuse and the risk of re-identification. Finally, we evaluate the role of regulatory agencies in promoting transparency and accountability and propose strategies for risk mitigation such as Differential Privacy (DP) and a dataset chain of custody to maintain data integrity, traceability, and accountability. Synthetic data can improve healthcare, but measures to protect patient well-being and maintain ethical standards are key to promoting responsible use.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Highlights on Synthetic Data and their application in healthcare research, reviewing bias, quality, and privacy concerns.
Although the first attempts to generate synthetic data date back to the 1940s, current state-of-the-art methods use Generative Adversarial Networks (GANs) or Variational Auto-encoders (VAEs). In terms of classification, synthetic data spans a spectrum that ranges from partially to fully synthetic. While partially synthetic data incorporate real-world data, fully synthetic data are generated de novo. As reported in the text, synthetic data can have several applications in healthcare research, including imaging, infectious disease prevention and outbreak prediction, and digital twins. However, the lack of robust methods to audit the perpetuation of bias, accuracy, and representativeness of real-world medical scenarios has severely limited interpretability, use, and trust in the healthcare sector. One of the greatest concerns related to synthetic data involves patients’ privacy. Current regulations, namely the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA), are neither sufficient nor up to date enough to cover possible leakage of patients’ information from synthetic datasets. In this context, differential privacy may prove valuable, but its usage has been limited by the privacy-utility trade-off. The definition of a clear chain of custody can ensure integrity, security, and data privacy throughout the data lifecycle, providing transparency, traceability, and accountability at each stage.
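
The privacy-utility trade-off mentioned in the caption can be illustrated with a minimal sketch of the Laplace mechanism, one common way to achieve differential privacy for a counting query. This is not the authors' method: the function names (`laplace_noise`, `dp_count`, `mean_abs_error`), the cohort, and the epsilon values are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records: list[bool], epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-DP using the Laplace mechanism.

    A counting query has sensitivity 1: adding or removing one patient
    changes the true count by at most 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sum(records) + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(records: list[bool], epsilon: float,
                   trials: int = 2000, seed: int = 0) -> float:
    """Average absolute error of the noisy count over repeated releases."""
    rng = random.Random(seed)
    true_count = sum(records)
    return statistics.fmean(
        abs(dp_count(records, epsilon, rng) - true_count)
        for _ in range(trials)
    )


if __name__ == "__main__":
    # Hypothetical cohort: 200 of 1000 patients have the condition of interest.
    cohort = [i % 5 == 0 for i in range(1000)]
    for eps in (0.1, 1.0, 10.0):
        err = mean_abs_error(cohort, eps)
        print(f"epsilon={eps}: mean absolute error of released count = {err:.2f}")
```

In this sketch a smaller epsilon gives a stronger privacy guarantee but a noisier released count, and a larger epsilon does the reverse, which is the trade-off the caption refers to. Real deployments also have to track the cumulative privacy budget across queries, which is out of scope here.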
