NPJ Digit Med. 2025 May 15;8(1):281.
doi: 10.1038/s41746-024-01409-w.

A review on generative AI models for synthetic medical text, time series, and longitudinal data

Mohammad Loni et al. NPJ Digit Med.

Abstract

This paper presents the results of a novel scoping review on the practical models for generating three different types of synthetic health records (SHRs): medical text, time series, and longitudinal data. The innovative aspects of the review, which incorporate study objectives, data modality, and research methodology of the reviewed studies, uncover the importance and the scope of the topic for the digital medicine context. In total, 52 publications met the eligibility criteria for generating medical time series (22), longitudinal data (17), and medical text (13). Privacy preservation was found to be the main research objective of the studied papers, along with class imbalance, data scarcity, and data imputation as the other objectives. The adversarial network-based, probabilistic, and large language models exhibited superiority for generating synthetic longitudinal data, time series, and medical texts, respectively. Finding a reliable performance measure to quantify SHR re-identification risk is the major research gap of the topic.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Overview of the study selection process and research queries.
(a) PRISMA-ScR representation of the research methodology. (b) Research queries used in the study.
Fig. 2. The mutual link between the data modalities, generative models, and research objectives found in the reviewed studies.
This figure summarizes the findings of the paper. Generative adversarial networks (GANs) were predominantly employed for generating medical time series. Large language models (LLMs) were widely used for generating synthetic text. The variational auto-encoder (VAE) was employed in a minority of the studies, equally for longitudinal and text data, and not for time series. Probabilistic models were predominantly used for longitudinal data.
Fig. 3. Normalized distribution of performance measurement objectives over the data modalities. Larger circles indicate more publications in each category.
The figure indicates that the re-identification risk of SHRs was inadequately evaluated in the studied papers. In addition, the utility of longitudinal data has been evaluated less often than that of medical time series and text data.


