Front Artif Intell. 2025 Sep 16;8:1644084.
doi: 10.3389/frai.2025.1644084. eCollection 2025.

Privacy-, linguistic-, and information-preserving synthesis of clinical documentation through generative agents


Mark van Velzen et al. Front Artif Intell. 2025.

Abstract

The widespread adoption of generative agents (GAs) is reshaping the healthcare landscape. Nonetheless, broad utilization is impeded by restricted access to high-quality, interoperable clinical documentation from electronic health records (EHRs) due to persistent legal, ethical, and technical barriers. Synthetic health data generation (SHDG), leveraging pre-trained large language models (LLMs) instantiated as GAs, could offer a practical solution by creating synthetic patient information that mimics genuine EHRs. The use of LLMs, however, is not without issues; significant concerns remain regarding privacy, potential bias propagation, the risk of generating inaccurate or misleading content, and the lack of transparency in how these models make decisions. We therefore propose a privacy-, linguistic-, and information-preserving SHDG protocol that employs multiple context-aware, role-specific GAs. Guided by targeted prompting and authentic EHRs, which serve as structural and linguistic templates, role-specific GAs can, in principle, operate collaboratively through multi-turn interactions. We theorized that utilizing GAs in this fashion permits LLMs not only to produce synthetic EHRs that are accurate, consistent, and contextually appropriate, but also to expose the underlying decision-making process. To test this hypothesis, we developed a no-code GA-driven SHDG workflow as a proof of concept, which was implemented within a predefined, multi-layered data science infrastructure (DSI) stack: an integrated ensemble of software and hardware designed to support rapid prototyping and deployment. The DSI stack streamlines implementation for healthcare professionals, improving accessibility, usability, and cybersecurity. To deploy and validate GA-assisted workflows, we implemented a fully automated SHDG evaluation framework, co-developed with GenAI technology, which holistically compares the informational and linguistic features of synthetic, anonymized, and real EHRs at both the document and corpus levels. Our findings highlight that SHDG implemented through GAs offers a scalable, transparent, and reproducible methodology for unlocking the potential of clinical documentation to drive innovation, accelerate research, and advance the development of learning health systems. The source code, synthetic datasets, toolchains, and prompts created for this study can be accessed at the GitHub repository: https://github.com/HR-DataLab-Healthcare/RESEARCH_SUPPORT/tree/main/PROJECTS/Generative_Agent_based_Data-Synthesis.

Keywords: clinical natural language processing (NLP); data synthesis; generative agents; healthcare; information theory; linguistics; privacy; synthetic health data generation (SHDG).


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Data science infrastructure (DSI) stack. (A) Schematic of the DSI stack, structured as modular, interoperable layers founded on key IT principles: abstraction and modularization, separation of concerns, interoperability and standardization, scalability, and resilience. Each layer—[1] … [8]—fulfils a distinct function, from data storage and processing to analytics and deployment, supporting flexible, maintainable, and scalable data science pipelines. The DSI stack aligns with the Double Diamond design model (https://www.designcouncil.org.uk/our-resources/the-double-diamond/): lower layers focus on “Doing the right things” (data gathering and integration), while upper layers emphasize “Doing things right” (curation, iteration, and deployment). (B) Visualization of the roles and involvement of data scientists versus data engineers across the DSI stack. While data scientists are predominantly active in the human-oriented, upper layers of feature engineering, model development, and deployment, data engineers are primarily engaged in the machine-oriented, foundational layers involving warehousing, compute, and toolchains. This panel highlights the complementary skill sets necessary for an effective and robust data science infrastructure.
Figure 2
Visual representation of a no-code, multi-agent workflow for synthesizing EHRs. Data flows through connected tools and agents, enabling an iterative, structured generation process without manual coding. The GA-assisted SHDG workflow shown here begins with a Recursive Character Text Splitter that divides the uploaded PDF file containing anonymized EHR data into manageable chunks. These chunks are processed using Azure OpenAI Embeddings and stored in an In-Memory Vector Store. A Retriever Tool (RAG) then queries the stored embeddings to provide relevant context. The Azure ChatOpenAI component, configured with the GPT-4o-mini model, interacts with stored agent memory (SQLite Agent Memory) and coordinates with two agents: the Supervisor, acting as a senior physiotherapist specialized in low back pain, who manages task instructions and workflow control and directs the Tech Researcher, acting as a practicing physiotherapist (general or specialized), who executes prompts to generate synthetic Dutch-language EHR notes. Note: the workflow, accessed via a web interface, maintains contextual memory across user queries for seamless, multi-turn interactions and stops automatically when the Supervisor determines completion. This design enables rapid prototyping of SHDG solutions by healthcare researchers and practitioners without requiring advanced expertise in AI. For more information on the adopted technologies and their implementation, see the Toolchain Section (Section 3.1.4 of the DSI stack).
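For readers who prefer a code-level view of the no-code canvas described above, the sketch below approximates the same pipeline with LangChain-style Python components. The deployment names, chunk sizes, retrieval query, and the single supervisor-to-worker exchange are illustrative assumptions rather than the exact configuration used in the study, and Azure OpenAI credentials are expected to come from environment variables.

```python
# Minimal sketch of the Figure 2 pipeline using LangChain components.
# Deployment names, chunk sizes, and prompts are illustrative assumptions.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import AzureOpenAIEmbeddings, AzureChatOpenAI
from langchain_core.vectorstores import InMemoryVectorStore

# 1. Split the uploaded, anonymized EHR PDF into manageable chunks.
docs = PyPDFLoader("anonymized_ehr_notes.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 2. Embed the chunks and keep them in an in-memory vector store (RAG context).
embeddings = AzureOpenAIEmbeddings(azure_deployment="text-embedding-3-small")
store = InMemoryVectorStore.from_documents(chunks, embeddings)
retriever = store.as_retriever(search_kwargs={"k": 4})

# 3. Chat model shared by the Supervisor and Tech Researcher agents.
llm = AzureChatOpenAI(azure_deployment="gpt-4o-mini", temperature=0.7)

# 4. One supervisor-to-worker turn (multi-turn control and SQLite memory omitted).
context = "\n\n".join(d.page_content for d in retriever.invoke("lage rugpijn episode"))
supervisor_instruction = (  # what the Supervisor agent would hand to the worker
    "Draft one synthetic Dutch-language physiotherapy EHR note that mirrors the "
    "structure and wording of the context below, without reproducing any detail "
    "that could identify a real patient.\n\n" + context
)
worker_reply = llm.invoke(
    [
        ("system", "You are a practicing physiotherapist writing clinical notes in Dutch."),
        ("user", supervisor_instruction),
    ]
)
print(worker_reply.content)
```

In the actual workflow the Supervisor is itself an LLM call that composes, checks, and iterates such instructions over multiple turns, with the SQLite Agent Memory providing shared context between turns.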
Figure 3
Example of a single-turn input/output interaction when applying the multi-agent workflow for generating realistic, structured EHRs, as described in Figure 2. In this scenario, the End User—a practicing physiotherapist—uses a web-based interface to request the creation of 20 synthetic but realistic EHRs in Dutch. The request specifies detailed content and formatting requirements, including a concise patient history summary, a clearly stated help-seeking question, an ICF-based diagnosis, measurable treatment goals, and a treatment plan aligned with KNGF guidelines. All records must use professional Dutch clinical language with correct abbreviations. The Supervisor ensures these specifications are complete and unambiguous before the Tech Researcher produces the synthetic yet realistic Dutch-language EHRs. Each record contains the requested summary, patient demographics, presenting complaint, ICF-based functional and contextual factors, SMART goals, an individualized treatment plan, and SOAP-formatted progress notes. For illustration purposes, only Patient Dossier 14 from the generated set is shown here. The Supervisor reviews this output, confirms it meets all requirements, and marks the task as finished. Color coding: Blue—End User; Pink—Supervisor Agent; Green—Tech Researcher (worker Agent).
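The end-user request in this scenario amounts to a single structured specification. The sketch below paraphrases it in English for readability; in the study the requests are issued in Dutch, and the wording, field list, and variable names here are illustrative assumptions rather than the authors' exact prompt.

```python
# Hypothetical paraphrase of the Figure 3 end-user request (issued in Dutch in the study).
N_RECORDS = 20  # number of synthetic EHRs requested

user_request = f"""
Generate {N_RECORDS} synthetic but realistic physiotherapy EHRs in Dutch.
Each record must contain:
- a concise patient history summary
- a clearly stated help-seeking question
- an ICF-based diagnosis (functional and contextual factors)
- measurable, SMART-formulated treatment goals
- an individualized treatment plan aligned with the KNGF guidelines
- SOAP-formatted progress notes
Use professional Dutch clinical language with correct abbreviations.
"""

# The Supervisor agent first verifies that this specification is complete and
# unambiguous; the Tech Researcher then generates the records, and the Supervisor
# reviews the output before marking the task as finished.
print(user_request)
```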
Figure 4
Document-level assessment using a holistic benchmark framework for quantitative evaluation of synthetic data quality (see Table 4). The figure presents a matrix of violin plots comparing the distributions of eight features—Size (MB), Word Count, Unique Words, Document Length (Chars), Character Entropy, Word Entropy, Average Pointwise Mutual Information (PMI), and Jensen-Shannon (JS) Distance—across three datasets: (A) real-world PDF (N = 13), (B) pseudonymized markdown (N = 13), and (C) synthetic markdown (N = 20). Note that each feature is encoded by a unique color for visual clarity (see the legend in the upper right of the figure). Violin plots combine aspects of box plots and kernel density plots to provide a nuanced visualization of distributional characteristics. Specifically, the width of each violin at a given value represents the estimated probability density of the data at that value, as calculated by a kernel density estimator. This allows for the depiction of multimodality, skewness, and overall distributional shape, beyond summary statistics such as the mean or quartiles. In this matrix, each subplot includes both the violin plot and overlaid scatter points indicating individual data instances, thereby facilitating both distributional and sample-level comparison of features across the three datasets.
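For concreteness, the sketch below computes two of the document-level features shown in the figure, word entropy and the Jensen-Shannon distance between word frequency distributions. The whitespace tokenization, lack of smoothing, and toy inputs are simplifying assumptions and need not match the authors' evaluation framework.

```python
# Illustrative computation of two Figure 4 features: word entropy and
# Jensen-Shannon distance. Whitespace tokenization is a simplifying assumption.
import math
from collections import Counter

def word_entropy(text: str) -> float:
    """Shannon entropy (in bits) of a document's word frequency distribution."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def js_distance(text_a: str, text_b: str) -> float:
    """Jensen-Shannon distance (square root of the divergence, base-2 logs)."""
    ca, cb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    vocab = sorted(set(ca) | set(cb))
    pa = [ca[w] / sum(ca.values()) for w in vocab]
    pb = [cb[w] / sum(cb.values()) for w in vocab]
    m = [(x + y) / 2 for x, y in zip(pa, pb)]

    def kl(p, q):  # Kullback-Leibler divergence, skipping zero-probability terms
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    return math.sqrt(0.5 * kl(pa, m) + 0.5 * kl(pb, m))

# Toy comparison of a "real" and a "synthetic" note fragment.
real = "pijn laag in de rug bij bukken en tillen sinds twee weken"
synthetic = "sinds drie weken pijn in de onderrug bij tillen en bukken"
print(f"word entropy (real): {word_entropy(real):.2f} bits")
print(f"JS distance (real vs synthetic): {js_distance(real, synthetic):.3f}")
```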

