Exploring the Utilization of Synthetic Data in Unsupervised Clustering for Opioid Misuse Analysis
- PMID: 40417526
- PMCID: PMC12099348
Abstract
Privacy and security restrictions on medical data hinder collaborative research, making synthetic data an increasingly attractive alternative. Recent advances in generative AI, such as generative adversarial network (GAN) models, have improved synthetic data generation. This study investigates the use of synthetic data in clustering models for opioid misuse analysis. We generated a synthetic dataset that replicates real-world data from 2017 to 2019, including demographics and diagnosis codes. Because the synthetic data preserve patient privacy, they enable comprehensive analysis without compromising security. We developed unsupervised clustering models to identify opioid misuse patterns and assessed the effectiveness of synthetic data across four scenarios: training on real data and testing on real data (TRTR), training on real data and testing on synthetic data (TRTS), training on synthetic data and testing on real data (TSTR), and training on synthetic data and testing on synthetic data (TSTS). Results demonstrate that synthetic data used as a training set can replicate real data distributions and clustering characteristics, offering significant potential for collaborative model development and optimization without introducing privacy or security risks.
©2024 AMIA - All rights reserved.
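The four train/test scenarios named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the clustering algorithm or the evaluation metric, so plain k-means and a mean squared distance to the nearest centroid stand in for them. The "real" and "synthetic" sets are toy Gaussian draws, not patient data or GAN output, and the helper names (`kmeans`, `mean_sq_distance`, `make_toy`) are hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in); returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if a cluster is empty.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def mean_sq_distance(points, centroids):
    """Average squared distance from each point to its nearest centroid."""
    return sum(min(dist(p, c) for c in centroids) ** 2 for p in points) / len(points)


def make_toy(n, seed):
    """Toy 2-D data from two Gaussian blobs (illustrative only)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0)]
    return [tuple(rng.gauss(m, 1.0) for m in centers[i % 2]) for i in range(n)]


# Hypothetical stand-ins: "synthetic" is just another draw from the same toy
# distribution, not the output of a generative model.
real_train, real_test = make_toy(200, seed=1), make_toy(100, seed=2)
synth_train, synth_test = make_toy(200, seed=3), make_toy(100, seed=4)

scenarios = {
    "TRTR": (real_train, real_test),
    "TRTS": (real_train, synth_test),
    "TSTR": (synth_train, real_test),
    "TSTS": (synth_train, synth_test),
}

# Fit on the training set, then score the fitted centroids on the test set.
results = {
    name: mean_sq_distance(test, kmeans(train, k=2))
    for name, (train, test) in scenarios.items()
}

for name, score in results.items():
    print(f"{name}: {score:.3f}")
```

In a comparison of this kind, similar scores across the four rows would suggest that a model fitted on synthetic data describes real data about as well as one fitted on real data. The actual criteria used in the study are not stated in the abstract.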