Sci Rep. 2024 May 20;14(1):11437.
doi: 10.1038/s41598-024-61721-z.

Medical calculators derived synthetic cohorts: a novel method for generating synthetic patient data

Francis Jeanson et al. Sci Rep. 2024.

Abstract

This study shows that we can use synthetic cohorts created from medical risk calculators to gain insights into how risk estimations, clinical reasoning, data-driven subgrouping, and the confidence in risk calculator scores are connected. When prediction variables aren't evenly distributed in these synthetic cohorts, they can be used to group similar cases together, revealing new insights about how cohorts behave. We also found that the confidence in predictions made by these calculators can vary depending on patient characteristics. This suggests that it might be beneficial to include a "normalized confidence" score in future versions of these calculators for healthcare professionals. We plan to explore this idea further in our upcoming research.
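As a rough illustration of what a calculator-derived cohort might look like, the Python sketch below enumerates every combination of a small grid of input values and scores each resulting profile. The variable names, the value ranges, and the toy_risk function are assumptions made for illustration only; they are not the ASCVD or SMART equations, and the abstract does not specify the authors' generation procedure.

```python
from itertools import product

# Hypothetical input grid: names and values are illustrative only,
# not the variables or ranges used in the study.
GRID = {
    "age": [40, 50, 60, 70],
    "sex": ["female", "male"],
    "systolic_bp": [110, 130, 150, 170],
    "smoker": [False, True],
}


def toy_risk(profile):
    """Placeholder score in [0, 1]; NOT a real clinical calculator."""
    return (
        0.01
        + 0.004 * (profile["age"] - 40)
        + 0.002 * (profile["systolic_bp"] - 110)
        + (0.05 if profile["smoker"] else 0.0)
        + (0.03 if profile["sex"] == "male" else 0.0)
    )


def build_cohort(grid, calculator):
    """Return one scored profile per combination of grid values."""
    keys = list(grid)
    cohort = []
    for values in product(*grid.values()):
        profile = dict(zip(keys, values))
        profile["risk"] = calculator(profile)
        cohort.append(profile)
    return cohort


cohort = build_cohort(GRID, toy_risk)
print(len(cohort))  # 4 * 2 * 4 * 2 combinations
```

Because every combination appears exactly once in such a grid, each input value is equally represented among the profiles, while the computed scores need not be evenly spread.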

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Distribution of ASCVD and SMART prediction values. (a) The full plot of SMART by ASCVD with all 26,880,000 profiles. Each dot represents a single synthetic patient, with its corresponding ASCVD score and SMART score. As there is overlap of dots in some areas, boxplots (b) and heat maps (c, d) were used to describe the distribution of the patients in the plot. (b) SMART by ASCVD binned into 20 equal (by width) bins. (c) SMART by ASCVD binned for density measurements. (d) Zoom-in on SMART < 25% and ASCVD < 25% from figure c.
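The equal-width binning described for panel (b) can be sketched as follows. This is a minimal example that assumes scores are expressed as fractions between 0 and 1 and that ASCVD is the binned axis; the function names and the sample pairs are hypothetical and are not data from the study.

```python
def equal_width_bin(value, low=0.0, high=1.0, n_bins=20):
    """Return the 0-based index of the equal-width bin containing value."""
    if not low <= value <= high:
        raise ValueError("value lies outside [low, high]")
    index = int((value - low) / (high - low) * n_bins)
    # The upper edge belongs to the last bin rather than a new one.
    return min(index, n_bins - 1)


def group_by_bin(pairs, low=0.0, high=1.0, n_bins=20):
    """Group the second score of each pair by the bin of the first score."""
    groups = {}
    for x, y in pairs:
        groups.setdefault(equal_width_bin(x, low, high, n_bins), []).append(y)
    return groups


# Made-up (ASCVD, SMART) pairs, for illustration only.
sample = [(0.02, 0.10), (0.03, 0.12), (0.07, 0.20), (0.99, 0.80)]
groups = group_by_bin(sample)
```

Each list in groups could then feed one boxplot, giving the per-bin summary that replaces the overlapping dots.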
Figure 2
Sample diagrams to demonstrate the uniform distribution of characteristics in the computed cohort.
Figure 3
K-prototype clustering for a target of 5 clusters. K-prototype clustering for a target of 5 clusters on all profile-generating categorical and numeric input variables, as well as the post-hoc UMRI score. UMRI was cropped at 15 to show details of the membership categories, each represented by one of the five colors.
Figure 4
Standard deviations of ASCVD and SMART as the risk scores increase. Standard deviations of grouped risk scores for ASCVD and SMART as risk scores increase by increments of 0.001.
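One possible reading of this grouping is sketched below: profiles are grouped by one score rounded to the nearest 0.001 increment, and the population standard deviation of the paired score is computed within each group. The caption does not state exactly how the groups were formed, so the grouping rule, the function name, and the sample values here are all assumptions.

```python
from collections import defaultdict
from statistics import pstdev


def sd_by_increment(pairs, step=0.001):
    """Map each step index of the first score to the SD of the second score."""
    groups = defaultdict(list)
    for x, y in pairs:
        # Key is the nearest whole number of steps, e.g. 0.0101 -> 10.
        groups[round(x / step)].append(y)
    # pstdev returns 0.0 for a group that holds a single value.
    return {key: pstdev(values) for key, values in sorted(groups.items())}


# Made-up score pairs, for illustration only.
sample = [(0.0101, 0.10), (0.0099, 0.20), (0.0201, 0.30)]
sd = sd_by_increment(sample)
```

Plotting sd against its keys would show how the spread of one score changes as the other increases.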
Figure 5
Mean Unmet Risk Index with standard deviations for select variables. Mean Unmet Risk Index (UMRI) for all “African” and “Other” race profiles; “Female” and “Male” gender profiles; age profiles; systolic blood pressures; high-density lipoprotein (HDL); and high-sensitivity C-reactive protein (hsCRP). Red squares show means; vertical lines show ± 1 standard deviation.
