Data Brief. 2023 Jan 20;47:108921.
doi: 10.1016/j.dib.2023.108921. eCollection 2023 Apr.

A synthetic dataset of liver disorder patients

Giovanna Nicora et al. Data Brief.

Abstract

The data in this article comprise 10,000 synthetic patients with liver disorders, characterized by 70 variables that include clinical features and patient outcomes such as hospital admission or surgery. The patient data were generated to simulate real patient data as closely as possible, using a publicly available Bayesian network that describes a causal model for liver disorders. By varying the network parameters, we also generated an additional set of 500 patients whose characteristics deviate from the initial patient population. We provide an overview of the synthetic data generation process and the associated scripts for generating the cohorts. This dataset can be useful for training and validating machine learning models, especially in the presence of dataset shift between training and test sets.

Keywords: Bayesian network; Causal model; Dataset shift; Machine learning; Synthetic patients.
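
The generation process described in the abstract (sampling patients from a Bayesian network, then varying the network parameters to obtain a shifted cohort) can be illustrated with a minimal Python sketch. This is not the authors' script or network: the three-node structure (sex, disorder, admission), every probability, and all function names below are hypothetical and chosen only to show the idea of forward sampling under two parameter settings.

```python
import random

# Hypothetical toy network: sex -> disorder -> admission.
# All node names and probabilities are invented for illustration;
# they are NOT taken from the published liver disorder network.
BASE_PARAMS = {
    "p_female": 0.5,
    "p_disorder_given_sex": {"female": 0.3, "male": 0.4},
    "p_admission_given_disorder": {True: 0.6, False: 0.1},
}

# Shifted cohort: same structure, different conditional probabilities.
SHIFTED_PARAMS = dict(
    BASE_PARAMS,
    p_disorder_given_sex={"female": 0.7, "male": 0.8},
)


def sample_patient(params, rng):
    """Forward (ancestral) sampling: draw each node after its parents."""
    sex = "female" if rng.random() < params["p_female"] else "male"
    disorder = rng.random() < params["p_disorder_given_sex"][sex]
    admission = rng.random() < params["p_admission_given_disorder"][disorder]
    return {"sex": sex, "disorder": disorder, "admission": admission}


def sample_cohort(params, n, seed):
    """Generate n synthetic patients with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [sample_patient(params, rng) for _ in range(n)]


def admission_rate(cohort):
    """Fraction of patients in the cohort with admission == True."""
    return sum(p["admission"] for p in cohort) / len(cohort)


main_cohort = sample_cohort(BASE_PARAMS, 10_000, seed=0)
shifted_cohort = sample_cohort(SHIFTED_PARAMS, 500, seed=1)

print("admission rate, main cohort:   ", admission_rate(main_cohort))
print("admission rate, shifted cohort:", admission_rate(shifted_cohort))
```

Because only the conditional probabilities change between the two parameter sets, the cohorts share the same variables but differ in their distributions, which is the kind of dataset shift a model trained on the first cohort would encounter when tested on the second.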

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: GN is a full employee of enGenome srl.

Figures

Fig. 1. Number of synthetic patients in the two datasets, stratified by sex, age and hospitalization.

