Synthetic ALSPAC longitudinal datasets for the Big Data VR project
- PMID: 28989981
- PMCID: PMC5605951
- DOI: 10.12688/wellcomeopenres.12441.1
Abstract
Three synthetic datasets, containing 15,000, 155,000 and 1,555,000 participants respectively, were created by simulating eleven cardiac and anthropometric variables from nine collection ages of the ALSPAC birth cohort study. The synthetic datasets retain data properties similar to those of the ALSPAC study data from which they are simulated (covariance matrices, as well as the means and variances of the variables) without including the original data or disclosing participant information. The three synthetic datasets have been used in an academia-industry collaboration to build prototype virtual reality data analysis software, but they could have broader use in method and software development projects where sensitive data cannot be freely shared.
Keywords: ALSPAC; Simulated data; data visualisation; synthetic data; virtual reality; visual analytics.
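The idea of simulating data that keep the means, variances and covariance matrix of a source dataset can be sketched as follows. This is a minimal illustration only: it assumes that the variables are drawn from a multivariate normal distribution, which the abstract does not state, and the function name `simulate_synthetic`, the three-variable stand-in data and all numeric values are hypothetical rather than taken from ALSPAC.

```python
import numpy as np


def simulate_synthetic(real, n_synthetic, seed=0):
    """Draw synthetic rows whose mean vector and covariance matrix
    approximate those of `real` (rows = participants, columns = variables).

    Assumption: a multivariate normal model is adequate for the variables.
    """
    rng = np.random.default_rng(seed)
    mean = real.mean(axis=0)          # per-variable means
    cov = np.cov(real, rowvar=False)  # variable-by-variable covariance matrix
    return rng.multivariate_normal(mean, cov, size=n_synthetic)


# Hypothetical stand-in for a sensitive dataset (randomly generated here,
# not real cohort data): 2,000 participants, 3 variables.
rng = np.random.default_rng(42)
true_cov = np.array([
    [1.0, 0.6, 0.3],
    [0.6, 2.0, 0.5],
    [0.3, 0.5, 1.5],
])
real = rng.multivariate_normal([120.0, 70.0, 25.0], true_cov, size=2000)

# Simulate 15,000 synthetic participants from the summary statistics alone.
synthetic = simulate_synthetic(real, n_synthetic=15000)

print("real means:     ", np.round(real.mean(axis=0), 2))
print("synthetic means:", np.round(synthetic.mean(axis=0), 2))
```

Only the mean vector and covariance matrix of the source data enter the simulation, so no original row is copied into the output. A sketch like this does not by itself guarantee non-disclosure, and it ignores features such as skewed distributions or missing values that a real simulation would need to handle.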
Conflict of interest statement
Competing interests: No competing interests were disclosed.