Eight practices for data management to enable team data science
- PMID: 33948240
- PMCID: PMC8057476
- DOI: 10.1017/cts.2020.501
Abstract
Introduction: In clinical and translational research, data science is often and fortuitously integrated with data collection. This contrasts with the typical position of data scientists in other settings, where they are isolated from data collectors. Because of this integration, effective use of data science techniques to resolve translational questions requires innovation in the organization and management of these data.
Methods: We propose an operational framework that respects this important difference in how research teams are organized. To maximize the accuracy and speed of the clinical and translational data science enterprise under this framework, we define a set of eight best practices for data management.
Results: In our own work at the University of Rochester, we have strived to utilize these practices in a customized version of the open source LabKey platform for integrated data management and collaboration. We have applied this platform to cohorts that longitudinally track multidomain data from over 3000 subjects.
Conclusions: We argue that this has made analytical datasets more readily available and lowered the bar to interdisciplinary collaboration, enabling a team-based data science that is unique to the clinical and translational setting.
Keywords: Data analysis; bioinformatics; data management; data science; databases; pediatric; research informatics; systems biology.
© The Association for Clinical and Translational Science 2020.
Conflict of interest statement
The authors declare no conflicts of interest are present.