J Clin Transl Sci. 2020 Jun 23;5(1):e14.
doi: 10.1017/cts.2020.501.

Eight practices for data management to enable team data science

Andrew McDavid et al. J Clin Transl Sci.

Abstract

Introduction: In clinical and translational research, data science is often and fortuitously integrated with data collection. This contrasts with the typical position of data scientists in other settings, where they are isolated from data collectors. Because of this integration, effective use of data science techniques to resolve translational questions requires innovation in the organization and management of these data.

Methods: We propose an operational framework that respects this important difference in how research teams are organized. To maximize the accuracy and speed of the clinical and translational data science enterprise under this framework, we define a set of eight best practices for data management.

Results: In our own work at the University of Rochester, we have striven to apply these practices in a customized version of the open-source LabKey platform for integrated data management and collaboration. We have applied this platform to cohorts that longitudinally track multidomain data from over 3000 subjects.

Conclusions: We argue that this approach has made analytical datasets more readily available and lowered the barrier to interdisciplinary collaboration, enabling a team-based data science that is unique to the clinical and translational setting.

Keywords: Data analysis; bioinformatics; data management; data science; databases; pediatric; research informatics; systems biology.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Fig. 1.
A data science workflow in clinical and translational teams. The lifecycle of a Team Data Science project begins with data collection and proceeds in a nonlinear and iterative fashion until conclusions are communicated and data and models are available for reuse (1a). Study personnel will interact in varying degrees with different aspects of the data science lifecycle (1b), while a data scientist visits all phases. Bolded interactions highlight a primary use of a role, while dashed lines indicate ancillary uses.
Fig. 2.
A high-level overview of how study personnel interact with the BLIS data management platform. Clinicians, technicians, and experimentalists generate data for different aspects of the study. Data engineers implement the centralized study portal using the BLIS data management platform, with responsibility to connect all elements of the workflow and interact continuously with all study team members.
