On the Convergence of Epidemiology, Biostatistics, and Data Science
- PMID: 35005710
- PMCID: PMC8734556
- DOI: 10.1162/99608f92.9f0215e6
Abstract
Epidemiology, biostatistics, and data science are broad disciplines that incorporate a variety of substantive areas. Common among them is a focus on quantitative approaches for solving intricate problems. When the substantive area is health and health care, the overlap is further cemented. Researchers in these disciplines are fluent in statistics, data management and analysis, and health and medicine, to name but a few competencies. Yet these fields also have important, and perhaps mutually exclusive, attributes that warrant tighter integration. For example, epidemiologists receive substantial training in the science of study design and measurement and in the art of causal inference. Biostatisticians are well versed in the theory and application of methodological techniques, as well as the design and conduct of public health research. Data scientists receive equally rigorous training in computational and visualization approaches for high-dimensional data. Compared with data scientists, epidemiologists and biostatisticians may have less expertise in computer science and informatics, while data scientists may benefit from a working knowledge of study design and causal inference. Collaboration and cross-training offer the opportunity to share and learn the constructs, frameworks, theories, and methods of these fields, with the goal of offering fresh and innovative perspectives for tackling challenging problems in health and health care. In this article, we first describe the evolution of these fields, focusing on their convergence in the era of electronic health data, notably electronic medical records (EMRs). Next, we present how a collaborative team may design, analyze, and implement an EMR-based study. Finally, we review the curricula at leading epidemiology, biostatistics, and data science training programs, identifying gaps and offering suggestions for the fields moving forward.
Keywords: biostatistics; causal inference; data science; electronic medical records; epidemiology; study design; training and education.