Harv Data Sci Rev. 2020 Spring;2(2):10.1162/99608f92.9f0215e6.
doi: 10.1162/99608f92.9f0215e6. Epub 2020 Apr 30.

On the Convergence of Epidemiology, Biostatistics, and Data Science

Neal D Goldstein et al. Harv Data Sci Rev. 2020 Spring.

Abstract

Epidemiology, biostatistics, and data science are broad disciplines that incorporate a variety of substantive areas. Common among them is a focus on quantitative approaches for solving intricate problems. When the substantive area is health and health care, the overlap is further cemented. Researchers in these disciplines are fluent in statistics, data management and analysis, and health and medicine, to name but a few competencies. Yet there are important and perhaps mutually exclusive attributes of these fields that warrant a tighter integration. For example, epidemiologists receive substantial training in the science of study design, measurement, and the art of causal inference. Biostatisticians are well versed in the theory and application of methodological techniques, as well as the design and conduct of public health research. Data scientists receive equivalently rigorous training in computational and visualization approaches for high-dimensional data. Compared to data scientists, epidemiologists and biostatisticians may have less expertise in computer science and informatics, while data scientists may benefit from a working knowledge of study design and causal inference. Collaboration and cross-training offer the opportunity to share and learn the constructs, frameworks, theories, and methods of these fields, with the goal of offering fresh and innovative perspectives for tackling challenging problems in health and health care. In this article, we first describe the evolution of these fields, focusing on their convergence in the era of electronic health data, notably electronic medical records (EMRs). Next, we present how a collaborative team may design, analyze, and implement an EMR-based study. Finally, we review the curricula at leading epidemiology, biostatistics, and data science training programs, identifying gaps and offering suggestions for the fields moving forward.

Keywords: biostatistics; causal inference; data science; electronic medical records; epidemiology; study design; training and education.

Figures

Figure 1. The data science Venn diagram. Reprinted under the Creative Commons license (Conway, 2013).

Figure 2. Simplified architecture of an electronic medical record system as it relates to our research question: Does the number of occupied beds in an intensive care unit increase risk for infection?

Figure 3. The traditional systems development lifecycle. Adapted from Information Management and Security Staff, 2003, Chapter 1.

Figure 4. Results of the curriculum review for inclusion of data science in epidemiology (A) and biostatistics (B) training programs; and inclusion of epidemiology (C) and statistics (D) in data science training programs.

References

    1. Angrist J, & Pischke JS (2008). Mostly harmless econometrics. Princeton University Press.
    2. Blakely T, Lynch J, Simons K, Bentley R, & Rose S (2019). Reflection on modern methods: When worlds collide-prediction, machine learning and causal inference. International Journal of Epidemiology, dyz132. 10.1093/ije/dyz132
    3. Borgman CL (2019). The lives and after lives of data. Harvard Data Science Review, 1(1). 10.1162/99608f92.9a36bdb6
    4. Celentano DD, & Szklo M (2018). Gordis epidemiology. Elsevier.
    5. Cleveland WS (2014). Data science: An action plan for expanding the technical areas of the field of statistics. Statistical Analysis and Data Mining, 7(6), 414–417. 10.1111/j.1751-5823.2001.tb00477.x