On the Convergence of Epidemiology, Biostatistics, and Data Science
- PMID: 35005710
- PMCID: PMC8734556
- DOI: 10.1162/99608f92.9f0215e6
Abstract
Epidemiology, biostatistics, and data science are broad disciplines that span a variety of substantive areas. Common to all three is a focus on quantitative approaches to solving intricate problems. When the substantive area is health and health care, the overlap is further cemented. Researchers in these disciplines are fluent in statistics, data management and analysis, and health and medicine, to name but a few competencies. Yet each field also has important, perhaps unique, attributes that warrant tighter integration. For example, epidemiologists receive substantial training in the science of study design and measurement and in the art of causal inference. Biostatisticians are well versed in the theory and application of methodological techniques, as well as in the design and conduct of public health research. Data scientists receive equally rigorous training in computational and visualization approaches for high-dimensional data. Compared with data scientists, epidemiologists and biostatisticians may have less expertise in computer science and informatics, while data scientists may benefit from a working knowledge of study design and causal inference. Collaboration and cross-training offer the opportunity to share and learn the constructs, frameworks, theories, and methods of these fields, with the goal of offering fresh and innovative perspectives for tackling challenging problems in health and health care. In this article, we first describe the evolution of these fields, focusing on their convergence in the era of electronic health data, notably electronic medical records (EMRs). Next, we present how a collaborative team may design, analyze, and implement an EMR-based study. Finally, we review the curricula at leading epidemiology, biostatistics, and data science training programs, identify gaps, and offer suggestions for the fields going forward.
Keywords: biostatistics; causal inference; data science; electronic medical records; epidemiology; study design; training and education.
Similar articles
- Essential team science skills for biostatisticians on collaborative research teams. J Clin Transl Sci. 2023 Nov 6;7(1):e243. doi: 10.1017/cts.2023.676. eCollection 2023. PMID: 38033706. Free PMC article.
- Training the next generation of Biostatisticians in West Africa: The Vanderbilt Nigeria Biostatistics Training Program (VN-BioStat). J Glob Health Rep. 2023;7:e2023067. doi: 10.29392/001c.88939. Epub 2023 Oct 24. PMID: 38098733. Free PMC article.
- Methods for training collaborative biostatisticians. J Clin Transl Sci. 2020 Aug 4;5(1):e26. doi: 10.1017/cts.2020.518. PMID: 33948249. Free PMC article.
- A retrospective and prospective study of biostatistics in Canada. Can J Public Health. 2024 Dec;115(6):839-843. doi: 10.17269/s41997-024-00866-w. Epub 2024 Mar 13. PMID: 38478215. Free PMC article. Review.
- Using implementation science theories and frameworks in global health. BMJ Glob Health. 2020 Apr 16;5(4):e002269. doi: 10.1136/bmjgh-2019-002269. eCollection 2020. PMID: 32377405. Free PMC article. Review.
Cited by
- How to write statistical analysis section in medical research. J Investig Med. 2022 Dec;70(8):1759-1770. doi: 10.1136/jim-2022-002479. Epub 2022 Jun 16. PMID: 35710142. Free PMC article.
- Beyond the Digital Competencies of Medical Students: Concerns over Integrating Data Science Basics into the Medical Curriculum. Int J Environ Res Public Health. 2022 Nov 30;19(23):15958. doi: 10.3390/ijerph192315958. PMID: 36498065. Free PMC article.
- The intersection of big data and epidemiology for epidemiologic research: The impact of the COVID-19 pandemic. Int J Qual Health Care. 2021 Sep 25;33(3):mzab134. doi: 10.1093/intqhc/mzab134. PMID: 34508642. Free PMC article.
- A collaborative approach to advancing research and training in Public Health Data Science-challenges, opportunities, and lessons learnt. Front Public Health. 2024 Dec 11;12:1474947. doi: 10.3389/fpubh.2024.1474947. eCollection 2024. PMID: 39722718. Free PMC article.
- Dual-stream algorithms for dementia detection: Harnessing structured and unstructured electronic health record data, a novel approach to prevalence estimation. Alzheimers Dement. 2025 May;21(5):e70132. doi: 10.1002/alz.70132. PMID: 40325920. Free PMC article.