Big Data in Public Health: Terminology, Machine Learning, and Privacy
- PMID: 29261408
- PMCID: PMC6394411
- DOI: 10.1146/annurev-publhealth-040617-014208
Abstract
The digital world is generating data at a staggering and still increasing rate. While these "big data" have unlocked novel opportunities to understand public health, they hold still greater potential for research and practice. This review explores several key issues that have arisen around big data. First, we propose a taxonomy of sources of big data to clarify terminology and identify threads common across some subtypes of big data. Next, we consider common public health research and practice uses for big data, including surveillance, hypothesis-generating research, and causal inference, while exploring the role that machine learning may play in each use. We then consider the ethical implications of the big data revolution with particular emphasis on maintaining appropriate care for privacy in a world in which technology is rapidly changing social norms regarding the need for (and even the meaning of) privacy. Finally, we make suggestions regarding structuring teams and training to succeed in working with big data in research and practice.
Keywords: big data; machine learning; privacy; public health; training.
