Clin Transl Sci. 2014 Aug;7(4):342-6.
doi: 10.1111/cts.12178. Epub 2014 Jul 15.

Big data and large sample size: a cautionary note on the potential for bias

Robert M Kaplan et al. Clin Transl Sci. 2014 Aug.

Abstract

A number of commentaries have suggested that large studies are more reliable than smaller studies and there is a growing interest in the analysis of "big data" that integrates information from many thousands of persons and/or different data sources. We consider a variety of biases that are likely in the era of big data, including sampling error, measurement error, multiple comparisons errors, aggregation error, and errors associated with the systematic exclusion of information. Using examples from epidemiology, health services research, studies on determinants of health, and clinical trials, we conclude that it is necessary to exercise greater caution to be sure that big sample size does not lead to big inferential errors. Despite the advantages of big studies, large sample size can magnify the bias associated with error resulting from sampling or study design.
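
The closing claim of the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the function name `biased_sample_mean_ci`, the normal data model, and the bias of 0.1 are all assumptions chosen for illustration. It draws samples from a sampling frame whose mean is shifted away from the true population mean, then reports a 95% confidence interval. As the sample grows, the interval narrows around the biased value, so the estimate looks more precise while remaining just as wrong.

```python
import math
import random
import statistics


def biased_sample_mean_ci(n, true_mean=0.0, bias=0.1, sd=1.0, seed=0):
    """Simulate a biased sampling frame (hypothetical setup, not from the paper).

    Values are drawn from a normal distribution centred on true_mean + bias,
    mimicking a study design that systematically over-represents some group.
    Returns (estimate, ci_low, ci_high) using a 95% normal-approximation CI.
    """
    rng = random.Random(seed)
    values = [rng.gauss(true_mean + bias, sd) for _ in range(n)]
    estimate = statistics.fmean(values)
    # The standard error shrinks with sqrt(n); the bias does not shrink at all.
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(n)
    return estimate, estimate - half_width, estimate + half_width


if __name__ == "__main__":
    true_mean = 0.0
    for n in (100, 10_000, 100_000):
        est, low, high = biased_sample_mean_ci(n, true_mean=true_mean)
        covers = low <= true_mean <= high
        print(f"n={n:>7}: estimate={est:.3f}, "
              f"95% CI=({low:.3f}, {high:.3f}), covers true mean: {covers}")
```

Under these assumptions, increasing the sample size only reduces random error. The systematic error built into the sampling frame is untouched, so a very large study can exclude the true value with high apparent confidence.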

Keywords: bias; big data; research methods; sampling.

Figures

Figure 1. Effect sizes for two trials, both p = 0.05.
