Review
Biomed Res Int. 2015:2015:639021.
doi: 10.1155/2015/639021. Epub 2015 Jun 2.

Toward a Literature-Driven Definition of Big Data in Healthcare

Emilie Baro et al. Biomed Res Int. 2015.

Abstract

Objective: The aim of this study was to provide a definition of big data in healthcare.

Methods: A systematic search of PubMed literature published until May 9, 2014, was conducted. We noted the number of statistical individuals (n) and the number of variables (p) for all papers describing a dataset. These papers were classified into fields of study. Characteristics attributed to big data by authors were also considered. Based on this analysis, a definition of big data was proposed.

Results: A total of 196 papers were included. Big data can be defined as datasets with Log(n∗p) ≥ 7. Properties of big data are its great variety and high velocity. Big data raises challenges on veracity, on all aspects of the workflow, on extracting meaningful information, and on sharing information. Big data requires new computational methods that optimize data management. Related concepts are data reuse, false knowledge discovery, and privacy issues.
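The volume criterion above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes "Log" means the base-10 logarithm (the abstract does not state the base), and the function names and example dataset sizes are hypothetical, chosen only to show the arithmetic.

```python
import math

# Threshold taken from the abstract: Log(n*p) >= 7.
# Assumption: "Log" is the base-10 logarithm.
BIG_DATA_LOG_THRESHOLD = 7


def log_np(n: int, p: int) -> float:
    """Return log10(n * p) for n statistical individuals and p variables."""
    if n <= 0 or p <= 0:
        raise ValueError("n and p must be positive integers")
    # log10(n * p) = log10(n) + log10(p)
    return math.log10(n) + math.log10(p)


def is_big_data(n: int, p: int) -> bool:
    """Apply the volume criterion Log(n*p) >= 7.

    The comparison is done on integers (n * p >= 10**7), which is
    equivalent under the base-10 assumption and avoids floating-point
    rounding exactly at the threshold.
    """
    if n <= 0 or p <= 0:
        raise ValueError("n and p must be positive integers")
    return n * p >= 10 ** BIG_DATA_LOG_THRESHOLD


# Hypothetical dataset sizes, for illustration only (not from the paper).
for n, p in [(500, 200), (1_000, 10_000), (100_000, 100)]:
    print(f"n={n}, p={p}: log10(n*p)={log_np(n, p):.2f}, big={is_big_data(n, p)}")
```

Under this reading, the criterion depends only on the product n × p, so a dataset with few individuals and very many variables can qualify just as one with many individuals and few variables can.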

Conclusion: Big data is defined by volume. Big data should not be confused with data reuse: data can be big without being reused for another purpose, as in omics. Conversely, data can be reused without necessarily being big, as in the secondary use of Electronic Medical Records (EMR) data.

Figures

Figure 1. Flowchart of the literature review.
Figure 2. Number of papers about big data in healthcare published per year (full years only).
Figure 3. Number of papers about big data in healthcare describing a dataset per year (full years only).
Figure 4. Log(np) per year of publication. The continuous line represents the linear regression (P = 0.34).
Figure 5. Representation of the probability density function of Log(n) for omics, medical specialties, public health, and all fields together.
Figure 6. Representation of the probability density function of Log(p) for omics, medical specialties, public health, and all fields together.
Figure 7. Representation of the probability density function of Log(np) for omics, medical specialties, public health, and all fields together.
Figure 8. Log(p) as a function of Log(n) for omics, medical specialties, and public health. Each pictogram stands for one paper.
