Review
Nat Neurosci. 2014 Nov;17(11):1442-7.
doi: 10.1038/nn.3838.

Big data from small data: data-sharing in the 'long tail' of neuroscience


Adam R Ferguson et al. Nat Neurosci. 2014 Nov.

Abstract

The launch of the US BRAIN Initiative and the European Human Brain Project coincides with growing international efforts toward transparency and increased access to publicly funded research in the neurosciences. The need for data-sharing standards and neuroinformatics infrastructure is more pressing than ever. However, 'big science' efforts are not the only drivers of data-sharing needs: neuroscientists across the full spectrum of research grapple with the overwhelming volume of data generated daily and with a scientific environment that is increasingly focused on collaboration. In this commentary, we consider the sharing of the richly diverse and heterogeneous small data sets produced by individual neuroscientists, so-called long-tail data. We consider the utility of these data, the diversity of repositories and options available for sharing them, and emerging best practices. We provide use cases in which aggregating and mining diverse long-tail data convert numerous small data sources into big data for improved knowledge about neuroscience-related disorders.


Figures

Figure 1
Schematic illustration of long-tail data. Studies that have plotted data set size against the number of data sources reliably uncover a skewed distribution. Big science efforts featuring homogeneous, well-organized data represent only a small proportion of the total data collected by scientists. A very large proportion of scientific data falls in the long tail of the distribution, with numerous small independent research efforts yielding a rich variety of specialty research data sets. The extreme right portion of the long tail includes unpublished data, such as siloed databases, null findings, laboratory notes and animal care records. These dark data hold a potential wealth of knowledge but are often inaccessible to the outside world.
