Review
Annu Rev Neurosci. 2020 Jul 8;43:441-464.
doi: 10.1146/annurev-neuro-100119-110036. Epub 2020 Apr 13.

Toward Community-Driven Big Open Brain Science: Open Big Data and Tools for Structure, Function, and Genetics


Adam S Charles et al. Annu Rev Neurosci. 2020.

Abstract

As acquiring bigger data becomes easier in experimental brain science, computational and statistical brain science must achieve similar advances to fully capitalize on these data. Tackling these problems will benefit from a more explicit and concerted effort to work together. Specifically, brain science can be further democratized by harnessing the power of community-driven tools, which both are built by and benefit from many different people with different backgrounds and expertise. This perspective can be applied across modalities and scales and enables collaborations across previously siloed communities.

Keywords: computational; infrastructure; reference data; statistics.


Figures

Figure 1
The big data deluge puts different pressure on different applications. At greater data sizes, more powerful systems are needed to operate in these ever more challenging regimes. Most neuroscience data sets currently still reside at sizes computationally tractable on a single PC or, at worst, a single HPC node. All these modalities, however, are seeing a steady rise in data sizes. The methods that will enable neuroscientists to make use of these ever-richer data sets must be developed now.
Figure 2
Big data brain science is the result of ingenious advances in recording technology and large-scale collaborations (left box). To maximally utilize the resulting data, we must determine how to convert the data coming from these new experimental paradigms into statistical conclusions on scientific questions (right box).
Figure 3
Physiology pipelines across scales. Pipelines have been independently developed for different brain data to transform the raw data through semantic information extraction and into a plethora of statistical analysis results. The raw data (left) are typically preprocessed via registration to a common space (e.g., motion correction). Next, semantic information, e.g., the regions of interest, individual neural traces, or animal poses, is extracted from the data. These are the variables used in final hypothesis generation or estimation.
Figure 4
An example data pipeline for nanoscale anatomy. (a) A parallel chunk-processing motif used during processing. A large volume is broken into chunks, each of which is processed and merged. This involves shuttling data from cloud storage or other backends to a computational cluster and tracking process completion and handling failures. The chunk regions depicted here can be anisotropic (e.g., a few wide slices). Each task outside of ovals is handled by data system pipeline software. (b) Representation overview for an example serial section transmission electron microscopy pipeline, showing how the data system implements computational tasks. (c) Computational tasks exemplified on a small cutout of the open data set of Kasthuri et al. (2015).
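The chunk-processing motif in panel (a) can be illustrated with a minimal sketch. This is not the authors' pipeline software: the function names, the thresholding task, the retry policy, and the in-memory nested-list volume are all assumptions made for illustration. A real system would move chunks between cloud storage and a cluster, whereas this sketch uses a local thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(volume, slices_per_chunk):
    """Return (start_index, chunk) pairs. Each chunk is a few full-width
    slices, i.e. an anisotropic region of the volume."""
    return [
        (start, volume[start:start + slices_per_chunk])
        for start in range(0, len(volume), slices_per_chunk)
    ]


def threshold_chunk(chunk, cutoff=128):
    """Hypothetical per-chunk task: binarize every voxel against a cutoff."""
    return [[[1 if v >= cutoff else 0 for v in row] for row in sl] for sl in chunk]


def run_with_retries(task, chunk, max_attempts=3):
    """Run one chunk task, retrying on failure (simple failure handling)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt


def process_volume(volume, task, slices_per_chunk=2, workers=4):
    """Break the volume into chunks, process them in parallel, and merge
    the results back in their original slice order."""
    chunks = split_into_chunks(volume, slices_per_chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            start: pool.submit(run_with_retries, task, chunk)
            for start, chunk in chunks
        }
        merged = []
        for start in sorted(futures):  # merge in slice order
            merged.extend(futures[start].result())
    return merged


if __name__ == "__main__":
    # Toy volume: 5 slices, each 2 rows by 3 columns (assumed test data).
    volume = [[[z * 50 + x for x in range(3)] for _ in range(2)] for z in range(5)]
    segmented = process_volume(volume, threshold_chunk)
    print(len(segmented), "slices processed")
```

Keying the futures by each chunk's starting slice lets the merge step restore the original order regardless of which chunk finishes first.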

