Toward Community-Driven Big Open Brain Science: Open Big Data and Tools for Structure, Function, and Genetics

Adam S Charles¹,², Benjamin Falk¹, Nicholas Turner³, Talmo D Pereira⁴, Daniel Tward¹, Benjamin D Pedigo¹, Jaewon Chung¹, Randal Burns¹, Satrajit S Ghosh⁵,⁶, Justus M Kebschull¹,⁷, William Silversmith⁴, Joshua T Vogelstein¹,²

Affiliations

¹ Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA; email: adamsc@jhu.edu.
² Institute for Computational Medicine, Kavli Neuroscience Discovery Institute, and Center for Imaging Science, Johns Hopkins University, Baltimore, Maryland 21218, USA.
³ Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA.
⁴ Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey 08540, USA.
⁵ McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
⁶ Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, Massachusetts 02115, USA.
⁷ Stanford University, Palo Alto, California 94305, USA.

PMID: 32283996
PMCID: PMC9119703
DOI: 10.1146/annurev-neuro-100119-110036

Review

Adam S Charles et al. Annu Rev Neurosci. 2020 Jul 8;43:441-464. doi: 10.1146/annurev-neuro-100119-110036. Epub 2020 Apr 13.

Abstract

As acquiring bigger data becomes easier in experimental brain science, computational and statistical brain science must achieve similar advances to fully capitalize on these data. Tackling these problems will benefit from a more explicit and concerted effort to work together. Specifically, brain science can be further democratized by harnessing the power of community-driven tools, which are both built by and of benefit to many different people with different backgrounds and expertise. This perspective can be applied across modalities and scales, and it enables collaborations across previously siloed communities.

Keywords: computational; infrastructure; reference data; statistics.

Figures

**Figure 1**
The big data deluge puts different pressure on different applications. At greater data sizes, more powerful systems are needed to operate in these ever more challenging regimes. Most neuroscience data sets currently still reside at sizes computationally tractable on a single PC or, at worst, a single HPC node. All these modalities, however, are seeing a steady rise in data sizes. The methods that will enable neuroscientists to make use of these ever-richer data sets must be developed now.

**Figure 2**
Big data brain science is the result of ingenious advances in recording technology and large-scale collaborations (*left box*). To maximally utilize the resulting data, we must determine how to convert the data coming from these new experimental paradigms into statistical conclusions on scientific questions (*right box*).

**Figure 3**
Physiology pipelines across scales. Pipelines have been independently developed for different brain data to transform the raw data through semantic information extraction and into a plethora of statistical analysis results. The raw data (*left*) are typically preprocessed via registration to a common space (e.g., motion correction). Next, semantic information, e.g., the regions of interest, individual neural traces, or animal poses, is extracted from the data. These are the variables used in final hypothesis generation or estimation.

**Figure 4**
An example data pipeline for nanoscale anatomy. (a) A parallel chunk-processing motif used during processing. A large volume is broken into chunks, each of which is processed and merged. This involves shuttling data from cloud storage or other backends to a computational cluster and tracking process completion and handling failures. The chunk regions depicted here can be anisotropic (e.g., a few wide slices). Each task outside of ovals is handled by data system pipeline software. (b) Representation overview for an example serial section transmission electron microscopy pipeline, showing how the data system implements computational tasks. (c) Computational tasks exemplified on a small cutout of the open data set of Kasthuri et al. (2015).
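The chunk-processing motif described in panel (a) can be illustrated with a minimal sketch. This is not the pipeline software used by the authors; it only assumes that a volume is a list of 2D slices, that chunks are groups of a few whole slices (the anisotropic case mentioned in the caption), and that failed chunks are simply retried. The names `split_into_chunks`, `threshold_chunk`, and `process_volume`, as well as the thresholding task itself, are hypothetical and chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(volume, slices_per_chunk):
    """Break a volume (a list of 2D slices) into chunks of a few whole slices.

    Returns (start_index, chunk) pairs so results can be merged in order.
    """
    return [
        (start, volume[start:start + slices_per_chunk])
        for start in range(0, len(volume), slices_per_chunk)
    ]


def threshold_chunk(chunk, cutoff=128):
    """Hypothetical per-chunk task: binarize every voxel against a cutoff."""
    return [[[1 if v >= cutoff else 0 for v in row] for row in sl] for sl in chunk]


def process_volume(volume, slices_per_chunk=2, task=threshold_chunk,
                   max_retries=2, workers=4):
    """Split, process chunks in parallel, retry failures, and merge in order."""
    pending = split_into_chunks(volume, slices_per_chunk)
    results = {}   # start index -> processed chunk (tracks completion)
    failed = []
    for _ in range(max_retries + 1):
        failed = []
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [(start, chunk, pool.submit(task, chunk))
                       for start, chunk in pending]
            for start, chunk, future in futures:
                try:
                    results[start] = future.result()
                except Exception:
                    failed.append((start, chunk))  # schedule for retry
        if not failed:
            break
        pending = failed
    if failed:
        raise RuntimeError(f"{len(failed)} chunk(s) failed after retries")
    # Merge: concatenate processed chunks back along the slice axis.
    merged = []
    for start in sorted(results):
        merged.extend(results[start])
    return merged


if __name__ == "__main__":
    volume = [[[0, 200], [130, 5]], [[255, 0], [0, 0]], [[128, 127], [1, 250]]]
    print(process_volume(volume, slices_per_chunk=2))
```

In a real deployment the chunks would be read from cloud storage and dispatched to a cluster rather than to local threads; the thread pool here only stands in for that scheduling layer.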


References

- Andrews TS, Hemberg M. 2019. False signals induced by single-cell imputation. F1000Res. 7:1740
- Arroyo J, Athreya A, Cape J, Chen G, Priebe CE, Vogelstein JT. 2019. Inference for multiple heterogeneous networks with a common invariant subspace. arXiv:1906.10026 [stat.ME]
- Athreya A, Fishkind DE, Tang M, Priebe CE, Park Y, et al. 2017. Statistical inference on random dot product graphs: a survey. J. Mach. Learn. Res. 18(226):1–92
- Au OK-C, Tai C-L, Chu H-K, Cohen-Or D, Lee T-Y. 2008. Skeleton extraction by mesh contraction. ACM Trans. Graph. 27(3):1–10
- Ba J, Caruana R. 2014. Do deep nets really need to be deep? In Advances in Neural Information Processing Systems 27, ed. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, pp. 2654–62. San Diego, CA: NeurIPS
