Large-scale extraction of brain connectivity from the neuroscientific literature

Renaud Richardet¹, Jean-Cédric Chappelier¹, Martin Telefont¹, Sean Hill¹

Affiliation

¹ Blue Brain Project, Brain Mind Institute and School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.

PMID: 25609795
PMCID: PMC4426844
DOI: 10.1093/bioinformatics/btv025

Renaud Richardet et al. Bioinformatics. 2015 May 15;31(10):1640-7. doi: 10.1093/bioinformatics/btv025. Epub 2015 Jan 20.

Abstract

Motivation: In neuroscience, as in many other scientific domains, knowledge is primarily disseminated through published articles. One challenge for modern neuroinformatics is to find methods that make the knowledge in this tremendous backlog of publications accessible for search and analysis, and for integration into computational models. A key example is metascale brain connectivity, where results are not reported in a normalized repository. Instead, these experimental results are published in natural language, scattered across individual scientific publications. This lack of normalization and centralization hinders the large-scale integration of brain connectivity results. In this article, we present text-mining models to extract and aggregate brain connectivity results from 13.2 million PubMed abstracts and 630 216 full-text publications related to neuroscience. Brain regions are identified with three different named entity recognizers (NERs) and then normalized against two atlases: the Allen Brain Atlas (ABA) and the atlas from the Brain Architecture Management System (BAMS). We then use three different extractors to assess inter-region connectivity.
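
A lexical NER of the kind described here can be pictured as dictionary matching of region names and synonyms, with each match normalized to an atlas identifier. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. `LEXICON`, `find_regions` and the `ATLAS:` identifiers are hypothetical placeholders, not real ABA or BAMS entries. The machine-learning NER (BraiNER) is not modeled at all.

```python
import re

# Hypothetical lexicon: lowercased surface form -> atlas identifier.
# The "ATLAS:" IDs are illustrative placeholders, not real ABA or BAMS IDs.
LEXICON = {
    "caudoputamen": "ATLAS:CP",
    "striatum": "ATLAS:STR",
    "substantia nigra": "ATLAS:SN",
    "substantia nigra pars compacta": "ATLAS:SNc",
    "snc": "ATLAS:SNc",
}


def find_regions(text, lexicon=LEXICON):
    """Return (start, end, surface, atlas_id) tuples for lexicon matches in text.

    Matches are case-insensitive, whole-word and non-overlapping. At any given
    position the longest synonym wins, because longer terms are tried first.
    """
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    return [
        (m.start(), m.end(), m.group(0), lexicon[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


sentence = ("Neurons of the substantia nigra pars compacta (SNc) "
            "project to the caudoputamen.")
for start, end, surface, atlas_id in find_regions(sentence):
    print(surface, "->", atlas_id)
```

In this example the full name and the abbreviation both normalize to the same identifier, `ATLAS:SNc`. That shared target is what allows mentions from different papers to be aggregated under one atlas region.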

Results: The NERs and connectivity extractors are evaluated against a manually annotated corpus. The complete in litero extraction models are also evaluated against in vivo connectivity data from ABA, with an estimated precision of 78%. The resulting database contains over 4 million brain region mentions and over 100 000 (ABA) and 122 000 (BAMS) potential brain region connections. By providing neuroscientists with a centralized repository of connectivity data, this database drastically accelerates connectivity literature review.

Figures

**Fig. 1.**
Overview of datasets, methods and models. Three named entity recognizers (NERs) identify and normalize brain region mentions: *BAMS* and *ABA* (lexical-based) and *BraiNER* (machine learning-based). Three different extractors predict the connectivity probability of brain region co-occurrences: *Filters* takes a top–down filtering approach, *Kernel* is a machine learning-based classifier and *Rules* consists of hand-written extraction rules. Connectivity results are presented in a searchable web interface. In the future, feedback from the interface can be used to retrain the NERs and extractors for continuous model improvement.
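
The co-occurrence step in Fig. 1 can be sketched as follows: pair every two distinct regions mentioned in the same sentence, then let an extractor decide which pairs describe a connection. The filter below is a deliberately naive, hypothetical stand-in (a cue-word check between the two mentions). It is not the authors' *Filters*, *Kernel* or *Rules* extractor. `CUE_WORDS`, `cooccurrences`, `toy_filter` and `mention` are all illustrative names.

```python
from itertools import combinations

# Hypothetical cue words; this is NOT the vocabulary of the authors' extractors.
CUE_WORDS = {"project", "projects", "projection", "projections", "afferent",
             "efferent", "innervate", "innervates", "input", "inputs"}


def cooccurrences(mentions):
    """Unordered pairs of mentions whose atlas IDs differ.

    `mentions` holds (start, end, surface, atlas_id) tuples sorted by start,
    so in every returned pair the first mention precedes the second.
    """
    return [(a, b) for a, b in combinations(mentions, 2) if a[3] != b[3]]


def toy_filter(sentence, pair):
    """Keep a pair only if a cue word occurs between the two mentions."""
    first, second = pair
    between = sentence[first[1]:second[0]].lower()
    tokens = {tok.strip(".,;:()") for tok in between.split()}
    return bool(tokens & CUE_WORDS)


def mention(sentence, surface, atlas_id):
    """Build a mention tuple for the first occurrence of `surface`."""
    start = sentence.index(surface)
    return (start, start + len(surface), surface, atlas_id)


s = "The substantia nigra projects to the striatum."
pairs = cooccurrences([mention(s, "substantia nigra", "ATLAS:SN"),
                       mention(s, "striatum", "ATLAS:STR")])
print([toy_filter(s, p) for p in pairs])  # [True]
```

A filter this simple accepts any sentence that contains a cue word, whatever the sentence actually asserts. That weakness is why the extractors in Fig. 1 output a connectivity probability for each co-occurrence rather than a hard yes or no.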

**Fig. 2.**
Number of extracted connections for each of the three extractors on the PubMed and full-text corpora, using the ABA-SYN NER.

**Fig. 3.**
Evaluation against AMBCA. AMBCA contains 16 954 distinct connected brain region pairs (AMBCA Pos) and 28 415 unconnected pairs (AMBCA Neg). Connectivity data extracted from the literature contain 7949 distinct connected brain region pairs (LIT), of which 904 are connected in AMBCA (LIT TP) and 261 are not connected in AMBCA (LIT FP).
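
The 78% precision reported in the abstract is consistent with the Fig. 3 counts, assuming precision is computed only over the extracted pairs that AMBCA can assess: the pairs it reports as connected plus those it reports as unconnected. The arithmetic below checks that reading. It also shows what fraction of the extracted pairs AMBCA covers at all. The variable names are illustrative.

```python
# Counts taken from the Fig. 3 caption.
lit_pairs = 7949         # distinct connected pairs extracted from the literature
lit_confirmed = 904      # of those, pairs AMBCA reports as connected
lit_contradicted = 261   # of those, pairs AMBCA reports as unconnected

# Only pairs that AMBCA assesses (connected or unconnected) can be scored.
assessed = lit_confirmed + lit_contradicted
precision = lit_confirmed / assessed
coverage = assessed / lit_pairs

print(f"precision = {precision:.2f}")  # precision = 0.78
print(f"coverage  = {coverage:.2f}")   # coverage  = 0.15
```

The remaining extracted pairs fall outside the pairs that AMBCA assesses, so the reference data can neither confirm nor refute them.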

**Fig. 4.**
Comparison of the inter-region connectivity matrices, renormalized between 0 (white) and 1 (blue). Rows and columns correspond to ABA brain regions. *Left:* connection matrix from AMBCA (ipsilateral), using ABA’s inter-region connectivity model, with values representing a combination of connection strength and statistical confidence [see Fig. 4a of Oh *et al.* (2014)]. *Middle:* same matrix from AMBCA, but symmetrized (connection directionality is ignored, since the NLP models do not extract directionality). *Right:* connection matrix from the results extracted from the literature (LIT) with values representing the number of extracted connectivity statements, weighted by the estimated precision of each connectivity extractor
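
The caption does not say exactly how the matrices were symmetrized or renormalized. The sketch below therefore assumes two choices: symmetrization by elementwise maximum, so a pair counts as connected if either direction is, and renormalization by min-max scaling into [0, 1]. It also builds a literature matrix from extracted pairs, weighting each statement by its extractor's estimated precision. The precision values and the helper names (`symmetrize`, `minmax_normalize`, `weighted_literature_matrix`) are hypothetical.

```python
def symmetrize(matrix):
    """Ignore direction: cell (i, j) becomes max(M[i][j], M[j][i])."""
    n = len(matrix)
    return [[max(matrix[i][j], matrix[j][i]) for j in range(n)] for i in range(n)]


def minmax_normalize(matrix):
    """Rescale all values linearly into [0, 1]."""
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


def weighted_literature_matrix(pairs_by_extractor, precision_by_extractor, regions):
    """Sum extracted statements per unordered region pair.

    Each statement adds its extractor's estimated precision to both (i, j)
    and (j, i), so the result is symmetric like the middle AMBCA matrix.
    """
    index = {r: k for k, r in enumerate(regions)}
    n = len(regions)
    m = [[0.0] * n for _ in range(n)]
    for extractor, pairs in pairs_by_extractor.items():
        weight = precision_by_extractor[extractor]
        for a, b in pairs:
            i, j = index[a], index[b]
            m[i][j] += weight
            if i != j:
                m[j][i] += weight
    return m


# Toy directed matrix standing in for AMBCA values (not real data).
ambca = [[0, 2, 0], [0, 0, 1], [4, 0, 0]]
print(minmax_normalize(symmetrize(ambca)))

# Hypothetical precision values, for illustration only.
lit = weighted_literature_matrix(
    {"Rules": [("A", "B"), ("A", "B")], "Kernel": [("B", "C")]},
    {"Rules": 0.5, "Kernel": 0.25},
    ["A", "B", "C"],
)
print(minmax_normalize(lit))
```

Averaging `M[i][j]` and `M[j][i]` would be an equally plausible way to symmetrize. With the maximum, a strong projection in one direction is not halved merely because the reverse projection is weak.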
