Front Neuroinform. 2015 May 21:9:13.
doi: 10.3389/fninf.2015.00013. eCollection 2015.

Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application

Leon French et al. Front Neuroinform. 2015.

Abstract

We describe the WhiteText project, and its progress towards automatically extracting statements of neuroanatomical connectivity from text. We review progress to date on the three main steps of the project: recognition of brain region mentions, standardization of brain region mentions to neuroanatomical nomenclature, and connectivity statement extraction. We further describe a new version of our manually curated corpus that adds 2,111 connectivity statements from 1,828 additional abstracts. Cross-validation classification within the new corpus replicates results on our original corpus, recalling 67% of connectivity statements at 51% precision. The resulting merged corpus provides 5,208 connectivity statements that can be used to seed species-specific connectivity matrices and to better train automated techniques. Finally, we present a new web application that allows fast interactive browsing of the over 70,000 sentences indexed by the system, as a tool for accessing the data and assisting in further curation. Software and data are freely available at http://www.chibi.ubc.ca/WhiteText/.

Keywords: connectome; information retrieval; natural language processing; text mining.
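
The recall and precision figures reported in the abstract can be combined into a single balanced score. The sketch below is illustrative only: the abstract does not report an F-measure, so the F1 value is derived here from the two published numbers (67% recall, 51% precision) using the standard harmonic-mean definition, and the function name `f1_score` is a hypothetical helper rather than part of the WhiteText software.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F-measure)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract for cross-validation on the new corpus.
precision = 0.51
recall = 0.67

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.579"
```

Read this way, roughly two of every three connectivity statements are recovered, while about half of the extracted statements are correct, which is consistent with the abstract's framing of the web application as an aid to further manual curation.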

Figures

Figure 1
Visualization of processing steps for an example sentence.
Figure 2
Flow chart depicting the origins and evaluations of the connectivity corpora. Arrows represent the use of annotated data from one corpus (source) to test or create a corpus (target). JCN, Journal of Comparative Neurology; BAMS, Brain Architecture Management System.
Figure 3
Screenshot of example results from WhiteText Web. The top text input field attempts to match typed text to brain regions in NIFSTD while the user types. The query region column shows the original named brain regions that were matched to the given input of “Habenula” or its children. Sentence text is directly linked to the source abstract in PubMed. Query and connected regions are colored, with underlines marking words that suggest connectivity. Results can be sorted by all columns except the first. A single click on the gray flag in the “Report” column allows users to mark sentences that were incorrectly parsed. The “Export Table” link (top left) provides a tab-separated file containing the returned results.
Figure 4
Bar plot of yearly counts of abstracts with connectivity information in the combined corpus.

