A Bottom-up Approach to Data Annotation in Neurophysiology

Jan Grewe¹, Thomas Wachtler, Jan Benda

PMID: 21941477
PMCID: PMC3171061
DOI: 10.3389/fninf.2011.00016

Jan Grewe et al. Front Neuroinform. 2011 Aug 30:5:16. doi: 10.3389/fninf.2011.00016. eCollection 2011.

Affiliation

¹ Department Biology II, Ludwig-Maximilians Universität München Martinsried, Germany.

Abstract

Metadata providing information about the stimulus, data acquisition, and experimental conditions are indispensable for the analysis and management of experimental data within a lab. However, metadata are only rarely available in a structured, comprehensive, and machine-readable form. This poses a severe problem for finding and retrieving data, both in the laboratory and in the various emerging public databases. Here, we propose a simple format, the "open metaData Markup Language" (odML), for collecting and exchanging metadata in an automated, computer-based fashion. In odML, arbitrary metadata are stored as extended key-value pairs in a hierarchical structure. Central to odML is a clear separation of format and content, i.e., neither keys nor values are defined by the format. This makes odML flexible enough to store all available metadata instantly, without the need to submit new keys to an ontology or controlled terminology. Common standard keys can be defined in odML-terminologies to guarantee interoperability. We have started to define such terminologies for neurophysiological data, but aim at a community-driven extension and refinement of the proposed definitions. With customized terminologies that map to these standard terminologies, metadata can be named and organized as required or preferred without softening the standard. Together with the respective libraries provided for common programming languages, the odML format can be integrated into the laboratory workflow, facilitating automated collection of metadata where they become available. The flexibility of odML also encourages a community-driven collection and definition of terms used for annotating data in the neurosciences.

Keywords: datamodel; datasharing; metadata; neuroscience; ontology.

Figures

**Figure 1**
**The flow of data and metadata in sciences**. The basis of this “food chain,” on top, is the laboratory in which the data are originally recorded, stored, managed, and analyzed. Here metadata are important in many respects. Data management uses them to categorize and organize the data. During data analysis, stimulus information is required, and further derived data characteristics are added, which in turn may be useful for querying data. Data may further be shared with collaborators for discussion and re-evaluation. Eventually, data may be made available via public databases like the G-Node (Herz et al., 2008). At all levels, data exchange between people as well as between computer programs requires a detailed annotation of the raw data with metadata.

**Figure 2**
**Open metaData Markup Language Entity-Relation diagram**. The odML model is a tree structure of *Sections* and *Properties*. Connecting lines and “crow's feet” indicate the relationships between the entities. For example, a *Section* can contain 0 to many (n) *Properties*, each of which must have at least 1 *Value*. The recursive connection of the *Section* indicates that there can be 0 to many subsections building the tree. The whole tree is enclosed by a *RootSection* that contains some document-related elements. Each element listed in the different entities may occur at most once.
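The entity relations described in this caption can be sketched as a small set of Python data classes. This is an illustrative sketch only, not the odML library's API: the class layout follows the caption (Sections hold Properties and subsections, a Property needs at least one Value, a RootSection encloses the tree), while the field names `unit`, `author`, and `version`, the `find` helper, and the example section types are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Value:
    """A single value; the optional unit field is an assumption."""
    data: Union[str, float, int]
    unit: str = ""


@dataclass
class Property:
    """A named key that must carry at least one Value."""
    name: str
    values: List[Value]

    def __post_init__(self):
        if not self.values:
            raise ValueError(f"Property '{self.name}' needs at least one Value")


@dataclass
class Section:
    """A tree node holding 0..n Properties and 0..n subsections."""
    name: str
    type: str
    properties: List[Property] = field(default_factory=list)
    sections: List["Section"] = field(default_factory=list)

    def find(self, path: str) -> Optional[Property]:
        """Return the Property at 'subsection/.../property', or None (hypothetical helper)."""
        head, _, rest = path.partition("/")
        if not rest:
            return next((p for p in self.properties if p.name == head), None)
        for sub in self.sections:
            if sub.name == head:
                return sub.find(rest)
        return None


@dataclass
class RootSection:
    """Encloses the tree; author and version are assumed document-related fields."""
    author: str = ""
    version: str = ""
    sections: List[Section] = field(default_factory=list)


# Example tree: a hardware container section with one amplifier subsection.
amplifier = Section("Amplifier", "hardware/amplifier",
                    properties=[Property("Gain", [Value(100)])])
hardware = Section("HardwareSettings", "hardware_settings", sections=[amplifier])
root = RootSection(author="A. Scientist", sections=[hardware])

gain = hardware.find("Amplifier/Gain")
print(gain.values[0].data)
```

Enforcing the "at least one Value" rule in `__post_init__` keeps invalid Properties from entering the tree at all, which is one simple way to mirror the cardinality shown in the diagram.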

**Figure 3**
**Hardware descriptions in odML**. Hardware descriptions can be split up into the *HardwareProperties* and *HardwareSettings*. These container sections then group subsections for the individual hardware items used in the setup. Sections are shown in the form “name – [type].”

**Figure 4**
**Describing a stimulus in odML**. odML description of a visual stimulus which is an additive combination of three components. The trace on top shows what the actual stimulus might have looked like. Sections are shown in the form “name – [type].”

**Figure 5**
**Transporting dataset information in odML**. **(A)** Parts of the description of a simple electrophysiological experiment in which a single cell was recorded and several datasets were saved to disk. **(B)** Experiments in which several datasets have been recorded in a number of cells from the same subject. **(C)** Description of simultaneous recordings of two cells. Note: For clarity Properties are omitted in **(B,C)**. Sections are shown in the form “name – [type].”

**Figure 6**
**Using mappings**. This figure shows how mappings can be applied to convert a metadata tree from one layout to another. The left panel shows metadata that are organized as suggested by the CARMEN “Mini” metadata standard. The metadata file is in the odML format and refers to the CarmenMini terminology, which defines mappings for properties and sections. These mappings are URLs pointing to the respective properties in the odML-terminologies. Applying this mapping information converts the tree to the layout suggested by the odML-terminologies (right panel).
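The idea of converting a tree from one layout to another can be illustrated with a minimal Python sketch. It simplifies heavily: metadata are held in plain nested dictionaries rather than odML files, mappings are `(section, property)` pairs rather than URLs, and the function `apply_mapping` and all section and property names are invented for illustration (they are not taken from the CarmenMini terminology or the odML-terminologies).

```python
def apply_mapping(tree, mapping):
    """Rebuild a {section: {property: value}} tree under a target layout.

    mapping translates (section, property) in the source layout to
    (section, property) in the target layout; entries without a mapping
    are copied unchanged.
    """
    converted = {}
    for section, props in tree.items():
        for name, value in props.items():
            target_section, target_name = mapping.get((section, name), (section, name))
            converted.setdefault(target_section, {})[target_name] = value
    return converted


# Hypothetical custom layout: everything sits in one "Experiment" section.
custom = {"Experiment": {"animal": "rat 12", "rate": "20 kHz", "note": "pilot"}}

# Hypothetical mapping onto a layout with separate sections.
mapping = {
    ("Experiment", "animal"): ("Subject", "Name"),
    ("Experiment", "rate"): ("Recording", "SamplingRate"),
}

standard = apply_mapping(custom, mapping)
print(standard)
```

Because unmapped entries are copied unchanged, a custom terminology only needs to define mappings for the terms that differ from the standard layout.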

**Listing 1**
**Using odML in Matlab**. Example code shows how odML could be used during everyday work in the lab. The listing shows Matlab command line calls.

**Listing 2**
**Dummy Matlab function “powerSpectrum.m” to illustrate how metadata can be retrieved and used during data analysis**.

**Figure A1**
**The odML schema**. XML-schema definition of the odML format. This schema can be used to validate odML files, i.e., check their structural conformity. Note that XML is case-sensitive. This means that the tags (“property,” “section,” “name,” etc.) have to be written exactly as defined in this schema. In our schema all tags use the “lower camelCase” or “compoundNames” convention, i.e., lower case except for the first letter of each subsequent word in composite terms.
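The practical consequence of case sensitivity can be shown with Python's standard `xml.etree.ElementTree` module. The XML fragment below is a simplified, hand-written assumption and has not been validated against the odML schema; the standard library does not perform schema validation, so the sketch only demonstrates that tag lookups distinguish “section” from “Section.”

```python
import xml.etree.ElementTree as ET

# Simplified fragment using lower-case tags; the exact element layout is assumed.
doc = ET.fromstring(
    "<odML>"
    "<section><name>Recording</name>"
    "<property><name>SamplingRate</name><value>20000</value></property>"
    "</section>"
    "</odML>"
)

lower = doc.find("section")   # matches the tag as written in the fragment
upper = doc.find("Section")   # differs only in case, so nothing is found

print(lower is not None, upper is None)
```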
