Front Neuroinform. 2009 Sep 7:3:30.
doi: 10.3389/neuro.11.030.2009. eCollection 2009.

Derived Data Storage and Exchange Workflow for Large-Scale Neuroimaging Analyses on the BIRN Grid

David B Keator et al. Front Neuroinform.

Abstract

Organizing and annotating biomedical data in structured ways has gained much interest and focus in the last 30 years. Driven by decreases in digital storage costs and advances in genetic sequencing, imaging, electronic data collection, and microarray technologies, data are being collected at an ever-increasing rate. The need to store and exchange data in meaningful ways in support of data analysis, hypothesis testing, and future collaborative use is pervasive. Because trans-disciplinary projects rely on effective use of data from many domains, there is genuine interest in the informatics community in how best to store and combine these data while maintaining a high level of data quality and documentation. The difficulties in sharing and combining raw data are amplified after post-processing and/or data analysis, where the new dataset of interest is a function of the original data, which may have been collected by multiple collaborating sites. Simple metadata documenting which subject and version of the data were used for a particular analysis become complicated by the heterogeneity of the collecting sites, yet are critically important to the interpretation and reuse of derived results. This manuscript presents a case study of using the XML-Based Clinical Experiment Data Exchange (XCEDE) schema and the Human Imaging Database (HID) in the Biomedical Informatics Research Network's (BIRN) distributed environment to document and exchange derived data. The discussion includes an overview of the data structures used in both the XML and the database representations, insight into the design considerations, and the extensibility of the design to support additional analysis streams.

Keywords: BIRN; HID; MRI; XCEDE; XML; analysis; database; medical imaging.

Figures

Figure 1
Schematic of original (raw) data entry and file registration process at a node in the federation and FBIRN node locations.
Figure 2
FBIRN data management discovery and analysis workflow.
Figure 3
Original data files (blue) downloaded from data federation are processed using autorecon-all and cortical thickness data extracted (green). Resulting data files are loaded back into data federation.
Figure 4
fMRI PreProc data preprocessing workflow. Original data inputs are colored blue, intermediate derived data yellow, and final derived data output green. The workflow transform modules Fips-mc-fsl, Fips-b0c-fsl, Fips-stc-fsl, and Fips-sm2-fsl contain variable numbers of sub-transform steps to produce intermediate and final derived results.
Figure 5
Base <analysis> component.
Figure 6
<analysis_t> components of the XCEDE2 schema. The input/output components (panel A) are used to reference input data and output derived data files and/or metadata. The measurement group component (panel B) is used to store derived data values directly in an XML-formatted file. The provenance and processStep components (panel C) are used for documenting processing-pipeline-specific metadata.
Figure 7
XCEDE2 XML entry for thickness and curvature derived data. Entity tags document the terminology source “rh.aparc.annot” and the term “caudalmiddlefrontal”, which are the native source and term within the FreeSurfer analysis software.
Figure 8
Example XCEDE2 provenance blocks from the PreProc (top) and StructMorph (bottom) analyses.
Figure 9
Core HID tables for defining processing pipelines.
Figure 10
HID web interface derived data query form for StructMorph analysis.
