Big Data. 2016 Jun;4(2):97-108.
doi: 10.1089/big.2015.0057.

Integration and Visualization of Translational Medicine Data for Better Understanding of Human Diseases

Venkata Satagopam et al. Big Data. 2016 Jun.

Abstract

Translational medicine is a domain that turns results of basic life science research into new tools and methods in a clinical environment, for example, new diagnostics or therapies. Nowadays, the process of translation is supported by large amounts of heterogeneous data, ranging from medical data to a whole range of -omics data. This is not only a great opportunity but also a great challenge, as translational medicine big data is difficult to integrate and analyze, and requires the involvement of biomedical experts for the data processing. We show here that visualization and interoperable workflows, combining multiple complex steps, can address at least parts of the challenge. In this article, we present an integrated workflow for exploration, analysis, and interpretation of translational medicine data in the context of human health. Three Web services (tranSMART, a Galaxy Server, and a MINERVA platform) are combined into one big data pipeline. Native visualization capabilities enable biomedical experts to get a comprehensive overview of, and control over, separate steps of the workflow. The capabilities of tranSMART enable flexible filtering of multidimensional integrated data sets to create subsets suitable for downstream processing. A Galaxy Server offers visually aided construction of analytical pipelines, using existing or custom components. A MINERVA platform supports the exploration of health and disease-related mechanisms in a contextualized analytical visualization system. We demonstrate the utility of our workflow by illustrating its successive steps using an existing data set, for which we propose a filtering scheme, an analytical pipeline, and a corresponding visualization of analytical results. The workflow is available as a sandbox environment, where readers can work with the described setup themselves. Overall, our work shows how visualization and interfacing of big data processing services facilitate exploration, analysis, and interpretation of translational medicine data.
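As a rough illustration of the three stages named in the abstract (subset selection, analysis, and overlay of results), the following toy sketch mimics the data flow in plain Python. It does not call tranSMART, Galaxy, or MINERVA; the sample records, gene names, function names, and the fold-change threshold are all hypothetical, and the expression values are assumed to be linear intensities. Only the subset variables ("disease state" and "gender") and the color convention (green for upregulated, red for downregulated) follow the figure captions.

```python
import math
from statistics import mean

# Hypothetical toy records; these values are NOT taken from GSE7621.
samples = [
    {"id": "s1", "disease_state": "PD", "gender": "male",
     "expr": {"GENE_A": 8.0, "GENE_B": 2.0, "GENE_C": 4.0}},
    {"id": "s2", "disease_state": "PD", "gender": "male",
     "expr": {"GENE_A": 8.0, "GENE_B": 2.0, "GENE_C": 4.0}},
    {"id": "s3", "disease_state": "control", "gender": "male",
     "expr": {"GENE_A": 2.0, "GENE_B": 8.0, "GENE_C": 4.0}},
    {"id": "s4", "disease_state": "control", "gender": "male",
     "expr": {"GENE_A": 2.0, "GENE_B": 8.0, "GENE_C": 4.0}},
    {"id": "s5", "disease_state": "PD", "gender": "female",
     "expr": {"GENE_A": 6.0, "GENE_B": 3.0, "GENE_C": 4.0}},
]


def select_subset(records, **criteria):
    """Stage 1 (stand-in for cohort selection): keep records matching all criteria."""
    return [r for r in records if all(r[k] == v for k, v in criteria.items())]


def log2_fold_change(cases, controls, gene):
    """Stage 2 (stand-in for the analysis step): log2 ratio of mean expression."""
    case_mean = mean(r["expr"][gene] for r in cases)
    control_mean = mean(r["expr"][gene] for r in controls)
    return math.log2(case_mean / control_mean)


def overlay_color(lfc, threshold=1.0):
    """Stage 3 (stand-in for the map overlay): green = up, red = down.

    The threshold of 1.0 is an arbitrary assumption for this sketch.
    """
    if lfc >= threshold:
        return "green"
    if lfc <= -threshold:
        return "red"
    return "none"


cases = select_subset(samples, disease_state="PD", gender="male")
controls = select_subset(samples, disease_state="control", gender="male")

overlay = {
    gene: overlay_color(log2_fold_change(cases, controls, gene))
    for gene in ("GENE_A", "GENE_B", "GENE_C")
}
print(overlay)
```

In the workflow described here, each of these stand-in functions corresponds to a separate service, and the hand-off between stages is automated rather than performed in a single script.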

Keywords: big data analytics; big data infrastructure design; data acquisition and cleaning; data integration; data mining; disease map.

Figures

FIG. 1.
A workflow for big data analytics in translational medicine. Clinical and “omics” data are integrated in the tranSMART database, allowing their exploration and the selection of relevant subsets for downstream analysis. The selected data set is automatically transferred to the Galaxy Server as a source for user-defined analytical pipelines. Finally, the results of the analysis are automatically transferred to an associated knowledge repository hosted on the MINERVA platform (here: PD map) and displayed on the visualized molecular interaction networks. PD, Parkinson's disease.
FIG. 2.
Cohort/subset definition based on the variables displayed in the data tree. Two distinct subsets are defined based on the variables “disease state” and “gender.” The left panel shows the data tree in the tranSMART data set explorer, here for GEO study GSE7621 following curation and loading into tranSMART. The data leaves correspond to the low- and high-dimensional data variable names. GEO, Gene Expression Omnibus.
FIG. 3.
Visually constructed data flow in the Galaxy Server comparing two cohorts from tranSMART.
FIG. 4.
Data visualization and analysis using the PD map. (A) Differential gene expression data comparing postmortem brain tissues from male PD patients versus controls are displayed on the PD map (green, upregulated; red, downregulated). Pathways and processes in conspicuous areas (colored circle) can be identified using the pathway and compartment layout view of the PD map. Detailed views of deregulated genes that encode proteins involved in dopamine metabolism, secretion, and recycling (B), in the mitochondrial electron transport chain, in particular elements of complex I (C), and in microglia activation (D).
