BMC Bioinformatics. 2015 Oct 13;16:328.

doi: 10.1186/s12859-015-0761-3.

An automated real-time integration and interoperability framework for bioinformatics

Pedro Lopes¹, José Luís Oliveira²

Affiliations

¹ DETI/IEETA, Universidade de Aveiro, Campus Universitario de Santiago, Aveiro, 3810-193, Portugal. pedrolopes@ua.pt.
² DETI/IEETA, Universidade de Aveiro, Campus Universitario de Santiago, Aveiro, 3810-193, Portugal. jlo@ua.pt.

PMID: 26464306
PMCID: PMC4603302
DOI: 10.1186/s12859-015-0761-3

Pedro Lopes et al. BMC Bioinformatics. 2015.


Abstract

Background: In recent years, data integration has become an everyday undertaking for life sciences researchers. Aggregating and processing data from disparate sources, whether through purpose-built software or via manual processes, is a common task for scientists. However, the scope and usability of most current integration tools fail to deal with the fast-growing and highly dynamic nature of biomedical data.

Results: In this work we introduce a reactive and event-driven framework that simplifies real-time data integration and interoperability. This platform facilitates otherwise difficult tasks, such as connecting heterogeneous services; indexing, linking and transferring data from distinct resources; or subscribing to notifications regarding the timeliness of dynamic data. For developers, the framework automates the deployment of integrative and interoperable bioinformatics applications, using atomic data storage for content change detection and enabling agent-based, intelligent extract, transform and load (ETL) tasks.

Conclusions: This work bridges the gap between the growing number of services that access specific data sources or algorithms and the growing number of users who perform simple integration tasks on a recurring basis, through a streamlined workspace available to researchers and developers alike.
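
The content change detection mentioned in the Results can be illustrated with a minimal Python sketch. This is not the framework's own code: the names `fingerprint`, `ChangeDetector` and `poll_once`, the in-memory dictionary, and the choice of SHA-256 digests are all assumptions made for illustration. The idea shown is only that an agent fetches a resource, compares a digest of the content with the one stored from the previous poll, and triggers delivery when they differ.

```python
import hashlib
from typing import Callable, Dict


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest used to compare two snapshots of a resource."""
    return hashlib.sha256(content).hexdigest()


class ChangeDetector:
    """Hypothetical helper that remembers the last digest seen per resource."""

    def __init__(self) -> None:
        # Assumption: an in-memory dict stands in for the framework's data store.
        self._last_seen: Dict[str, str] = {}

    def check(self, resource_id: str, content: bytes) -> bool:
        """Return True when the content differs from the stored snapshot.

        A resource that has never been seen before counts as changed.
        """
        digest = fingerprint(content)
        changed = self._last_seen.get(resource_id) != digest
        self._last_seen[resource_id] = digest
        return changed


def poll_once(
    detector: ChangeDetector,
    resource_id: str,
    fetch: Callable[[], bytes],
    on_change: Callable[[bytes], None],
) -> bool:
    """Fetch a resource once and call on_change only if its content changed."""
    content = fetch()
    if detector.check(resource_id, content):
        on_change(content)
        return True
    return False
```

In this sketch, `fetch` would wrap whatever retrieves the original resource (an HTTP request, a file read, a database query) and `on_change` would hand the new content to the delivery step; both are left as plain callables so that the comparison logic stays independent of any particular source.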


Figures

**Fig. 1**
Framework architecture highlighting the different system layers: (a) external Original Resources are accessed for data extraction; (b) local or distributed Agents poll Original Resources; (c) the internal Data store uses a relational database (PostgreSQL or MySQL) to store data and an object cache (Redis) for improved performance; (d) the application engine, implemented in Ruby with the Rails framework, controls the entire application and its API; (e) the Postman applies the data extracted by the Agents to the Templates and executes the final delivery; (f) the external Destination Resources receive the data from the system.

**Fig. 2**
Framework monitoring and integration sequence diagram. In addition to the listed steps, all actions are logged internally for auditing, error tracking and performance analysis. Two alternative pipelines can be executed: (a) distributed agents generate a different sequence from step 3, where FluxCapacitor mediates all interactions; (b) event data can be pushed directly into the platform, generating a new sequence starting at step 7.

**Fig. 3**
Applying data transformations. Data from Original Resources (in CSV/TSV, XML, JSON or SQL) can be easily translated and transformed (into URL requests, files, SQL queries or emails) using the framework’s templates: (a) CSV data are automatically inserted into a SQL database; (b) data are extracted from a SQL query into a CSV file; (c) XML elements are extracted (using XPath) and sent to a web service via POST request.

**Fig. 4**
Web interface for the proposed platform prototype. This interface highlights the integration configuration for automating human variome integration. This integration features one agent (LOVD XML Agent) and one template (SQL variant). The former configures how to extract mutation data from the LOVD API and the latter specifies the configuration for storing extracted data in a relational database.
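
The CSV-to-SQL transformation described in Fig. 3 (a) can be sketched in a few lines of Python. The framework's actual template syntax is not given here, so everything below is an assumption for illustration: the `variants` table, its `gene` and `mutation` columns, the `INSERT_TEMPLATE` string, the function name `load_csv_into_sql`, the made-up sample rows, and the use of SQLite in place of PostgreSQL or MySQL. The sketch only shows the general pattern of a query template whose named placeholders are filled from each extracted row.

```python
import csv
import io
import sqlite3

# Hypothetical template: the named placeholders match the CSV column headers.
INSERT_TEMPLATE = "INSERT INTO variants (gene, mutation) VALUES (:gene, :mutation)"


def load_csv_into_sql(csv_text: str, connection: sqlite3.Connection) -> int:
    """Apply INSERT_TEMPLATE to every CSV row and return the number of rows inserted."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # The connection context manager commits on success and rolls back on error.
    with connection:
        connection.executemany(INSERT_TEMPLATE, rows)
    return len(rows)


if __name__ == "__main__":
    # Assumption: an in-memory SQLite database stands in for the real data store.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE variants (gene TEXT, mutation TEXT)")
    sample = "gene,mutation\nGENE1,example-1\nGENE2,example-2\n"  # made-up rows
    inserted = load_csv_into_sql(sample, conn)
    print(f"inserted {inserted} rows")
```

Using bound parameters rather than string interpolation keeps extracted values from being interpreted as SQL, which matters when the data come from external resources outside the integrator's control.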


Cited by

TASKA: A modular task management system to support health research studies.
Almeida JR, Gini R, Roberto G, Rijnbeek P, Oliveira JL. BMC Med Inform Decis Mak. 2019 Jul 2;19(1):121. doi: 10.1186/s12911-019-0844-6. PMID: 31266480. Free PMC article.

