BMC Bioinformatics. 2015 Oct 13;16:328.
doi: 10.1186/s12859-015-0761-3.

An automated real-time integration and interoperability framework for bioinformatics

Pedro Lopes et al. BMC Bioinformatics. 2015.

Abstract

Background: In recent years, data integration has become an everyday undertaking for life sciences researchers. Aggregating and processing data from disparate sources, whether through specifically developed software or via manual processes, is a common task for scientists. However, the scope and usability of the majority of current integration tools fail to deal with the fast-growing and highly dynamic nature of biomedical data.

Results: In this work we introduce a reactive and event-driven framework that simplifies real-time data integration and interoperability. This platform facilitates otherwise difficult tasks, such as connecting heterogeneous services, indexing, linking and transferring data from distinct resources, or subscribing to notifications regarding the timeliness of dynamic data. For developers, the framework automates the deployment of integrative and interoperable bioinformatics applications, using atomic data storage for content change detection, and enabling agent-based intelligent extract, transform and load tasks.

Conclusions: This work bridges the gap between the growing number of services, accessing specific data sources or algorithms, and the growing number of users, performing simple integration tasks on a recurring basis, through a streamlined workspace available to researchers and developers alike.
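
The content change detection mentioned in the Results can be illustrated with a minimal sketch. It assumes that each extracted record is fingerprinted with a hash and that only records with an unseen fingerprint are passed on; the abstract does not describe the framework's actual mechanism, so the functions `fingerprint` and `detect_changes` and the sample records are hypothetical.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Return a stable SHA-256 digest of one extracted record.

    Keys are sorted so that the same content always yields the same digest.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_changes(records, seen):
    """Return the records whose fingerprint is not yet in `seen`.

    `seen` is a set of digests that is updated in place, standing in for
    the persistent data store (an assumption made for this sketch).
    """
    changed = []
    for record in records:
        digest = fingerprint(record)
        if digest not in seen:
            seen.add(digest)
            changed.append(record)
    return changed


# Hypothetical records returned by two consecutive polls of one resource.
seen = set()
first_poll = [{"id": "r1", "value": "A"}, {"id": "r2", "value": "B"}]
second_poll = [{"id": "r1", "value": "A"}, {"id": "r2", "value": "C"}]

new_on_first = detect_changes(first_poll, seen)
new_on_second = detect_changes(second_poll, seen)
```

On the first poll both records are new; on the second poll only the record whose content changed is reported, which is the event that would trigger downstream integration tasks.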

Figures

Fig. 1
Framework architecture highlighting the different system layers. (a) External Original Resources are accessed for data extraction; (b) local or distributed Agents poll Original Resources; (c) the internal Data store uses a relational database (PostgreSQL or MySQL) to store data and an object cache (Redis) for improved performance; (d) the application engine is implemented in Ruby with the Rails framework, and controls the entire application and its API; (e) the Postman applies the data extracted by the Agents to the Templates and executes the final delivery; (f) the external Destination Resources receive the data from the system.
Fig. 2
Framework monitoring and integration sequence diagram. In addition to the listed steps, all actions are logged internally for auditing, error tracking and performance analysis. Two alternative pipelines can be executed: (a) distributed agents generate a different sequence from step 3, where FluxCapacitor mediates all interactions; (b) event data can be pushed directly into the platform, generating a new sequence starting at step 7.
Fig. 3
Applying data transformations. Data from Original Resources (in CSV/TSV, XML, JSON or SQL) can be easily translated and transformed (into URL requests, files, SQL queries or emails) using the framework’s templates: (a) CSV data are automatically inserted into a SQL database; (b) data are extracted from a SQL query into a CSV file; (c) XML elements are extracted (using XPath) and sent to a web service via POST request.
Fig. 4
Web interface for the proposed platform prototype. This interface highlights the integration configuration for automating human variome integration. This integration features one agent (LOVD XML Agent) and one template (SQL variant). The former configures how to extract mutation data from the LOVD API and the latter specifies the configuration for storing extracted data in a relational database.
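
The CSV-to-SQL transformation in the Fig. 3 caption (panel a) can be approximated with a short sketch. It is not the framework's template syntax, which the captions do not show: the SQL statement with named placeholders stands in for a template, an in-memory SQLite database replaces PostgreSQL or MySQL so the example is self-contained, and the table, columns, sample rows and the function `load_csv_into_sql` are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV payload, as an agent might extract it from a resource.
CSV_DATA = "gene,mutation\nGENE_A,variant_1\nGENE_B,variant_2\n"

# Stand-in for a template: CSV column names map onto named SQL placeholders.
SQL_TEMPLATE = "INSERT INTO variants (gene, mutation) VALUES (:gene, :mutation)"


def load_csv_into_sql(csv_text, template, conn):
    """Parse CSV text and insert each row using a parameterized statement.

    Returns the number of rows inserted.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    with conn:  # commits on success, rolls back on error
        conn.executemany(template, rows)
    return len(rows)


# In-memory SQLite is an assumption that keeps the sketch runnable anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variants (gene TEXT, mutation TEXT)")

inserted = load_csv_into_sql(CSV_DATA, SQL_TEMPLATE, conn)
stored = conn.execute(
    "SELECT gene, mutation FROM variants ORDER BY gene"
).fetchall()
```

Binding values through placeholders rather than string substitution keeps the extracted data from altering the SQL statement, a property any template engine that generates queries would need to preserve.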
