OpenFlyData: an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster
- PMID: 20382263
- DOI: 10.1016/j.jbi.2010.04.004
Abstract
Motivation: Integrating heterogeneous data across distributed sources is a major requirement for in silico bioinformatics supporting translational research. For example, genome-scale data on patterns of gene expression in the fruit fly Drosophila melanogaster are widely used in functional genomic studies in many organisms to inform candidate gene selection and validate experimental results. However, current data integration solutions tend to be heavyweight and require significant initial and ongoing investment of effort. Development of a common Web-based data integration infrastructure (a.k.a. data web), using Semantic Web standards, promises to alleviate these difficulties, but little is known about the feasibility, costs, risks or practical means of migrating to such an infrastructure.
Results: We describe the development of OpenFlyData, a proof-of-concept system integrating gene expression data on D. melanogaster, combining Semantic Web standards with light-weight approaches to Web programming based on Web 2.0 design patterns. To support researchers designing and validating functional genomic studies, OpenFlyData includes user-facing search applications providing intuitive access to and comparison of gene expression data from FlyAtlas, the BDGP in situ database, and FlyTED, using data from FlyBase to expand and disambiguate gene names. OpenFlyData's services are also openly accessible, and are available for reuse by other bioinformaticians and application developers. Semi-automated methods and tools were developed to support labour- and knowledge-intensive tasks involved in deploying SPARQL services. These include methods for generating ontologies and relational-to-RDF mappings for relational databases, which we illustrate using the FlyBase Chado database schema; and methods for mapping gene identifiers between databases. The advantages of using Semantic Web standards for biomedical data integration are discussed, as are open issues. In particular, although the performance of open source SPARQL implementations is sufficient to query gene expression data directly from user-facing applications such as Web-based data fusions (a.k.a. mashups), we found open SPARQL endpoints to be vulnerable to denial-of-service-type problems, which must be mitigated to ensure reliability of services based on this standard. These results are relevant to data integration activities in translational bioinformatics.
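The denial-of-service concern noted above can be illustrated with a minimal sketch of one possible mitigation: capping the number of rows a query may request before it is sent to an open SPARQL endpoint. This is an assumption made for illustration, not a description of how OpenFlyData or SPARQLite handle the problem. The names `MAX_ROWS`, `cap_limit` and `build_request_url`, and the endpoint URL, are hypothetical. The only protocol detail relied on is that the SPARQL Protocol accepts a query over HTTP GET in the `query` parameter.

```python
import re
from urllib.parse import urlencode

# Hypothetical cap on result rows; a real service would tune this value.
MAX_ROWS = 500


def cap_limit(query: str, max_rows: int = MAX_ROWS) -> str:
    """Return the query with a trailing LIMIT no larger than max_rows.

    Simplification: only a LIMIT clause at the very end of the query is
    recognised. Queries using OFFSET or subqueries would need a real
    SPARQL parser rather than a regular expression.
    """
    stripped = query.strip()
    match = re.search(r"\bLIMIT\s+(\d+)\s*$", stripped, flags=re.IGNORECASE)
    if match is None:
        # No trailing LIMIT: append the cap.
        return f"{stripped}\nLIMIT {max_rows}"
    if int(match.group(1)) <= max_rows:
        # Requested limit is already within bounds: leave it unchanged.
        return stripped
    # Requested limit is too large: replace it with the cap.
    return stripped[:match.start()] + f"LIMIT {max_rows}"


def build_request_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL for a row-capped query."""
    return f"{endpoint}?{urlencode({'query': cap_limit(query)})}"


if __name__ == "__main__":
    # Hypothetical endpoint and query, for illustration only.
    example = "SELECT ?gene ?label WHERE { ?gene ?p ?label }"
    print(build_request_url("http://example.org/sparql", example))
```

A row cap of this kind bounds the size of the response but not the cost of evaluating the query, so in practice it would likely be combined with other measures such as query timeouts.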
Availability: The gene expression search applications and SPARQL endpoints developed for OpenFlyData are deployed at http://openflydata.org. FlyUI, a library of JavaScript widgets providing re-usable user-interface components for Drosophila gene expression data, is available at http://flyui.googlecode.com. Software and ontologies to support transformation of data from FlyBase, FlyAtlas, BDGP and FlyTED to RDF are available at http://openflydata.googlecode.com. SPARQLite, an implementation of the SPARQL protocol, is available at http://sparqlite.googlecode.com. All software is provided under the GPL version 3 open source license.