The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications

Toshiaki Katayama et al. J Biomed Semantics.

Abstract

Background: The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009.

Results: Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate whether the tools the developers represented could address real-world use cases in these domains. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; and iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs.

Conclusions: Beyond deriving prototype solutions for each use case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although these issues still made it difficult to solve the real-world problems posed to the developers by the biological researchers in attendance, we note the promise of addressing them within a semantic framework.

Figures

Figure 1
Attendees of the DBCLS BioHackathon 2009. The BioHackathon 2009 was attended by representatives from projects in Web services, Text Mining, Visualization and Workflow development, in addition to genome biologists who provided real-world use cases from their research.
Figure 2
Workflow to annotate large sets of ESTs. Sequences are first annotated using high-throughput systems (e.g. Blast2GO, KAAS). Remaining difficult-to-annotate sequences are subsequently passed through ANNOTATOR for deeper analysis. The combined sequences are then joined with related annotations in the remote Ensembl database using BioMart and exposed through TogoDB such that they can be consumed by workflow managers (e.g. jORCA or Taverna) as TogoWS services.
Figure 3
System to enrich TFBSs with differential expression data. Data on transcriptional start sites and on functional element SNPs are combined using distributed annotation system (DAS) protocol layers for the DBTSS and FESD II databases, respectively. Given a list of genes or proteins (e.g. from gene expression data), enrichment can then be computed and exposed using a DAS viewer.
Figure 4
Workflow to analyze protein interactions among enzymes in a KEGG pathway. First, protein sequences are retrieved for each enzyme in a KEGG pathway. The sequences are then BLAST searched against UniProt and a phylogenetic profile is constructed of the results. Then, for each species in the phylogenetic profile, BLAST searches are run against PDB. Pairs of protein sequences (of the same species) that have homologs in the same PDB entry are inferred to be in physical contact and hence predicted to be interacting. Conserved and interacting proteins are then visualized on the pathway map, an example of which is shown in Figure 5.
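The inference rule in this caption (two proteins of the same species are predicted to interact when both have a homolog in the same PDB entry) can be sketched as follows. This is a minimal illustration, not the workflow's actual implementation: the function name, the input layout (a dictionary keyed by species and protein identifier) and the example identifiers are assumptions, and the sketch ignores chain-level detail within a PDB entry.

```python
from collections import defaultdict
from itertools import combinations


def infer_interactions(pdb_hits):
    """Predict interacting protein pairs from shared PDB entries.

    pdb_hits (hypothetical layout): dict mapping (species, protein_id)
    to the set of PDB entry IDs in which a BLAST homolog was found.
    Returns a set of (species, protein_a, protein_b) tuples, with the
    two protein IDs in sorted order.
    """
    # Group proteins by species, since pairs are only formed within a species.
    by_species = defaultdict(dict)
    for (species, protein), entries in pdb_hits.items():
        by_species[species][protein] = set(entries)

    predicted = set()
    for species, proteins in by_species.items():
        for a, b in combinations(sorted(proteins), 2):
            # A shared PDB entry is taken as evidence of physical contact.
            if proteins[a] & proteins[b]:
                predicted.add((species, a, b))
    return predicted


# Made-up identifiers, for illustration only.
example_hits = {
    ("hsa", "P1"): {"1ABC", "2XYZ"},
    ("hsa", "P2"): {"1ABC"},
    ("hsa", "P3"): {"9ZZZ"},
    ("dme", "Q1"): {"1ABC"},
}
example_pairs = infer_interactions(example_hits)
```

In this toy input only P1 and P2 share an entry within one species; Q1 hits the same entry but belongs to a different species, so it forms no pair.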
Figure 5
Evolutionary conservation rate of proteins on a KEGG pathway. Evolutionary conservation rate is defined as the ratio of the number of conserved proteins, i.e. homologs, over the number of species. Conservation rate is color-coded for each node in the pathway (see legend). See text and Figure 4 for more details.
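As a small worked example of the ratio defined in this caption, the sketch below computes a conservation rate per pathway protein. It assumes a presence/absence phylogenetic profile (at most one homolog counted per species); the function name, input layout and species codes are illustrative assumptions rather than details taken from the paper.

```python
def conservation_rate(profile, species):
    """Compute conservation rate per protein.

    profile (hypothetical layout): dict mapping a pathway protein ID to
    the set of species in which a homolog was found.
    species: the list of species considered in the phylogenetic profile.
    Rate = (number of species with a homolog) / (number of species).
    """
    if not species:
        raise ValueError("at least one species is required")
    considered = set(species)
    return {
        protein: len(found & considered) / len(considered)
        for protein, found in profile.items()
    }


# Made-up profile, for illustration only.
example_species = ["hsa", "mmu", "dme", "sce"]
example_profile = {
    "E1": {"hsa", "mmu", "dme"},
    "E2": {"hsa"},
}
example_rates = conservation_rate(example_profile, example_species)
```

With four species, a protein with homologs in three of them gets a rate of 0.75 and a protein with a homolog in only one gets 0.25; such values could then be mapped to node colors on the pathway.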
Figure 6
Workflow for analyzing glyco-gene-related diseases. In the first step of this workflow, GlycoEpitope DB entries are searched for disease-related keywords by a newly developed BioMoby service called getGlycoEpitopeIDfromKeyword. The identifiers of matching entries are then used to retrieve glycan structures in IUPAC format by another newly developed BioMoby service called getIUPACfromGlycoEpitopeID. The resulting IUPAC glycans are subsequently converted to KCF format by a new RINGS service called getKCFfromIUPAC. The KCF glycans can then be used for querying other RINGS data mining services.
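The chaining of the three services named in this caption can be sketched as a simple pipeline. The real services are BioMoby and RINGS Web services whose invocation details are not given here, so the sketch takes them as plain callables supplied by the caller; the function name and parameter names are assumptions, and any stand-in functions passed in are placeholders rather than the actual service clients.

```python
def glyco_disease_pipeline(keyword, find_ids, id_to_iupac, iupac_to_kcf):
    """Chain keyword search -> IUPAC retrieval -> KCF conversion.

    find_ids:     stand-in for getGlycoEpitopeIDfromKeyword
                  (keyword -> iterable of GlycoEpitope IDs)
    id_to_iupac:  stand-in for getIUPACfromGlycoEpitopeID
                  (GlycoEpitope ID -> glycan in IUPAC format)
    iupac_to_kcf: stand-in for getKCFfromIUPAC
                  (IUPAC glycan -> glycan in KCF format)
    Returns a dict mapping each matching entry ID to its KCF glycan.
    """
    kcf_by_entry = {}
    for entry_id in find_ids(keyword):
        iupac = id_to_iupac(entry_id)
        kcf_by_entry[entry_id] = iupac_to_kcf(iupac)
    return kcf_by_entry
```

Passing the services in as arguments keeps the sketch independent of any particular client library; the resulting KCF glycans would be the input to further RINGS data mining services.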
Figure 7
Connectivity and compatibility of participating projects. The BioHackathon 2009 was attended by participants representing projects operating in a number of problem domains (shown in Figure 1). Analysis of these participating projects during the hackathon revealed compatibilities and resulting connectivity as shown here.
