Database (Oxford). 2014 Jun 23;2014:bau059. doi: 10.1093/database/bau059. Print 2014.

BioC implementations in Go, Perl, Python and Ruby


Wanli Liu et al. Database (Oxford). 2014.

Abstract

As part of a community-wide effort to evaluate text mining and information extraction systems in the biomedical domain, BioC focuses on interoperability, currently a major barrier to the wide-scale adoption of text mining tools. BioC is a simple XML format, specified by a DTD, for exchanging data for biomedical natural language processing. Its initial implementations, in C++ and Java, provide code libraries for reading and writing BioC text documents and annotations. We extend BioC to Perl, Python, Go and Ruby. We used SWIG to wrap the C++ implementation for Perl and for one of two Python implementations. The second Python implementation and the Ruby implementation use native data structures and libraries. BioC is also implemented in Go, the programming language developed at Google. The BioC modules are functional in all of these languages, which can facilitate text mining tasks. BioC implementations are freely available through the BioC site: http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net/
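To make the format concrete, here is a minimal sketch of reading and writing BioC-style XML in Python. It is not one of the BioC libraries described in this paper: it uses only the standard-library `xml.etree.ElementTree`, assumes the element nesting of the BioC DTD (collection > document > passage > annotation > location), and the sample document, the helper names `read_annotations` and `write_collection`, and the metadata values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample collection; element names assume the BioC DTD layout.
SAMPLE = """<collection>
  <source>example</source>
  <date>20140623</date>
  <key>example.key</key>
  <document>
    <id>doc1</id>
    <passage>
      <infon key="type">title</infon>
      <offset>0</offset>
      <text>BRCA1 mutations in breast cancer</text>
      <annotation id="A1">
        <infon key="type">gene</infon>
        <location offset="0" length="5"/>
        <text>BRCA1</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_text):
    """Return (doc_id, type, offset, length, text) for every annotation (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    results = []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id")
        for ann in doc.iter("annotation"):
            # infon elements are key/value pairs; keep the one keyed "type".
            ann_type = None
            for infon in ann.findall("infon"):
                if infon.get("key") == "type":
                    ann_type = infon.text
            loc = ann.find("location")
            results.append((doc_id, ann_type,
                            int(loc.get("offset")), int(loc.get("length")),
                            ann.findtext("text")))
    return results


def write_collection(doc_id, passage_text, annotations):
    """Build a one-passage collection from (type, offset, length) triples (hypothetical helper)."""
    collection = ET.Element("collection")
    ET.SubElement(collection, "source").text = "example"   # placeholder metadata
    ET.SubElement(collection, "date").text = "20140623"
    ET.SubElement(collection, "key").text = "example.key"
    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = doc_id
    passage = ET.SubElement(document, "passage")
    # The passage starts at offset 0, so document offsets index passage_text directly.
    ET.SubElement(passage, "offset").text = "0"
    ET.SubElement(passage, "text").text = passage_text
    for i, (ann_type, offset, length) in enumerate(annotations, start=1):
        ann = ET.SubElement(passage, "annotation", id=f"A{i}")
        ET.SubElement(ann, "infon", key="type").text = ann_type
        ET.SubElement(ann, "location", offset=str(offset), length=str(length))
        ET.SubElement(ann, "text").text = passage_text[offset:offset + length]
    return ET.tostring(collection, encoding="unicode")


if __name__ == "__main__":
    print(read_annotations(SAMPLE))
    out = write_collection("doc2", "TP53 is a tumor suppressor", [("gene", 0, 4)])
    print(read_annotations(out))
```

Running the script prints the single `BRCA1` gene annotation from the sample and then reads back the `TP53` annotation it just wrote. The round trip shows the point of a shared format: any tool that agrees on these element names can consume the output of another.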


Figures

Figure 1. BioC workflow diagram.
Figure 2. Building BioC modules with SWIG (for Python).
Figure 3. Accessing the C++ BioC class through the target-language proxy class wrapper interface.
Figure 4. Perl code accessing BioC data (tested with Perl 5.8.8).
Figure 5. Python code accessing BioC data (tested with Python 2.5.1).
Figure 6. PyBioC code accessing BioC data.
Figure 7. Go code accessing BioC data (tested with Go 1.1.2).
Figure 8. Ruby code accessing BioC data (tested with Ruby 2.0.0).

