J Cheminform. 2018 Dec 14;10(1):63.
doi: 10.1186/s13321-018-0319-2.

SIA: a scalable interoperable annotation server for biomedical named entities


Johannes Kirschnick et al. J Cheminform.

Abstract

Recent years have seen strong growth in the biomedical sciences and, with it, a steady increase in publication volume. Extracting specific information from these sources requires highly sophisticated text mining and information extraction tools. However, integrating freely available tools into customized workflows is often cumbersome and difficult. We describe SIA (Scalable Interoperable Annotation Server), a scalable, extensible, and robust annotation service and our contribution to the BeCalm-Technical interoperability and performance of annotation servers (BeCalm-TIPS) task. The system currently covers six named entity types (chemicals, diseases, genes, miRNA, mutations, and organisms) and is freely available under the Apache 2.0 license at https://github.com/Erechtheus/sia.

Keywords: Annotation service; Extensibility; Robustness; Scalability; Text mining.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
General architecture of SIA. The front end handles new requests and forwards them to the back end over a message bus. Each message is passed through a series of components, which are connected to one another via named queues. The result handler collects the annotation responses and returns them to the calling client.
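The flow in Fig. 1 can be illustrated with a minimal in-process sketch. It assumes Python's standard `queue.Queue` objects and worker threads as a stand-in for the message bus, and toy dictionary lookups in place of real named entity recognizers. The names `Message`, `dictionary_annotator`, `Pipeline`, and the queue names are hypothetical and are not taken from the SIA code base.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Message:
    # Hypothetical message format: request id, input text, collected annotations.
    request_id: int
    text: str
    annotations: list = field(default_factory=list)


def dictionary_annotator(entity_type, terms):
    """Build a toy component that tags every exact occurrence of known terms."""
    def annotate(msg):
        for term in terms:
            start = msg.text.find(term)
            while start != -1:
                msg.annotations.append((start, start + len(term), entity_type))
                start = msg.text.find(term, start + 1)
        return msg
    return annotate


class Pipeline:
    """Components chained via named queues; a result handler completes requests."""

    def __init__(self, components):
        # One named queue in front of each component, plus one for the result handler.
        names = [name for name, _ in components] + ["results"]
        self.queues = {name: queue.Queue() for name in names}
        self.entry = names[0]
        self.pending = {}  # request_id -> (completion event, result slot)
        for (name, func), next_name in zip(components, names[1:]):
            threading.Thread(target=self._worker, args=(name, func, next_name),
                             daemon=True).start()
        threading.Thread(target=self._result_handler, daemon=True).start()

    def _worker(self, name, func, next_name):
        # Consume from this component's queue, transform, forward to the next queue.
        while True:
            msg = self.queues[name].get()
            self.queues[next_name].put(func(msg))

    def _result_handler(self):
        # Hand each finished message back to the waiting caller.
        while True:
            msg = self.queues["results"].get()
            done, slot = self.pending.pop(msg.request_id)
            slot.append(msg)
            done.set()

    def submit(self, request_id, text, timeout=5.0):
        """Front end: enqueue a request and block until its annotations return."""
        done, slot = threading.Event(), []
        self.pending[request_id] = (done, slot)
        self.queues[self.entry].put(Message(request_id, text))
        if not done.wait(timeout):
            raise TimeoutError(f"request {request_id} timed out")
        return sorted(slot[0].annotations)


pipeline = Pipeline([
    ("chemicals", dictionary_annotator("CHEMICAL", ["aspirin"])),
    ("diseases", dictionary_annotator("DISEASE", ["headache"])),
])
result = pipeline.submit(1, "aspirin relieves headache")
print(result)  # [(0, 7, 'CHEMICAL'), (17, 25, 'DISEASE')]
```

Because each component only reads from its own named queue and writes to the next one, adding an entity type in this sketch means appending one more `(name, function)` pair. In a deployment with a real message broker, the same decoupling would let several workers consume from one queue in parallel.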
Fig. 2
Processing statistics over a four-week period and request times per corpus, reporting complete processing and annotation timings separately.
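Reporting complete processing and annotation timings separately makes it possible to see how much of each request is spent outside the annotators themselves (queuing, transport, result handling). The sketch below aggregates such per-request timings by corpus. It assumes each request is logged as a `(corpus, total_seconds, annotation_seconds)` tuple; the record layout, the `summarize` function, and the corpus names and timings in the usage line are hypothetical illustrations, not data from the figure.

```python
from collections import defaultdict
from statistics import mean


def summarize(records):
    """Aggregate (corpus, total_seconds, annotation_seconds) records per corpus."""
    by_corpus = defaultdict(list)
    for corpus, total, annotation in records:
        by_corpus[corpus].append((total, annotation))

    summary = {}
    for corpus, rows in by_corpus.items():
        summary[corpus] = {
            "requests": len(rows),
            "mean_total_s": mean(t for t, _ in rows),
            "mean_annotation_s": mean(a for _, a in rows),
            # Time not spent annotating: queuing, transport, result handling.
            "mean_overhead_s": mean(t - a for t, a in rows),
        }
    return summary


# Hypothetical toy timings, for illustration only.
stats = summarize([
    ("corpus_a", 1.0, 0.5),
    ("corpus_a", 2.0, 1.5),
    ("corpus_b", 0.5, 0.25),
])
print(stats["corpus_a"])
```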

