BMC Bioinformatics. 2018 Nov 6;19(1):405.
doi: 10.1186/s12859-018-2429-2.

SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes

Andon Tchechmedjiev et al. BMC Bioinformatics. 2018.

Abstract

Background: Despite the wide adoption of English in science, a significant amount of biomedical data is produced in other languages, such as French. Yet most natural language processing and semantic tools, as well as domain terminologies and ontologies, are available only in English and cannot be readily applied to other languages because of fundamental linguistic differences. However, semantic resources are required to design semantic indexes and transform biomedical (text) data into knowledge for better information mining and retrieval.

Results: We present the SIFR Annotator (http://bioportal.lirmm.fr/annotator), a publicly accessible ontology-based annotation web service for processing biomedical text data in French. The service, developed during the Semantic Indexing of French Biomedical Data Resources (2013-2019) project, is included in the SIFR BioPortal, an open platform that hosts French biomedical ontologies and terminologies and is based on the technology developed by the US National Center for Biomedical Ontology. The portal facilitates and fosters the use of ontologies by offering a set of services (search, mappings, metadata, versioning, visualization, recommendation), including for annotation purposes. We introduce the adaptations and improvements made in applying the technology to French, as well as a number of language-independent additional features implemented by means of a proxy architecture, in particular annotation scoring and clinical context detection. We evaluate the performance of the SIFR Annotator on different biomedical data using available French corpora, namely Quaero (titles from French MEDLINE abstracts and EMEA drug labels) and CépiDC (ICD-10 coding of death certificates), and discuss our results with respect to the CLEF eHealth information extraction tasks.

Conclusions: We show that the web service performs comparably to other knowledge-based annotation approaches in recognizing entities in biomedical text and reaches state-of-the-art levels in clinical context detection (negation, experiencer, temporality). Additionally, the SIFR Annotator is the first openly web-accessible tool to annotate and contextualize French biomedical text with ontology concepts, leveraging a dictionary currently made up of 28 terminologies and ontologies and 333 K concepts. The code is openly available, and we also provide a Docker packaging for easy local deployment to process sensitive (e.g., clinical) data in-house (https://github.com/sifrproject).

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

CISMeF/SIBM (see related work section) were an early partner of the SIFR project.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
The SIFR Annotator user interface. The upper screen capture illustrates the main form of the annotator, where one inputs text and selects the annotation parameters. The lower screen capture shows the table with the resulting annotations
Fig. 2
NCBO and SIFR Annotator(s) core components
Fig. 3
Proxy service architecture implementing the SIFR Annotator extended workflow. During preprocessing, parameters are handled and text can be lemmatized, before both are sent to the core annotator components. During annotation postprocessing, scoring and context detection are performed. Subsequently, the output is serialized to the requested format
Fig. 4
Illustration of the PER annotation task and the score computation. Entities in PER are identified by their character offsets (begin and end from the start of the text) and by their UMLS Semantic Group
Fig. 5
Illustration of the NER annotation task and the score computation. In NER, we annotate entities found in PER with one or more CUIs
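
The captions of Fig. 4 and Fig. 5 mention a score computation over entities identified by character offsets and UMLS Semantic Group. The sketch below illustrates one common way such a score can be computed, assuming exact-match evaluation with precision, recall, and F1. The `Entity` type, the `exact_match_scores` function, and the sample offsets and group labels are hypothetical and are not taken from the paper or from the SIFR Annotator code.

```python
from typing import Iterable, NamedTuple, Tuple


class Entity(NamedTuple):
    """A hypothetical entity: character offsets plus a semantic group label."""
    begin: int
    end: int
    group: str


def exact_match_scores(
    gold: Iterable[Entity], predicted: Iterable[Entity]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), assuming a prediction counts as
    correct only if its offsets and group both match a gold entity exactly."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example data: three gold entities, two predictions.
gold = [Entity(0, 8, "DISO"), Entity(12, 17, "ANAT"), Entity(21, 32, "CHEM")]
predicted = [
    Entity(0, 8, "DISO"),    # exact match
    Entity(12, 17, "DISO"),  # right offsets, wrong group: counted as an error
]

precision, recall, f1 = exact_match_scores(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

For the NER task, where entities are additionally annotated with one or more CUIs, the same set-based comparison could be applied by adding the CUI to the compared tuple. That extension is likewise an assumption rather than the evaluation procedure described in the paper.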
