LINNAEUS: a species name identification system for biomedical literature
- PMID: 20149233
- PMCID: PMC2836304
- DOI: 10.1186/1471-2105-11-85
Abstract
Background: The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data mining, including gene name recognition, species-specific document retrieval, and semantic enrichment of biomedical articles.
Results: In this paper we describe LINNAEUS, an open-source software system for species name recognition and normalization, and evaluate its performance against several automatically generated biomedical corpora as well as a novel corpus of full-text documents manually annotated for species mentions. LINNAEUS uses a dictionary-based approach (implemented as an efficient deterministic finite-state automaton) to identify species names and a set of heuristics to resolve ambiguous mentions. When compared against our manually annotated corpus, LINNAEUS achieves 94% recall and 97% precision at the mention level, and 98% recall and 90% precision at the document level. The system also disambiguates uncertain species mentions effectively, resolving 97% of all mentions in PubMed Central full-text documents to unambiguous NCBI taxonomy identifiers.
Conclusions: LINNAEUS is an open-source, stand-alone software system that recognizes and normalizes species name mentions quickly and accurately, and can therefore be integrated into a range of bioinformatics and text-mining applications. The software and manually annotated corpus can be downloaded freely at http://linnaeus.sourceforge.net/.
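The dictionary-based approach described in the Results can be illustrated with a minimal sketch. This is not the LINNAEUS implementation: it assumes a character trie (a simple deterministic automaton) for longest-match lookup, and a single illustrative heuristic that resolves an ambiguous mention when exactly one of its candidate species is named unambiguously elsewhere in the same document. The names `TOY_DICTIONARY`, `build_trie`, `find_mentions`, and `resolve` are hypothetical, and the taxonomy identifiers in the toy dictionary are illustrative and should be checked against the NCBI Taxonomy before any real use.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One automaton state: outgoing transitions plus IDs of names ending here."""
    children: dict = field(default_factory=dict)
    ids: set = field(default_factory=set)


# Toy dictionary (assumption): lowercase name -> candidate taxonomy IDs.
TOY_DICTIONARY = {
    "caenorhabditis elegans": {6239},
    "cunninghamella elegans": {4853},
    "c. elegans": {6239, 4853},  # ambiguous abbreviation
    "homo sapiens": {9606},
    "human": {9606},
}


def build_trie(dictionary):
    """Compile the dictionary into a character trie."""
    root = Node()
    for name, tax_ids in dictionary.items():
        node = root
        for ch in name.lower():
            node = node.children.setdefault(ch, Node())
        node.ids.update(tax_ids)
    return root


def find_mentions(text, root):
    """Return (start, end, surface, ids) for the longest match at each word start."""
    mentions = []
    i, n = 0, len(text)
    while i < n:
        # Only begin a match at a word boundary.
        if i > 0 and text[i - 1].isalnum():
            i += 1
            continue
        node, j, best = root, i, None
        while j < n and text[j].lower() in node.children:
            node = node.children[text[j].lower()]
            j += 1
            # Accept only if a dictionary name ends here on a word boundary.
            if node.ids and (j == n or not text[j].isalnum()):
                best = (j, frozenset(node.ids))
        if best:
            end, ids = best
            mentions.append((i, end, text[i:end], ids))
            i = end
        else:
            i += 1
    return mentions


def resolve(mentions):
    """Illustrative heuristic: prefer a candidate named unambiguously elsewhere."""
    unambiguous = {next(iter(ids)) for _, _, _, ids in mentions if len(ids) == 1}
    resolved = []
    for start, end, surface, ids in mentions:
        candidates = ids & unambiguous if len(ids) > 1 else ids
        chosen = next(iter(candidates)) if len(candidates) == 1 else None
        resolved.append((start, end, surface, chosen))
    return resolved


if __name__ == "__main__":
    trie = build_trie(TOY_DICTIONARY)
    doc = ("Caenorhabditis elegans is a model organism. "
           "C. elegans and human cells were compared.")
    for mention in resolve(find_mentions(doc, trie)):
        print(mention)
```

In this sketch a mention that stays ambiguous is returned with `None` as its identifier rather than being guessed, which mirrors the distinction the abstract draws between recognized mentions and mentions resolved to a single taxonomy identifier.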