Concept-based annotation of enzyme classes
- PMID: 15661799
- DOI: 10.1093/bioinformatics/bti284
Abstract
Motivation: Given the explosive growth of biomedical data and of the literature describing results and findings, it is becoming increasingly difficult to keep up to date with new information. Keeping databases synchronized with current knowledge is a time-consuming and expensive task, one that can be alleviated by automatically gathering findings from the literature using linguistic approaches. We describe a method to automatically annotate enzyme classes with disease-related information extracted from the biomedical literature for inclusion in such a database.
Results: Enzyme names for the 3901 enzyme classes in the BRENDA database, a repository for quantitative and qualitative enzyme information, were identified in more than 100,000 abstracts retrieved from the PubMed literature database. Phrases in the abstracts were assigned to concepts from the Unified Medical Language System (UMLS) using the MetaMap program, allowing disease-related concepts to be identified by their semantic fields in the UMLS ontology. Assignments between enzyme classes and diseases were created based on their co-occurrence within a single sentence. False positives could be removed by a variety of filters, including a minimum number of co-occurrences, removal of sentences containing a negation, and classification of sentences based on their semantic fields by a Support Vector Machine. Verification of the assignments against a manually annotated set of 1500 sentences yielded favorable results of 92% precision at 50% recall, sufficient for inclusion in a high-quality database.
Availability: Source code is available from the author upon request.
Supplementary information: ftp.uni-koeln.de/institute/biochemie/pub/brenda/info/diseaseSupp.pdf.
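The sentence-level co-occurrence step and two of the filters named in the Results (negation removal and a minimum co-occurrence count) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: plain substring matching stands in for MetaMap and the UMLS semantic fields, the SVM sentence classifier is omitted, and the dictionaries, negation list, function names, and example abstracts are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries. The paper uses BRENDA enzyme names and
# UMLS disease concepts assigned by MetaMap, not hand-written lists.
ENZYMES = {
    "EC 1.1.1.1": ["alcohol dehydrogenase"],
    "EC 3.4.21.5": ["thrombin"],
}
DISEASES = ["alcoholism", "thrombosis"]

# Assumed negation cue words; the paper does not list its cues here.
NEGATIONS = re.compile(r"\b(no|not|never|without)\b", re.IGNORECASE)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurrences(abstracts, min_count=2):
    """Count enzyme/disease pairs that share a sentence.

    Sentences containing a negation cue are skipped, and pairs seen
    fewer than min_count times are dropped.
    """
    counts = Counter()
    for abstract in abstracts:
        for sentence in split_sentences(abstract):
            if NEGATIONS.search(sentence):
                continue  # negation filter
            lowered = sentence.lower()
            for ec, names in ENZYMES.items():
                if not any(name in lowered for name in names):
                    continue
                for disease in DISEASES:
                    if disease in lowered:
                        counts[(ec, disease)] += 1
    # minimum co-occurrence filter
    return {pair: n for pair, n in counts.items() if n >= min_count}


def precision_recall(predicted, gold):
    """Precision and recall of predicted pairs against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up example abstracts, for illustration only.
EXAMPLE_ABSTRACTS = [
    "Thrombin is elevated in thrombosis. "
    "Alcohol dehydrogenase was not linked to alcoholism.",
    "Thrombin drives thrombosis in mice. "
    "Alcohol dehydrogenase variants affect alcoholism risk.",
]

if __name__ == "__main__":
    print(cooccurrences(EXAMPLE_ABSTRACTS))
```

In this toy input the thrombin/thrombosis pair passes both filters, while the alcohol dehydrogenase/alcoholism pair is dropped: one of its two sentences is negated, leaving a single co-occurrence, which is below the threshold. Raising `min_count` trades recall for precision, which is the kind of trade-off behind the reported 92% precision at 50% recall.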