Integrating image data into biomedical text categorization
- PMID: 16873506
- DOI: 10.1093/bioinformatics/btl235
Abstract
Categorization of biomedical articles is a central task for supporting various curation efforts. It can also form the basis for effective biomedical text mining. Automatic text classification in the biomedical domain is thus an active research area. Contests organized by the KDD Cup (2002) and the TREC Genomics track (since 2003) defined several annotation tasks that involved document classification, and provided training and test data sets. So far, these efforts have focused on analyzing only the text content of documents. However, as was noted in the KDD'02 text mining contest, where figure captions proved to be an invaluable feature for identifying documents of interest, images often provide curators with critical information. We examine the possibility of using information derived directly from image data, and of integrating it with text-based classification, for biomedical document categorization. We present a method for obtaining features from images and for using them, both alone and in combination with text, to perform the triage task introduced in the TREC Genomics track 2004. The task was to determine which documents are relevant to a given annotation task performed by the Mouse Genome Database curators. We show preliminary results, demonstrating that the method has strong potential to enhance and complement traditional text-based categorization methods.
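To make the idea of combining image-derived and text-derived features concrete, the following is a minimal sketch, not the method used in the paper. It assumes one simple fusion strategy (concatenating normalised text and image vectors) and a nearest-centroid classifier; the vocabulary, the two-element image vectors, the toy documents, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical fixed vocabulary for the bag-of-words text features.
VOCAB = ["mouse", "expression", "protein", "survey"]


def text_features(text, vocabulary):
    """Bag-of-words counts of the document text over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def normalise(vec):
    """Scale a vector to unit L2 length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def fuse(text_vec, image_vec):
    """Assumed fusion step: concatenate the normalised text and image vectors."""
    return normalise(text_vec) + normalise(image_vec)


def train(examples):
    """Compute one centroid per label from (vector, label) pairs."""
    by_label = {}
    for vec, label in examples:
        by_label.setdefault(label, []).append(vec)
    return {
        label: [sum(column) / len(vecs) for column in zip(*vecs)]
        for label, vecs in by_label.items()
    }


def predict(model, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(model, key=lambda label: dist(model[label], vec))


# Toy training set: (text, hypothetical image feature vector, label).
# The image vectors stand in for whatever features are extracted from figures.
docs = [
    ("mouse gene expression in embryo", [0.8, 0.2], "relevant"),
    ("mouse expression knockout phenotype", [0.7, 0.1], "relevant"),
    ("protein structure survey", [0.1, 0.9], "irrelevant"),
    ("survey of protein folding methods", [0.0, 0.8], "irrelevant"),
]

model = train(
    [(fuse(text_features(text, VOCAB), image), label) for text, image, label in docs]
)

query = fuse(text_features("expression patterns in mouse tissue", VOCAB), [0.9, 0.1])
pred = predict(model, query)
print(pred)
```

Passing an all-zero vector for either modality leaves only the other to separate the classes, which loosely mirrors using image or text features alone. Any real system would replace the toy vectors with features actually extracted from the article text and figures.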