Text mining and protein annotations: the construction and use of protein description sentences
- PMID: 17503385
Abstract
Existing biological knowledge stored as structured database records has been extracted manually by database curators analyzing the scientific literature. Most of this information was derived from sentences which describe biologically relevant aspects of genes and gene products. We introduce the Protein description sentence (Prodisen) corpus, a useful resource for the automatic identification and construction of text-based protein and gene description records using information extraction and text classification techniques. Basic guidelines and criteria relevant for the construction of a text corpus of functional descriptions of genes and proteins are proposed. The steps used for the corpus construction and its features are presented. Moreover, some of the potential applications of the Prodisen corpus for biomedical text mining purposes are explored and the obtained results are presented.