eGIFT: mining gene information from the literature
- PMID: 20696046
- PMCID: PMC2929241
- DOI: 10.1186/1471-2105-11-418
Abstract
Background: As the biomedical literature continues to expand, searching PubMed for information about specific genes becomes increasingly difficult. Not only can a query return thousands of results, but gene name ambiguity also produces many irrelevant hits. As a result, it is difficult for life scientists and gene curators to rapidly form an overall picture of a specific gene from the documents that mention its names and synonyms.
Results: In this paper, we present eGIFT (http://biotm.cis.udel.edu/eGIFT), a web-based tool that associates informative terms, called iTerms, and the sentences containing them, with genes. To associate iTerms with a gene, eGIFT ranks terms by a score that compares how frequently a term occurs in the gene's literature with how frequently it occurs in documents about genes in general. To retrieve a gene's documents (Medline abstracts), eGIFT considers all of the gene's names, aliases, and synonyms. Because many gene names are ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to the gene. An additional filtering step retains only those abstracts that focus on the gene rather than mention it in passing. eGIFT's information for a gene is pre-computed, and users can search for genes by name or by EntrezGene identifier. iTerms are grouped into categories to facilitate quick inspection. eGIFT also links each iTerm to the sentences that mention it, so that users can see the relation between the iTerm and the gene. We evaluated the precision and recall of eGIFT's iTerms for 40 genes; between 88% and 94% of the iTerms were marked as salient by our evaluators, and 94% of the UniProtKB keywords for these genes were also identified by eGIFT as iTerms.
Conclusions: Our evaluations suggest that iTerms capture highly relevant aspects of genes. Furthermore, by showing sentences containing these terms, eGIFT can provide a quick description of a specific gene. eGIFT helps not only life scientists surveying the results of high-throughput experiments, but also annotators looking for articles that describe gene aspects and functions.
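The ranking idea described in the Results (comparing how often a term occurs in a gene's literature with how often it occurs in documents about genes in general) can be illustrated with a minimal sketch. The abstract does not give the exact score, so the formula below (the difference between the two document-frequency fractions) is an assumption chosen for illustration, not eGIFT's actual method. The function names `doc_frequency` and `rank_terms`, the whitespace tokenization, and the toy documents are likewise hypothetical.

```python
from collections import Counter


def doc_frequency(docs):
    """Return, for each term, the fraction of documents that contain it."""
    counts = Counter()
    for doc in docs:
        # Count each term at most once per document (naive whitespace tokens).
        counts.update(set(doc.lower().split()))
    n = len(docs)
    return {term: c / n for term, c in counts.items()}


def rank_terms(gene_docs, background_docs, top_k=5):
    """Rank terms by (frequency in gene docs) - (frequency in background docs).

    Assumed scoring for illustration only; the abstract does not specify
    the formula eGIFT uses.
    """
    gene_df = doc_frequency(gene_docs)
    bg_df = doc_frequency(background_docs)
    scores = {t: f - bg_df.get(t, 0.0) for t, f in gene_df.items()}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    # Toy stand-ins for abstracts about one gene and about genes in general.
    gene_docs = [
        "kinase phosphorylates substrate",
        "kinase binds receptor",
    ]
    background_docs = [
        "gene expression in cells",
        "protein binds receptor",
        "gene regulates protein",
        "kinase gene family",
    ]
    for term, score in rank_terms(gene_docs, background_docs):
        print(f"{term}\t{score:.2f}")
```

In this toy setting, a term that appears in every abstract for the gene but rarely in the background set receives the highest score, which is the behaviour the abstract attributes to iTerms. A real system would also need the name disambiguation and gene-focus filtering steps mentioned above, which this sketch omits.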