A quantitative model for linking two disparate sets of articles in MEDLINE
- PMID: 17463015
- DOI: 10.1093/bioinformatics/btm161
Abstract
Background: Identifying information that implicitly links two disparate sets of articles is a fundamental and intuitive data mining strategy that can help investigators address real scientific questions. The Arrowsmith two-node search finds title words and phrases (so-called B-terms) that are shared across two sets of articles within MEDLINE and displays them in a manner that facilitates human assessment. A serious stumbling block has been the lack of a quantitative model for predicting which of the hundreds, if not thousands, of B-terms computed for a given search are most likely to be relevant to the investigator.
Methodology/principal findings: Using a public two-node search interface, field testers devised a set of two-node searches under real-life conditions, and a certain number of B-terms were marked relevant. These were employed as 'gold standards'; each B-term was characterized according to eight complementary features that were strongly correlated with relevance. A logistic regression model was developed that permits one to estimate the probability of relevance for each B-term, to rank B-terms according to their likely relevance, and to estimate the overall number of relevant B-terms inherent in a given two-node search.
Conclusions/significance: The model greatly simplifies and streamlines the process of carrying out a two-node search, and may be applicable to a number of other literature-based discovery applications, including the so-called one-node search and related gene-centric strategies that incorporate implicit links to predict how genes may be related to each other and to human diseases. This should encourage much wider exploration of text mining for implicit information among the general scientific community.
Availability: Two-node searches can be carried out freely at http://arrowsmith.psych.uic.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
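The three uses of the logistic regression model named in the abstract (estimating a per-B-term probability of relevance, ranking B-terms, and estimating the overall number of relevant B-terms) can be illustrated with a minimal sketch. The abstract does not list the eight features or their fitted coefficients, so the weights, intercept, feature values, and term names below are hypothetical placeholders. Treating the expected number of relevant B-terms as the sum of the per-term probabilities is likewise an assumption made for illustration, not necessarily the authors' exact procedure.

```python
import math

# Hypothetical parameters: the abstract names neither the eight features
# nor their coefficients, so these numbers are illustrative only.
INTERCEPT = -2.0
WEIGHTS = [0.8, 0.5, 0.3, 0.6, 0.2, 0.4, 0.1, 0.7]


def relevance_probability(features):
    """Logistic model: P(relevant) = 1 / (1 + exp(-(b0 + sum(w_i * x_i))))."""
    if len(features) != len(WEIGHTS):
        raise ValueError("expected one value per feature")
    z = INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def rank_b_terms(b_terms):
    """Return (term, probability) pairs, most likely relevant first."""
    scored = [(term, relevance_probability(f)) for term, f in b_terms.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def expected_relevant_count(b_terms):
    """Assumed estimator: sum of the per-term relevance probabilities."""
    return sum(relevance_probability(f) for f in b_terms.values())


# Made-up B-terms, each described by eight made-up feature values.
example_b_terms = {
    "term_a": [1, 1, 1, 1, 1, 1, 1, 1],
    "term_b": [0, 0, 0, 0, 0, 0, 0, 0],
    "term_c": [1, 0, 1, 0, 1, 0, 1, 0],
}

ranking = rank_b_terms(example_b_terms)
estimated_total = expected_relevant_count(example_b_terms)
```

Under this sketch, an investigator would read `ranking` from the top down and could use `estimated_total` as a rough guide to how many B-terms are worth inspecting before the likelihood of further relevant terms becomes small.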