Ranking documents with a thesaurus
- PMID: 10303917
- DOI: 10.1002/(SICI)1097-4571(198909)40:5<304::AID-ASI2>3.0.CO;2-6
Abstract
This article reports on exploratory experiments in evaluating and improving a thesaurus by studying its effect on retrieval. A formula called DISTANCE was developed to measure the conceptual distance between queries and documents encoded as sets of thesaurus terms. DISTANCE references MeSH (Medical Subject Headings) and assesses the degree of match between a MeSH-encoded query and a MeSH-encoded document. The performance of DISTANCE on MeSH is compared with the performance of people in assessing conceptual distance between queries and documents, and is found to simulate human performance with surprising accuracy. The power of the computer simulation stems both from the tendency of people to rely heavily on broader-than (BT) relations when making decisions about conceptual distance and from the thousands of accurate BT relations in MeSH. One source of discrepancy between the algorithm's and people's measurements of closeness between query and document is occasional inconsistency in the BT relations. Our experiments with adding non-BT relations to MeSH showed how these relations could improve document ranking, if DISTANCE were also appropriately revised to treat them differently from BT relations.
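The abstract does not state the DISTANCE formula itself, so the following is only a minimal sketch of the general idea under stated assumptions: conceptual distance between two terms is taken to be the shortest path length over BT links, and the distance between a query and a document is taken to be the mean of those path lengths over all query-term and document-term pairs. The tiny hierarchy `BT`, the helper names, and the averaging rule are all hypothetical illustrations, not the authors' implementation and not the real MeSH tree.

```python
from collections import deque
from itertools import product

# Hypothetical, heavily simplified BT hierarchy: term -> list of broader terms.
# This is an illustration only, not the actual MeSH tree structure.
BT = {
    "Myocardial Infarction": ["Heart Diseases"],
    "Heart Diseases": ["Cardiovascular Diseases"],
    "Hypertension": ["Vascular Diseases"],
    "Vascular Diseases": ["Cardiovascular Diseases"],
    "Cardiovascular Diseases": [],
}


def build_graph(bt):
    """Turn BT links into an undirected adjacency map (BT and its inverse)."""
    graph = {}
    for term, broader_terms in bt.items():
        graph.setdefault(term, set())
        for broader in broader_terms:
            graph.setdefault(broader, set())
            graph[term].add(broader)
            graph[broader].add(term)
    return graph


def path_length(graph, start, goal):
    """Breadth-first search for the number of BT links between two terms."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        term, depth = queue.popleft()
        for neighbour in graph[term]:
            if neighbour == goal:
                return depth + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    raise ValueError(f"no BT path between {start!r} and {goal!r}")


def distance(graph, query_terms, doc_terms):
    """Assumed set distance: mean path length over all term pairs."""
    pairs = list(product(query_terms, doc_terms))
    return sum(path_length(graph, q, d) for q, d in pairs) / len(pairs)


def rank(graph, query_terms, docs):
    """Return document names ordered from conceptually closest to farthest."""
    return sorted(docs, key=lambda name: distance(graph, query_terms, docs[name]))


graph = build_graph(BT)
query = {"Myocardial Infarction"}
docs = {
    "doc_a": {"Hypertension"},
    "doc_b": {"Heart Diseases"},
}
ranking = rank(graph, query, docs)
```

In this toy setup `doc_b` ranks ahead of `doc_a`, because its term is one BT link from the query term while `doc_a`'s term is four links away. Treating non-BT relations differently, as the abstract suggests, could be approximated here by giving such edges a different weight, which would require a weighted shortest-path search rather than plain breadth-first search.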