J Biomed Inform. 2022 Nov:135:104213.
doi: 10.1016/j.jbi.2022.104213. Epub 2022 Sep 30.

Clustering-based fusion for medical information retrieval

Free article

Qiuyu Xu et al. J Biomed Inform. 2022 Nov.

Abstract

Medicine is a fast-moving field, and the number of medical publications has increased rapidly in recent years. Finding relevant information in this vast pool of research effectively and efficiently has therefore become highly challenging. Previous studies have demonstrated that data fusion can improve search performance if properly utilized. However, in most cases effectiveness is the only concern and efficiency is not considered. A fusion-based system is by nature more complicated and computationally expensive than single retrieval models such as BM25, because it requires many component retrieval systems and an extra layer of fusion. The number of component retrieval systems involved is an important indicator of the complexity of a fusion-based system. We aim to select the optimal k-subset of component retrieval systems for any given number k, so as to both optimize fusion performance and reduce the cost of data fusion. A clustering-based approach is proposed. First, all the candidates are divided into clusters by the Chameleon clustering algorithm; then representatives from every cluster are chosen by Sequential Forward Selection for fusion. Evaluated on two datasets from TREC, the proposed method significantly outperforms the baseline methods, including the state-of-the-art subset selection method. When either of the two typical fusion methods is used, an improvement rate of over 10% is observed for both Mean Average Precision and Recall-level Precision, and an improvement rate of over 5% is observed for both Precision at 10 document level and Mean Reciprocal Rank.
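
The two-stage idea in the abstract (cluster the candidate systems, then pick representatives by Sequential Forward Selection) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the clusters are supplied by hand rather than computed with Chameleon, CombSUM stands in for the unnamed "typical fusion methods", average precision on a single query is the selection criterion, and the names comb_sum, average_precision and select_representatives are hypothetical.

```python
from collections import defaultdict


def comb_sum(runs):
    """Fuse runs by summing each document's scores (CombSUM, an assumed fusion method)."""
    fused = defaultdict(float)
    for run in runs:
        for doc, score in run.items():
            fused[doc] += score
    # Highest fused score first; ties broken by document id for determinism.
    return [d for d, _ in sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))]


def average_precision(ranking, relevant):
    """Average precision of one ranked list against a set of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def select_representatives(systems, clusters, relevant):
    """Greedy forward selection: add one system per cluster, always taking the
    candidate that gives the best fused average precision so far."""
    selected = []
    remaining = [list(c) for c in clusters]
    while remaining:
        best = None  # (score, cluster index, system name)
        for ci, cluster in enumerate(remaining):
            for name in cluster:
                runs = [systems[s] for s in selected + [name]]
                ap = average_precision(comb_sum(runs), relevant)
                if best is None or ap > best[0]:
                    best = (ap, ci, name)
        _, ci, name = best
        selected.append(name)
        remaining.pop(ci)  # this cluster now has its representative
    return selected


# Toy data (made up): four systems scoring documents for one query.
systems = {
    "A": {"d1": 0.9, "d2": 0.8, "d3": 0.1},
    "B": {"d1": 0.8, "d2": 0.9, "d3": 0.2},
    "C": {"d3": 0.9, "d4": 0.7, "d1": 0.2},
    "D": {"d4": 0.9, "d3": 0.6, "d2": 0.1},
}
clusters = [["A", "B"], ["C", "D"]]  # hand-made stand-in for Chameleon output
relevant = {"d1", "d4"}

print(select_representatives(systems, clusters, relevant))  # ['C', 'A']
```

In this toy run, system C is picked first because it has the best individual average precision, and A is then preferred over B because fusing it with C ranks the relevant document d1 first. Taking one system per cluster keeps the fused set small, which is the efficiency argument the abstract makes.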

Keywords: Clustering; Data fusion; Efficiency and effectiveness; Medical information retrieval; Subset selection.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
