A comparative evaluation of biomedical similar article recommendation
- PMID: 35661818
- DOI: 10.1016/j.jbi.2022.104106
Abstract
Background: Biomedical sciences, with their focus on human health and disease, have attracted unprecedented attention in the 21st century. The proliferation of biomedical sciences has also led to a large number of scientific articles being produced, which makes it difficult for biomedical researchers to find relevant articles and hinders the dissemination of valuable discoveries. To bridge this gap, the research community has initiated the article recommendation task, with the aim of recommending articles to biomedical researchers automatically based on their research interests. Over the past two decades, many recommendation methods have been developed. However, an algorithm-level comparison and rigorous evaluation of the most important methods on a shared dataset is still lacking.
Method: In this study, we first investigate 15 methods for automated article recommendation in the biomedical domain. We then conduct an empirical evaluation of the 15 methods, including six term-based methods, two word embedding methods, three sentence embedding methods, two document embedding methods, and two BERT-based methods. These methods are evaluated in two scenarios: article-oriented recommenders and user-oriented recommenders, with two publicly available datasets: TREC 2005 Genomics and RELISH, respectively.
Results: Our experimental results show that the text representation models BERT and BioSenVec outperform many existing recommendation methods (e.g., BM25, PMRA, XPRC) and web-based recommendation systems (e.g., MScanner, MedlineRanker, BioReader) on both datasets for most of the evaluation metrics. Fine-tuning can further improve the performance of the BERT-based methods.
Conclusions: Our comparison study is useful for researchers and practitioners in selecting the best modeling strategies for building article recommendation systems in the biomedical domain. The code and datasets are publicly available.
Keywords: BERT; Biomedical article recommendation; Methodological comparison; Model evaluation; Modeling strategy; Text representation.
Copyright © 2022. Published by Elsevier Inc.
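The abstract names BM25 among the term-based baselines used for article-oriented recommendation. As an illustration only, the following is a minimal sketch of how a BM25-style similar-article ranking could work. It is not the authors' implementation: the function names, the toy corpus, the article IDs, and the parameter values `k1=1.2` and `b=0.75` are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query_text, corpus, k1=1.2, b=0.75):
    """Rank corpus articles against a query article with a BM25-style score.

    corpus maps a (hypothetical) article ID to its text.
    k1 and b are common default values, assumed here for illustration.
    """
    docs = {doc_id: tokenize(text) for doc_id, text in corpus.items()}
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs

    # Document frequency: number of articles containing each term.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    query_terms = set(tokenize(query_text))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (log of 1 plus the classic ratio).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores[doc_id] = score

    # Highest-scoring articles first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy corpus with made-up article IDs and titles (not from the study's datasets).
toy_corpus = {
    "a": "gene expression profiling in breast cancer",
    "b": "deep learning for protein structure prediction",
    "c": "breast cancer gene mutation screening",
}
ranked = bm25_rank("gene expression in breast cancer cells", toy_corpus)
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

In this sketch the query article is treated as a bag of terms, so an article that shares no vocabulary with it scores zero. That limitation is one reason embedding-based representations such as those compared in the study may capture similarity that exact term matching misses.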
Similar articles
- A content-based literature recommendation system for datasets to improve data reusability - A case study on Gene Expression Omnibus (GEO) datasets. J Biomed Inform. 2020 Apr;104:103399. doi: 10.1016/j.jbi.2020.103399. Epub 2020 Mar 6. PMID: 32151769
- A comparison of word embeddings for the biomedical natural language processing. J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12. PMID: 30217670 Free PMC article.
- Hybrid Methods of Bibliographic Coupling and Text Similarity Measurement for Biomedical Paper Recommendation. Stud Health Technol Inform. 2022 Jun 6;290:287-291. doi: 10.3233/SHTI220080. PMID: 35673019
- Contextual Word Embedding for Biomedical Knowledge Extraction: a Rapid Review and Case Study. J Healthc Inform Res. 2024 Jan 3;8(1):158-179. doi: 10.1007/s41666-023-00157-y. eCollection 2024 Mar. PMID: 38273979 Free PMC article.
- An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics. 2015 Apr 30;16:138. doi: 10.1186/s12859-015-0564-6. PMID: 25925131 Free PMC article.
Cited by
- A hybrid algorithm for clinical decision support in precision medicine based on machine learning. BMC Bioinformatics. 2023 Jan 3;24(1):3. doi: 10.1186/s12859-022-05116-9. PMID: 36597033 Free PMC article.
- Study on Muscle Fatigue Classification for Manual Lifting by Fusing sEMG and MMG Signals. Sensors (Basel). 2025 Aug 13;25(16):5023. doi: 10.3390/s25165023. PMID: 40871887 Free PMC article.