This is a preprint.
Enhancing Automatic PT Tagging for MEDLINE Citations Using Transformer-Based Models
- PMID: 40735093
- PMCID: PMC12306818
Abstract
We investigated the feasibility of predicting Medical Subject Headings (MeSH) Publication Types (PTs) from MEDLINE citation metadata using two pre-trained Transformer-based models, BERT and DistilBERT. This study addresses limitations in the current automated indexing process, which relies on legacy NLP algorithms. We evaluated two approaches, monolithic multi-label classifiers and ensembles of binary classifiers, with the aim of enhancing the retrieval of biomedical literature. Results demonstrate the potential of Transformer models to significantly improve PT tagging accuracy, paving the way for scalable, efficient biomedical indexing.
Keywords: MEDLINE; Machine Learning; MeSH Publication Types; Natural Language Processing; Pre-trained Foundation Models.
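The two classifier designs named in the abstract differ mainly in how per-label decisions are produced. The sketch below is an illustration only, not the authors' implementation: the label subset, the 0.5 threshold, the function names, and the keyword-based stand-in scorers are all assumptions made for the example, and no Transformer model is actually loaded. It shows a monolithic multi-label head (one logit per PT, each passed through a sigmoid and thresholded independently) next to an ensemble of per-label binary classifiers.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical subset of MeSH Publication Types, chosen for illustration only.
PT_LABELS = ["Review", "Randomized Controlled Trial", "Case Reports"]


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tags_from_multilabel(logits: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Monolithic multi-label design: one model emits one logit per label.

    Each logit is converted to a probability and thresholded on its own,
    so a citation can receive zero, one, or several PT tags.
    """
    if len(logits) != len(PT_LABELS):
        raise ValueError("expected one logit per label")
    return [label for label, z in zip(PT_LABELS, logits) if sigmoid(z) >= threshold]


def tags_from_binary_ensemble(
    text: str,
    classifiers: Dict[str, Callable[[str], float]],
    threshold: float = 0.5,
) -> List[str]:
    """Binary-ensemble design: one independent classifier per label.

    Each classifier returns an estimate of P(label | text); labels without
    a classifier are skipped.
    """
    return [
        label
        for label in PT_LABELS
        if label in classifiers and classifiers[label](text) >= threshold
    ]


# Stand-in scorers (assumption): in practice each entry would wrap a
# fine-tuned binary model rather than a keyword check.
toy_classifiers: Dict[str, Callable[[str], float]] = {
    "Review": lambda t: 0.9 if "review" in t.lower() else 0.1,
    "Randomized Controlled Trial": lambda t: 0.8 if "randomized" in t.lower() else 0.05,
}

multilabel_tags = tags_from_multilabel([2.0, -1.0, -3.0])
ensemble_tags = tags_from_binary_ensemble(
    "A randomized trial of drug X versus placebo.", toy_classifiers
)
```

In this framing, the monolithic model shares one encoder across all labels, while the ensemble allows each PT to be trained, tuned, and thresholded separately at the cost of running one model per label.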