Mapping two decades of research in rheumatology-specific journals: a topic modeling analysis with BERTopic
- PMID: 39734395
- PMCID: PMC11672599
- DOI: 10.1177/1759720X241308037
Abstract
Background: Rheumatology has changed notably in recent decades. New drugs, including biologic agents and Janus kinase (JAK) inhibitors, have emerged. Concepts such as window of opportunity, arthralgia suspicious for progression, and difficult-to-treat rheumatoid arthritis (RA) have appeared, and new management strategies such as treat-to-target have become popular. Statistical learning methods, gene therapy, telemedicine, and precision medicine have also gained relevance in the field. Characterizing this research landscape and its advances at scale calls for automatic, efficient approaches based on natural language processing (NLP).
Objectives: The objective of this study is to use topic modeling (TM) techniques to uncover key topics and trends in rheumatology research conducted in the last 23 years.
Design: Retrospective study.
Methods: This study analyzed 96,004 abstracts published between 2000 and December 31, 2023, retrieved from PubMed for 34 rheumatology-specific journals. BERTopic, a novel TM approach that considers semantic relationships among words and their context, was used to uncover topics. Up to 30 different models were trained. Two of them were selected based on the number of topics, the number of outliers, and the topic coherence score, and their topics were manually labeled by two rheumatologists. Word clouds and hierarchical clustering visualizations were computed. Finally, hot and cold trends were identified using linear regression models.
Results: The two selected models classified the abstracts into 45 and 47 topics, respectively. The most frequent topics were RA, systemic lupus erythematosus, and osteoarthritis. Expected topics such as COVID-19 and JAK inhibitors were identified after conducting dynamic TM. Topics such as spinal surgery and bone fractures have gained relevance in recent years, whereas topics such as antiphospholipid syndrome and septic arthritis have lost momentum.
Conclusion: Our study utilized advanced NLP techniques to analyze the rheumatology research landscape and identify key themes and emerging trends. The results highlight the dynamic and varied nature of rheumatology research, illustrating how interest in certain topics has shifted over time.
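The Methods state that hot and cold trends were identified using linear regression models, without giving further detail in the abstract. As a minimal sketch of that general idea, and not the authors' actual procedure, the Python below fits an ordinary least squares line to each topic's yearly abstract count and labels the topic by the sign of the slope. The function names, the `flat_band` cutoff, the topic names, and the counts are all hypothetical and chosen only for illustration; a real analysis would likely also test whether each slope is statistically significant.

```python
from typing import Dict, List


def ols_slope(xs: List[float], ys: List[float]) -> float:
    """Return the ordinary least squares slope of ys regressed on xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def classify_trends(
    yearly_counts: Dict[str, List[float]],
    years: List[int],
    flat_band: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> Dict[str, str]:
    """Label each topic 'hot', 'cold', or 'flat' from its fitted slope."""
    labels = {}
    for topic, counts in yearly_counts.items():
        slope = ols_slope(years, counts)
        if slope > flat_band:
            labels[topic] = "hot"
        elif slope < -flat_band:
            labels[topic] = "cold"
        else:
            labels[topic] = "flat"
    return labels


if __name__ == "__main__":
    # Toy data: abstracts per year for three made-up topics.
    years = [2019, 2020, 2021, 2022, 2023]
    toy_counts = {
        "topic_a": [10, 12, 14, 16, 18],
        "topic_b": [20, 18, 16, 14, 12],
        "topic_c": [15, 15, 16, 15, 15],
    }
    for topic, label in classify_trends(toy_counts, years).items():
        print(topic, label)
```

Raw counts are used here for simplicity. Because the total number of published abstracts changes from year to year, fitting the slope to each topic's share of that year's abstracts may be a fairer comparison.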
Keywords: BERTopic; PubMed; artificial intelligence; natural language processing; topic modeling; transformers; trend analysis.
© The Author(s), 2024.
Conflict of interest statement
The authors declare that there is no conflict of interest.
