Identification of individuals at high risk of developing rheumatoid arthritis: a balanced random forest model in a cohort of 1544 first-degree relatives
- PMID: 41314668
- PMCID: PMC12666099
- DOI: 10.1136/rmdopen-2025-005773
Abstract
Objectives: To identify in a genetically susceptible population individuals at higher risk of developing rheumatoid arthritis (RA) using a classification approach combining known epidemiological risk factors, serological biomarkers, genetics, clinical signs and symptoms.
Methods: We used data from the prospective SCREEN-RA (Evaluation of a SCREENing strategy for Rheumatoid Arthritis) cohort of 1544 first-degree relatives of RA patients (RA-FDRs). The primary outcome was the development of RA. Additionally, we used seropositive inflammatory arthritis (IA) as a secondary outcome for exploratory analyses. Balanced random forest (BRF) models were fit and evaluated through fivefold cross-validation to avoid overfitting. We chose a classification threshold that targeted high sensitivity.
Results: After a mean follow-up of 7.1 years, 27 participants developed RA and 126 developed seropositive IA. The BRF demonstrated moderate predictive performance, characterised by high sensitivity (≥0.85) but modest specificity. Rheumatoid factors (RFs) had the highest importance in RA prediction, followed by symptoms on the 'clinically suspected arthralgia' (CSA) scale. Age, gender and anti-RA33 autoantibodies were the main variables for the prediction of seropositive IA.
Conclusions: Overall, the results demonstrate that predicting RA by combining genetics, serological biomarkers, epidemiological risk factors and clinical signs is promising, although model generalisation remains challenging. The low prevalence of RA in the cohort complicates the development of highly accurate prediction models. Future efforts should focus on including external validation and potentially incorporating additional biomarkers to enhance the specificity and overall performance of the predictive tests.
Keywords: Arthritis, Rheumatoid; Biomarkers; Epidemiology; Machine Learning; Sensitivity and Specificity.
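Illustrative note: the Methods state that the classification threshold was chosen to target high sensitivity, but the abstract does not say how. The following is a minimal sketch of one possible approach, assuming out-of-fold predicted probabilities are already available from cross-validation. The function names `sensitivity_specificity` and `pick_threshold` are hypothetical, and the toy labels and scores are invented for illustration; they are not data from the cohort.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Label score >= threshold as positive; return (sensitivity, specificity)."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def pick_threshold(
    y_true: Sequence[int], scores: Sequence[float], target_sensitivity: float
) -> float:
    """Return the highest observed score that, used as a cut-off,
    still reaches the target sensitivity (hypothetical selection rule)."""
    for t in sorted(set(scores), reverse=True):
        sens, _ = sensitivity_specificity(y_true, scores, t)
        if sens >= target_sensitivity:
            return t
    # Not reached when positives exist: the lowest score gives sensitivity 1.0.
    raise ValueError("no threshold reaches the target sensitivity")


# Toy example (invented values): 4 cases, 6 non-cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.8, 0.5, 0.4, 0.3, 0.1, 0.05]

threshold = pick_threshold(y_true, scores, target_sensitivity=0.75)
sens, spec = sensitivity_specificity(y_true, scores, threshold)
print(f"threshold={threshold}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the sensitivity target pushes the cut-off lower and labels more non-cases as positive, which is the trade-off behind the high sensitivity and modest specificity reported in the Results.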
© Author(s) (or their employer(s)) 2025. Re-use permitted under CC BY. Published by BMJ Group.
Conflict of interest statement
Competing interests: RA, none declared. CL, none declared. BG has received speaker fees from Lilly, outside the submitted work. MG, IG and SS are employees of Thermo Fisher Scientific—Phadia GmbH. OS, none declared. ZS, none declared. RG, none declared. DS, none declared. JD, none with the submitted work. BM, none declared. DD has received speaker’s fees from Eli Lilly, Novartis, UCB, GSK, Menarini, Viatris, for attending meetings from Abbvie, UCB, Janssen and for participation on an advisory board from Novartis, all outside the submitted work. LB, none declared. IvM has received support for attending meetings and/or travel from Novartis, Abbvie, Pfizer and UCB, outside the submitted work. DK has received consulting/speaker’s fees from Abbvie, Pfizer, Eli Lilly, Sanofi, UCB and Novartis, outside the submitted work. ARR has received consulting fees from Abbvie, Gilead, Lilly and BMS, speaker’s fees from Abbvie, Pfizer, Sanofi, UCB, BMS, Lilly, Gilead and Roche, and payment for expert testimony from Abbvie and Gilead, all outside the submitted work. AC, none declared. RM, none declared. DSC, none declared. AF has received grants or contracts (Eli Lilly, Pfizer, AbbVie, Gilead and BMS), consulting fees (AstraZeneca, AbbVie, Pfizer and Gilead) and honorary payments (BMIS, AbbVie, Eli Lilly, Pfizer and MSD) and participated in advisory boards (AstraZeneca, Gilead, Novartis, AbbVie, Eli Lilly, Pfizer, J&J, Mylan and UCB).