Randomized Controlled Trial
Nat Commun. 2025 Jul 2;16(1):5374.
doi: 10.1038/s41467-025-60987-9.

Deep molecular profiling of synovial biopsies in the STRAP trial identifies signatures predictive of treatment response to biologic therapies in rheumatoid arthritis

Myles J Lewis et al. Nat Commun.

Abstract

Approximately 40% of patients with rheumatoid arthritis do not respond to individual biologic therapies, while biomarkers predictive of treatment response are lacking. Here we analyse RNA-sequencing (RNA-Seq) of pre-treatment synovial tissue from the biopsy-based, precision-medicine STRAP trial (n = 208), to identify gene response signatures to the randomised therapies: etanercept (TNF-inhibitor), tocilizumab (interleukin-6 receptor inhibitor) and rituximab (anti-CD20 B-cell depleting antibody). Machine learning models applied to RNA-Seq predict clinical response to etanercept, tocilizumab and rituximab at the 16-week primary endpoint with area under receiver operating characteristic curve (AUC) values of 0.763, 0.748 and 0.754 respectively (n = 67-72) as determined by repeated nested cross-validation. Prediction models for tocilizumab and rituximab are validated in an independent cohort (R4RA): AUC 0.713 and 0.786 respectively (n = 65-68). Predictive signatures are converted for use with a custom synovium-specific 524-gene nCounter panel and retested on synovial biopsy RNA from STRAP patients, demonstrating accurate prediction of treatment response (AUC 0.82-0.87). The converted models are combined into a unified clinical decision algorithm that has the potential to transform future clinical practice by assisting the selection of biologic therapies.

Conflict of interest statement

Competing interests: C.P., M.J.L. and C.C. are inventors on a patent application (no. GB 2410224.6), submitted by Queen Mary University of London, that covers methods used to select treatments in RA. The remaining authors declare no competing interests.

Figures

Fig. 1. Synovial signature of response to biologics at baseline.
a, c, e Volcano plots of differentially expressed genes from DESeq2 analysis of RNA-sequencing of baseline synovial biopsies of rheumatoid arthritis individuals receiving treatment with a etanercept (n = 67), c tocilizumab (n = 69) or e rituximab (n = 72) comparing ACR20 responders vs non-responders at the 16-week primary endpoint. DESeq2 statistical analysis uses generalised linear modelling of count data using the negative binomial distribution. The model included a single covariate based on principal component analysis applied to 17 muscle tissue-specific genes (see Methods). P values were calculated by a two-sided Wald test with FDR correction (Storey’s q value) for multiple testing. Genes in blue are significant at FDR <0.05, genes in grey are non-significant. b, d, f Modular analysis applying QuSAGE statistical testing with blood-derived gene modules (Li et al., 2014) and synovium-derived WGCNA modules for 16-week ACR20 responders versus non-responders to etanercept (b), tocilizumab (d) and rituximab (f). Log2-fold change of responders (positive values) and non-responders (negative values) are plotted with dots colour coded for unadjusted p value.
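The FDR correction named in this caption (Storey's q value) can be illustrated with a minimal sketch. It assumes a single fixed tuning parameter `lam = 0.5` for estimating the proportion of true null hypotheses, which is a simplification; q-value software commonly estimates that proportion by smoothing over a range of values instead. The function name and the example p values are hypothetical and are not taken from the paper.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Simplified Storey q values using one fixed lambda (an assumption)."""
    m = len(pvalues)
    # Estimate pi0, the proportion of true nulls, from p values above lambda.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        qvalues[i] = running_min
    return pi0, qvalues


# Hypothetical p values for five genes.
pi0, q = storey_qvalues([0.001, 0.01, 0.02, 0.6, 0.9])
significant = [qi < 0.05 for qi in q]
```

Genes whose q value falls below 0.05 would correspond to the blue points in the volcano plots.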
Fig. 2. Analysis of common and differential molecular signatures of responsiveness/resistance to etanercept, tocilizumab and rituximab.
a Volcano plot showing differentially expressed genes between all ACR20 responders (n = 133) and non-responder (n = 75) patients to tocilizumab, etanercept and rituximab combined following 16 weeks of treatment. Statistical analysis by negative binomial distribution, generalised linear regression of count data via DESeq2. P values were calculated by a two-sided Wald test with FDR correction (Storey’s q value) for multiple testing. b Shared enriched pathways across tocilizumab/etanercept/rituximab responders and non-responders. Grey dashed lines indicate p-value cutoff for FDR <0.05. c Three-way polar plot comparing genes associated with resistance to each individual drug. Red genes (n = 9) are significantly upregulated only in non-responder patients treated with etanercept. Genes in green (n = 22) are significantly upregulated in rituximab non-responders only. Blue genes (n = 30) are significantly upregulated in the tocilizumab non-responder group. d Three-way polar plot of significantly upregulated genes in responder patients to etanercept (24 genes, red dots), rituximab (eight genes, green dots) and tocilizumab (59 genes, blue dots). e Forest plots of individual genes showing different log2FC (responders/non-responders) in each drug. ***p < 0.001, **p < 0.01, *p < 0.05, ‘.’ p < 0.10 using FDR-adjusted two-sided Wald test p values. Precise p values are available in the supplementary material. Error bars show 95% confidence intervals.
Fig. 3. Single-cell subset patterns of responder and non-responder patients to etanercept, tocilizumab and rituximab.
a Heatmap showing estimated immune cell subset profiles of all individuals at baseline, calculated by gene module score using Seurat. Individuals (columns) were clustered using the Euclidean distance metric and complete linkage clustering method. Upper tracks show ESR, CRP, cell type (B cell rich/poor), pathotype, ACR20 and ACR50 response, randomised medication (treatment) and histological scores for CD3, CD20, CD138, CD68L (lining) and CD68SL (sublining). b Forest plot showing mean fold-changes of single-cell subsets that are differentially present in any responders compared to any non-responders. Error bars show 95% confidence intervals. Statistical analysis (two-sided) by linear model using limma. Significant fold-changes are indicated with asterisks (*p < 0.05, **p < 0.01, ***p < 0.001). Precise p values are available in the supplementary material. c Forest plot showing fold-changes of single-cell subsets that are differentially present in responders compared to non-responders separately in each medication. d Box plots showing module scores of SC-B2 (IGHG3+CD27+ memory B-cell), SC-T3 (PD-1+ Tph/Tfh) T-cell subsets for etanercept, rituximab and any treatment (either etanercept, rituximab or tocilizumab) groups. Box plots show median, upper and lower quartiles, with whiskers denoting maximal and minimal data within 1.5 × interquartile range.
Fig. 4. Unsupervised clustering reveals molecular groups of patients that are reflected in an independent cohort.
a Unsupervised k-means clustering on the 3411 most expressed genes from all baseline samples (n = 208) reveals three distinct subgroups of patients. Upper tracks show histological scores for CD3, CD20, CD68L, CD68SL, CD138, cell type (B cell rich/poor), pathotype and DAS28 CRP response. b Unsupervised k-means clustering on the 2259 most expressed genes from all baseline samples (n = 133) of the R4RA cohort reveals subgroups of patients that share common molecular signatures with the clusters found in the STRAP cohort. Upper tracks show histological scores for CD3, CD20, CD68L, CD68SL, CD138, cell type (B cell rich/poor), pathotype and DAS28 CRP response. c Pathway analysis of the three gene clusters identified in the STRAP cohort (cluster 1 = 983 genes, cluster 2 = 1420 genes, cluster 3 = 1008 genes). d Venn diagrams showing numbers of distinct and shared genes in the three clusters independently obtained in STRAP and R4RA. e The PCA plots from baseline samples of the STRAP cohorts colour coded by pathotype (top) and unsupervised molecular clusters identified by k-means clustering (bottom).
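The k-means step in panels a and b can be sketched as follows. This is a toy illustration, not the authors' pipeline: the starting centroids are supplied by the caller so that the result is deterministic, whereas practical implementations usually use random restarts, and the sample points are hypothetical two-dimensional stand-ins for per-patient expression profiles.

```python
import math


def kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(c) for c in init_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical samples forming three well-separated groups.
samples = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
labels, centroids = kmeans(samples, [samples[0], samples[2], samples[4]])
```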
Fig. 5. Machine learning predictive models fitted using ten-by-ten-fold nested cross-validation for response to etanercept, tocilizumab and rituximab.
a Schema showing a machine learning pipeline. b Box plots of model performance for each of the three trial drugs. Multiple types of machine learning (ML) models were fitted to baseline synovial RNA-Seq gene expression data to predict response to each trial drug at the 16-week primary endpoint, with response defined as DAS28-ESR <3.2. Model types: gradient boosted machine (gbm), elastic net regression (glmnet), mixed discriminant analysis (mda), random forest (rf), support vector machine (svm) with polynomial (svmPoly) or radial (svmRadial) kernel, extreme gradient boosting (xgboost) with tree booster (xgbTree) or linear booster (xgbLinear). Unbiased model performance was determined by 10 × 10-fold nested cross-validation (CV) with 25 repeats (each point shows one repeat), with the area under the receiver operating characteristic (ROC) curve as performance metric for etanercept and tocilizumab. The coefficient of determination (R²) was used as a performance metric for rituximab models (see Methods), which were fitted to an ordinal (four-level) response outcome, as this led to improved final binary response prediction. Box plots show median, upper and lower quartiles, with whiskers denoting maximal and minimal data within 1.5 × interquartile range (IQR). c ROC curves for final best models for each drug, showing nested CV ROC and ROC calculated from inner CV folds. d Variable importance plots showing stability of variables selected by the final ML model for each drug across nested CV. Error bars show the standard error of mean variable importance, size of points shows frequency with which each gene/predictor was selected by models during nested CV. Colour of points shows directionality of association with response: red for genes/predictors upregulated in non-response, blue for genes/predictors upregulated in response. e Validation of STRAP-trained tocilizumab and rituximab machine learning models in R4RA. Models for tocilizumab and rituximab shown in c, d were applied to synovial RNA-Seq data from patients randomised to treatment with tocilizumab (n = 65) or rituximab (n = 68) in the R4RA trial. Predicted outcome was compared to the real outcome, with response defined as DAS28-ESR <3.2 at the 16-week primary endpoint of the trial. Predictive model performance was assessed by ROC AUC.
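The idea behind nested cross-validation can be sketched with a toy example: the inner loop tunes a hyperparameter on the training portion only, and the outer loop scores the tuned model on samples it never saw. The sketch below makes several assumptions that are not from the paper: a single numeric feature, a k-nearest-neighbour classifier standing in for the model types listed above, inner tuning by accuracy, a single repeat rather than 25, and hypothetical function names and data.

```python
import random


def make_folds(n, n_folds, rng):
    """Shuffle the indices 0..n-1 and deal them into n_folds groups."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def knn_prob(train, x, k):
    """Fraction of the k nearest training samples that are responders."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def auc(scores, labels):
    """Probability that a responder outscores a non-responder (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cv_accuracy(data, k, n_folds, rng):
    """Inner loop: cross-validated accuracy for one candidate k."""
    correct = 0
    for fold in make_folds(len(data), n_folds, rng):
        held = set(fold)
        train = [d for i, d in enumerate(data) if i not in held]
        for i in fold:
            predicted = knn_prob(train, data[i][0], k) >= 0.5
            correct += predicted == bool(data[i][1])
    return correct / len(data)


def nested_cv_auc(data, k_grid, outer_folds=10, inner_folds=10, seed=0):
    """Outer loop: tune k on each training split, score the held-out fold."""
    rng = random.Random(seed)
    scores, labels = [], []
    for fold in make_folds(len(data), outer_folds, rng):
        held = set(fold)
        train = [d for i, d in enumerate(data) if i not in held]
        # Hyperparameter choice never sees the held-out outer fold.
        best_k = max(k_grid, key=lambda k: cv_accuracy(train, k, inner_folds, rng))
        for i in fold:
            scores.append(knn_prob(train, data[i][0], best_k))
            labels.append(data[i][1])
    return auc(scores, labels)


# Hypothetical data: (feature value, response) with two separated groups.
toy = [(i / 10, 0) for i in range(20)] + [(10 + i / 10, 1) for i in range(20)]
result = nested_cv_auc(toy, k_grid=[1, 3, 5])
```

Because tuning happens inside each outer training split, the pooled outer-fold AUC is not inflated by the hyperparameter search, which is the sense in which the caption describes the performance estimate as unbiased.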
Fig. 6. Conversion and validation of machine learning models using the nCounter assay.
a Flow diagram outlining the process of converting the RNA-Seq models to a workable NanoString nCounter-based assay. Spare baseline synovial biopsy samples from STRAP were subjected to nCounter assay using a custom synovial 524-gene panel. nCounter data was rescaled to RNA-Seq scale (“pseudo-RNA-Seq”) using linear models for each gene. Rescaled nCounter data was passed to machine learning models from Fig. 5c, and the performance of each model was assessed. b Confusion matrices showing predicted versus actual response, accuracy and balanced accuracy of nCounter assay applied to baseline synovial biopsies for prediction of response defined as DAS28-ESR <3.2 after 16 weeks of treatment. c Receiver operating characteristic (ROC) curve plots and area under the curve (AUC) measurements for prediction of response to etanercept, tocilizumab and rituximab from nCounter assay applied to baseline synovial biopsies from STRAP. d Proposed algorithm for allocation of a new patient to one of three possible biologic therapy categories (TNF-inhibitor, IL6-inhibitor or B-cell depleting agent) based on whichever model gives the highest predicted probability of response. Individuals with low predicted probability (all p < 0.5) of response to all three classes of biologics are categorised as “biomarker negative” and can be offered an alternative class of therapeutic agent. Panels a and d were created in BioRender with modifications (https://BioRender.com/r4uqilh).
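Two steps from this caption lend themselves to a short sketch: the per-gene linear rescaling of nCounter values to the RNA-Seq scale (panel a) and the allocation rule (panel d). The code below is illustrative only. The function names, the ordinary least-squares fit, and the example numbers are assumptions; the caption does not specify how the linear models were fitted or how ties between drug classes are broken (here the first maximum wins).

```python
def fit_linear_rescaler(ncounter_values, rnaseq_values):
    """Fit rnaseq ~ slope * ncounter + intercept for one gene (least squares)."""
    n = len(ncounter_values)
    mean_x = sum(ncounter_values) / n
    mean_y = sum(rnaseq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in ncounter_values)
    if sxx == 0:
        raise ValueError("nCounter values have no variance for this gene")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(ncounter_values, rnaseq_values)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rescale(value, slope, intercept):
    """Map one nCounter measurement onto the pseudo-RNA-Seq scale."""
    return slope * value + intercept


def allocate_therapy(probabilities, threshold=0.5):
    """Pick the drug class with the highest predicted response probability.

    If every probability is below the threshold, the patient is labelled
    "biomarker negative", following the rule described in panel d.
    """
    best = max(probabilities, key=probabilities.get)
    if probabilities[best] < threshold:
        return "biomarker negative"
    return best


# Hypothetical paired measurements for a single gene.
slope, intercept = fit_linear_rescaler([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
pseudo_rnaseq = rescale(4.0, slope, intercept)

# Hypothetical model outputs for one new patient.
choice = allocate_therapy(
    {"TNF-inhibitor": 0.31, "IL6-inhibitor": 0.72, "B-cell depleting agent": 0.64}
)
```

In a full pipeline, one rescaler would be fitted per gene on the paired samples, and the rescaled profile would be passed to each drug-specific model to obtain the three probabilities.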
