Front Immunol. 2025 Jul 24:16:1491041.
doi: 10.3389/fimmu.2025.1491041. eCollection 2025.

Comprehensive and advanced T cell cluster analysis for discriminating seropositive and seronegative rheumatoid arthritis

Shinji Maeda et al. Front Immunol.

Abstract

Objective: Rheumatoid arthritis (RA) is classified into seropositive (SP-RA) and seronegative (SN-RA) types, reflecting distinct immunological profiles. This study aimed to identify the T cell phenotypes associated with each type, thereby enhancing our understanding of their unique pathophysiological mechanisms.

Methods: We analyzed peripheral blood T cells from 50 participants, including 16 patients with untreated SP-RA, 17 patients with SN-RA, and 17 healthy controls, using 25 T cell markers. For the initial analysis, a dataset was established through manual gating of T cell subsets. For the advanced analysis, two distinct datasets derived from a self-organizing map algorithm, FlowSOM, were used: one encompassing all CD3+ T cells and another focusing on activated T cell subsets. These datasets were then analyzed using the adaptive least absolute shrinkage and selection operator (LASSO) in conjunction with leave-one-out cross-validation. This approach improved the robustness of the analysis and identified T cell clusters that consistently discriminated between SP-RA and SN-RA.
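The adaptive LASSO step with leave-one-out cross-validation can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it swaps the (presumably logistic) model for a squared-error loss on a 0/1 label, uses ridge pilot estimates to set the adaptive penalty factors 1/|b_j|^gamma, treats per-patient IPW weights as sample weights, and keeps the penalty lambda fixed instead of tuning it. The function names are hypothetical.

```python
# Illustrative sketch only. Assumptions: squared-error loss (not logistic),
# ridge pilot estimates, fixed lambda. All helper names are hypothetical.


def soft_threshold(z, t):
    """Lasso shrinkage operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_penalized(X, y, w, l1, l2, pf, n_iter=200):
    """Weighted elastic-net coordinate descent with a squared-error loss.

    Minimizes  sum_i w_i (y_i - b0 - x_i.b)^2 / (2 W)
             + l1 * sum_j pf_j |b_j| + (l2 / 2) * sum_j b_j^2,  W = sum_i w_i.
    """
    n, p = len(X), len(X[0])
    W = sum(w)
    b = [0.0] * p
    b0 = sum(wi * yi for wi, yi in zip(w, y)) / W  # unpenalized intercept
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j left out of the fit.
            r = [y[i] - b0 - sum(X[i][k] * b[k] for k in range(p) if k != j)
                 for i in range(n)]
            z = sum(w[i] * X[i][j] * r[i] for i in range(n)) / W
            a = sum(w[i] * X[i][j] ** 2 for i in range(n)) / W
            b[j] = soft_threshold(z, l1 * pf[j]) / (a + l2)
        # Refit the intercept as the weighted mean of the remaining residual.
        b0 = sum(w[i] * (y[i] - sum(X[i][k] * b[k] for k in range(p)))
                 for i in range(n)) / W
    return b0, b


def adaptive_lasso(X, y, w, lam, ridge=1.0, gamma=1.0, eps=1e-6):
    """Two-stage adaptive LASSO: ridge pilot estimates set per-feature penalties."""
    p = len(X[0])
    _, pilot = fit_penalized(X, y, w, 0.0, ridge, [1.0] * p)
    pf = [1.0 / (abs(bj) + eps) ** gamma for bj in pilot]
    return fit_penalized(X, y, w, lam, 0.0, pf)


def loocv_adaptive_lasso(X, y, w, lam):
    """Leave each sample out once; return selected feature sets and held-out predictions."""
    n = len(X)
    selected, preds = [], []
    for i in range(n):
        keep = [k for k in range(n) if k != i]
        b0, b = adaptive_lasso([X[k] for k in keep], [y[k] for k in keep],
                               [w[k] for k in keep], lam)
        selected.append({j for j, bj in enumerate(b) if bj != 0.0})
        preds.append(b0 + sum(xj * bj for xj, bj in zip(X[i], b)))
    return selected, preds


# Toy usage: feature 0 separates the classes, feature 1 is noise.
x0 = [-1.0, -0.8, -0.6, -0.4, -0.2, 0.2, 0.4, 0.6, 0.8, 1.0]
x1 = [0.3, -0.1, 0.2, -0.3, 0.1, -0.2, 0.3, -0.1, 0.2, -0.3]
X = [[a, b] for a, b in zip(x0, x1)]
y = [0.0] * 5 + [1.0] * 5
selected, preds = loocv_adaptive_lasso(X, y, [1.0] * len(y), lam=0.01)
```

Each fold refits both stages on the remaining samples, so the penalty factors themselves vary across folds; the per-fold feature sets are what a selection-frequency rule would then summarize.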

Results: Our analysis revealed significant differences in T cell subsets between RA patients and healthy controls, including elevated levels of activated T cells (CD3+, CD4+, CD8+) and helper subsets (Th1, Th17, Th17.1, and Tph cells). The Tph/Treg ratio was markedly higher in SP-RA, underscoring an effector-dominant immune imbalance. FlowSOM-based clustering identified 44 unique T cell clusters, six of which were selected as discriminative T cell clusters (D-TCLs) for distinguishing SP-RA from SN-RA. TCL21, an activated Th1-type Tph-like cell, was strongly associated with SP-RA's aggressive profile, while TCL02, a central memory CD4+ T cell subset, displayed ICOS+, CTLA-4low+, PD-1low+, and CXCR3+, providing insights into immune memory mechanisms. Additionally, TCL31 and TCL35, both CD4-CD8- T cells, exhibited unique phenotypes: CD161+ for TCL31 and HLA-DR+CD38+TIM-3+ for TCL35, suggesting distinct pro-inflammatory roles. Support vector machine analysis (bootstrap n = 1000) validated the D-TCLs' discriminative power, achieving an accuracy of 86.2%, sensitivity of 85.7%, and specificity of 80.9%.
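Cluster-level comparisons of this kind start from a per-sample abundance table. A minimal sketch, assuming each CD3+ cell has already been assigned a FlowSOM cluster index (0-based here, whereas the paper uses labels such as TCL02 or TCL21), converts cell assignments into the percentage of CD3+ T cells falling in each cluster; `cluster_percentages` is a hypothetical helper.

```python
from collections import Counter, defaultdict

# Illustrative sketch; `cluster_percentages` is a hypothetical helper and the
# cluster assignments are assumed to come from an upstream FlowSOM run.


def cluster_percentages(cell_assignments, n_clusters):
    """Per-sample percentage of CD3+ T cells in each cluster.

    cell_assignments: iterable of (sample_id, cluster_index) pairs, one per cell,
    with cluster_index in range(n_clusters).
    """
    counts = defaultdict(Counter)
    for sample_id, cluster in cell_assignments:
        counts[sample_id][cluster] += 1
    table = {}
    for sample_id, c in counts.items():
        total = sum(c.values())
        # Clusters absent from a sample get 0.0 (Counter returns 0 for missing keys).
        table[sample_id] = [100.0 * c[k] / total for k in range(n_clusters)]
    return table


cells = [("P1", 0), ("P1", 0), ("P1", 1), ("P1", 2), ("P2", 1)]  # toy input
print(cluster_percentages(cells, 3))
# {'P1': [50.0, 25.0, 25.0], 'P2': [0.0, 100.0, 0.0]}
```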

Conclusions: This study advances our understanding of the immunological distinctions between SP-RA and SN-RA and identifies key T cell phenotypes as potential targets in SP-RA disease progression. These findings provide a basis for studies of targeted therapeutic strategies tailored to modulating the identified markers to improve treatment for SP-RA.

Keywords: FlowSOM; T cell biomarker; anticyclic citrullinated peptide antibodies; mass cytometry; peripheral helper T cell; rheumatoid arthritis.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Integrative analysis workflow: from T cell profiling to discriminative cluster identification. The figure provides a comprehensive illustration of the study’s workflow, including all major datasets and abbreviations: FSM-TCL-DS (FlowSOM T cell cluster dataset: 44 clusters from all CD3+ T cells), FSM-ATCL-DS (FlowSOM activated T cell cluster dataset: 12 clusters from activated CD38+HLA-DR+ T cells), gating-TCS-DS (manually gated T cell subset dataset), D-TCLs (discriminative T cell clusters, defined as those selected by adaptive LASSO in >50% of LOOCV iterations), and ATCLs (activated T cell clusters). The relationships between datasets, feature selection, and validation steps are depicted. Starting with the collection of peripheral blood from 50 participants, including 16 patients with untreated SP-RA, 17 patients with SN-RA, and 17 healthy controls, T cells were stained for 25 markers and analyzed using mass cytometry. Initial data segmentation was achieved through manual gating of T cell subsets, followed by advanced clustering with the FlowSOM algorithm, which created two datasets: one for all CD3+ T cells and another focusing on activated T cell subsets. These datasets facilitated the detailed examination and identification of unique T cell clusters. The number of clusters for FSM-TCL-DS and FSM-ATCL-DS was determined empirically, based on biological interpretability and hierarchical merging criteria. Subsequently, adaptive LASSO was applied 33 times under leave-one-out cross-validation (LOOCV), once for each held-out patient with RA, with inverse probability weighting (IPW) in each cycle to adjust for patient background. Clusters assigned non-zero coefficients in >50% of LOOCV cycles were defined as D-TCLs. All model parameters and feature selection criteria were established a priori, and no post hoc optimization was performed, in order to minimize bias and overfitting.
This analysis highlighted six D-TCLs critical for distinguishing between SP-RA and SN-RA. The identified clusters were further validated using a support vector machine (SVM) with extensive bootstrap analysis, demonstrating their significance in differentiating disease states. This integrative approach underscores the potential of detailed T cell phenotyping in uncovering nuanced immunological differences between RA subtypes and guiding targeted therapeutic strategies.
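The D-TCL selection rule (a non-zero coefficient in more than 50% of LOOCV cycles) reduces to counting. The sketch below assumes each fold's output is a set of selected cluster names; with 33 folds the rule requires selection in at least 17 folds, since 17/33 ≈ 0.515 while 16/33 ≈ 0.485. The helper name and the toy fold data are hypothetical.

```python
from collections import Counter

# Illustrative sketch; `discriminative_features` is a hypothetical helper.


def discriminative_features(selected_per_fold, threshold=0.5):
    """Return {feature: selection frequency} for features selected in more than
    `threshold` of the folds (strict inequality, matching the >50% rule)."""
    n_folds = len(selected_per_fold)
    freq = Counter()
    for selected in selected_per_fold:
        freq.update(set(selected))  # count each feature at most once per fold
    return {f: k / n_folds for f, k in freq.items() if k / n_folds > threshold}


# Toy input, not study data: four folds.
folds = [{"TCL02", "TCL21"}, {"TCL02"}, {"TCL02", "TCL10"}, {"TCL02", "TCL21"}]
print(discriminative_features(folds))  # {'TCL02': 1.0}; TCL21 at exactly 0.5 is excluded
```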
Figure 2
Comparative analysis of T cell subsets in seropositive and seronegative rheumatoid arthritis. The figure illustrates the proportions and ratios of various T cell subsets, excluding T helper (Th) cells and regulatory T cells (Tregs), in patients with seropositive rheumatoid arthritis (SP-RA), seronegative rheumatoid arthritis (SN-RA), and healthy controls (HCs). The analysis focuses on the relative prevalence of these subsets and their ratios, highlighting differences in immune profiles among the groups. Dot plots, violin plots, and overlaid box plots are used to display the data, showing the distribution within each group. The box plots highlight the median (indicated by a white dot) and interquartile ranges, providing a summary of the data distribution alongside the individual data points shown by the dot plots. (A) Core T cell subsets and (B) activated T cell subsets. Differences between groups were tested for statistical significance using the Mann–Whitney U test. FDR-adjusted q-values were calculated separately for the core T cell subsets (panel A) and activated T cell subsets (panel B) using the Benjamini–Hochberg method. Statistical significance is indicated by * (p < 0.05) and † (q < 0.05). Both p-values and q-values are shown for each comparison.
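The Benjamini–Hochberg q-values can be computed as a step-up procedure: sort the p-values, scale the p-value of rank i by m/i, and enforce monotonicity from the largest rank downward. A minimal sketch with a hypothetical helper name, to be called separately for each family of tests (for example, once for panel A and once for panel B):

```python
# Illustrative sketch; `bh_qvalues` is a hypothetical helper name.


def bh_qvalues(pvals):
    """Benjamini–Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    # Walk from the largest p-value down, keeping q monotone non-decreasing in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


print(bh_qvalues([0.01, 0.04, 0.03, 0.20]))  # approximately [0.04, 0.0533, 0.0533, 0.2]
```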
Figure 3
Comparative analysis of T helper and regulatory T cell profiles in seropositive and seronegative rheumatoid arthritis. (A) depicts the proportions of effector T helper (Th) cells and regulatory T cells (Tregs) in patients with seropositive rheumatoid arthritis (SP-RA), seronegative rheumatoid arthritis (SN-RA), and healthy controls (HCs). (B) examines the ratios of circulating Th1/Treg, Th2/Treg, Th17/Treg, Th17.1/Treg, and Tph cell/Treg in CD3+ T cells across three groups: SP-RA, SN-RA, and HCs. The analysis focused on the relative prevalence of these ratios, indicating differences in immune regulation across the groups. Data are presented using dot plots, violin plots, and overlaid box plots, illustrating the distribution within each group. The box plots emphasize the median (indicated by a white dot) and interquartile ranges, providing a concise summary of the data distribution while also highlighting individual data points with dot plots. Statistical significance of observed differences was assessed using the Mann–Whitney U test, and FDR-adjusted q-values (Benjamini–Hochberg method) were calculated within the Th/Treg/ratio subset group. Statistical significance is indicated by * (p < 0.05) and † (q < 0.05). Both p-values and q-values are shown for each comparison.
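For reference, a two-sided Mann–Whitney U test with the large-sample normal approximation and a tie correction is sketched below. It is an illustration only: the captions do not state whether exact or asymptotic p-values were used, and exact small-sample p-values can differ from this approximation. No continuity correction is applied, and the helper names are hypothetical.

```python
import math
from collections import Counter

# Illustrative sketch (normal approximation, tie-corrected, no continuity
# correction). Helper names are hypothetical.


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:  # all values tied: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))


u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])  # U = 0.0; p is roughly 0.0495
```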
Figure 4
Correlation analysis of T cell subsets and clinical characteristics in rheumatoid arthritis (RA) (n = 33). The figure shows a correlation coefficient matrix (Spearman’s ρ) between T cell subset frequencies, Th/Treg ratios, and clinical background factors such as age, sex, symptom duration, and ACPA positivity. For compositional variables (e.g., T cell clusters, T cell subset frequencies), correlations were calculated using CLR-transformed values; for ratios and other non-compositional variables, raw values were used. Each cell in the matrix indicates the strength and direction of the correlation, on a scale of −1 (strong negative correlation) to +1 (strong positive correlation), represented by a color gradient from red to blue. FDR correction (Benjamini–Hochberg method) was applied for multiple testing. Significance levels are indicated within each cell using asterisks to denote Benjamini–Hochberg adjusted q-values: *q < 0.1, **q < 0.05, ***q < 0.01, and ****q < 0.001.
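The CLR transform and Spearman correlation described above can be sketched as follows. The sketch assumes CLR is applied within each sample across its compositional parts (for example, all cluster percentages of one patient), with a small pseudocount added to avoid log(0); the pseudocount value and the toy data are assumptions, not taken from the paper. Spearman's rho is computed as the Pearson correlation of mid-ranks, which handles ties.

```python
import math

# Illustrative sketch; the pseudocount and helper names are hypothetical.


def clr(parts, pseudocount=1e-6):
    """Centered log-ratio of one sample's composition: log(x) minus mean log."""
    logs = [math.log(x + pseudocount) for x in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman's rho as the Pearson correlation of mid-ranks."""
    ra, rb = midranks(a), midranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra))
    sb = math.sqrt(sum((y - mb) ** 2 for y in rb))
    return cov / (sa * sb)


# Toy data: cluster percentages for four patients (rows) and their ages.
samples = [[50.0, 30.0, 20.0], [10.0, 60.0, 30.0], [40.0, 40.0, 20.0], [25.0, 25.0, 50.0]]
ages = [62, 45, 58, 51]
cluster0_clr = [clr(s)[0] for s in samples]  # CLR within each patient, then one part
rho = spearman_rho(cluster0_clr, ages)  # ranks agree here, so rho is 1 up to rounding
```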
Figure 5
Adaptive LASSO-driven selection and visualization of discriminative T cell clusters in seropositive and seronegative rheumatoid arthritis. (A) Predicted probability of ACPA positivity for each patient. The y-axis shows the IPW-adjusted probability of ACPA positivity as estimated by the adaptive LASSO model using leave-one-out cross-validation (LOOCV). The x-axis represents patient IDs, indexed such that actual SN-RA cases are labeled from 1 to 17 and actual SP-RA cases from 18 to 33. Each point represents an individual patient, with predicted probabilities derived from the model trained on all other patients (see Methods for details). Predictions are adjusted using inverse probability weighting (IPW) based on patient background variables, such as sex, age, symptom duration, NSAID usage, and DAS28-CRP. A horizontal reference line at the 0.5 probability threshold separates predictions above it (ACPA-positive) from those below it (ACPA-negative), providing a direct visual comparison of predicted versus actual ACPA status. (B) Predictive accuracy across datasets. The bar graph presents the predictive accuracy of the adaptive LASSO model for the FSM-TCL-DS, FSM-ATCL-DS, and gating-TCS-DS datasets, that is, the proportion of samples correctly predicted as SP-RA or SN-RA. The FSM-TCL-DS dataset achieved the highest predictive accuracy (81.8%), underscoring its utility in model validation. (C) Frequency of selection for T cell variables across datasets. The graph shows how frequently each T cell variable was selected as a discriminator between SP-RA and SN-RA across the LOOCV rounds. Variables selected in >50% of the rounds are defined as discriminative T cell clusters (D-TCLs); six clusters, TCL02, 21, 24, 31, 32, and 35, met this criterion. (D) Coefficients of T cell variables from adaptive LASSO analysis.
The plot displays the coefficients assigned to various T cell variables through adaptive LASSO analysis performed on the entire dataset of 33 patients with rheumatoid arthritis, illustrating the relative importance of each variable in distinguishing between patient groups.
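Inverse probability weights of the kind referred to here are usually derived from a propensity score, the modeled probability of belonging to one group (here, SP-RA) given background covariates. A minimal sketch, assuming the propensity scores have already been estimated (for example, by a logistic regression on sex, age, symptom duration, NSAID usage, and DAS28-CRP). Whether plain or stabilized weights were used is not stated in these captions, so both are shown; the helper name is hypothetical.

```python
# Illustrative sketch; `ipw_weights` is hypothetical and propensity scores are
# assumed to be supplied by an upstream model.


def ipw_weights(is_sp_ra, propensity, stabilized=False):
    """Inverse probability weights from propensity scores e_i = P(SP-RA | covariates).

    SP-RA patients get 1 / e_i and SN-RA patients get 1 / (1 - e_i); stabilized
    weights multiply these by the marginal proportion of the patient's group.
    """
    n = len(is_sp_ra)
    p_sp = sum(1 for t in is_sp_ra if t) / n
    weights = []
    for t, e in zip(is_sp_ra, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if t:
            w = 1.0 / e
            weights.append(w * p_sp if stabilized else w)
        else:
            w = 1.0 / (1.0 - e)
            weights.append(w * (1.0 - p_sp) if stabilized else w)
    return weights


print(ipw_weights([True, False], [0.8, 0.8]))  # roughly [1.25, 5.0]
```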
Figure 6
Marker expression profiles of discriminative T cell clusters (D-TCLs) identified between seropositive (SP-RA) and seronegative rheumatoid arthritis (SN-RA). (A) Heatmap displaying the expression levels of all measured surface markers across the six D-TCLs, using a unified global color scale (see color bar), which enables direct quantitative comparison of marker expression across clusters. The heatmap is extracted from the comprehensive heatmap of all 44 clusters shown in Supplementary Figure 5C. (B) Cell diagram representation of the same D-TCLs, in which T cell surface markers are depicted as soft-edged rectangles colored by expression level (low: navy, high: red). This diagram provides an intuitive overview of the phenotypic profile of each cluster. Presenting both the heatmap and the cell diagram allows precise quantitative comparison as well as rapid visual assessment of the marker expression patterns of D-TCLs.
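A unified global color scale means that one minimum and one maximum are taken over the whole cluster-by-marker matrix rather than per marker, so rows extracted for the six D-TCLs keep the scale of the full 44-cluster heatmap. A minimal sketch, assuming the heatmap shows per-cluster medians of already-transformed marker intensities (the summary statistic and transform are assumptions) and that every cluster contains at least one cell; the helper names are hypothetical.

```python
from statistics import median

# Illustrative sketch; median summary and helper names are assumptions.


def cluster_median_matrix(cells, labels, n_clusters):
    """Rows = clusters, columns = markers; entry = median expression in that cluster."""
    n_markers = len(cells[0])
    matrix = []
    for k in range(n_clusters):
        members = [cell for cell, lab in zip(cells, labels) if lab == k]
        matrix.append([median([cell[j] for cell in members]) for j in range(n_markers)])
    return matrix


def global_scale(matrix):
    """Min-max scale with ONE minimum and maximum over the whole matrix."""
    lo = min(min(row) for row in matrix)
    hi = max(max(row) for row in matrix)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant matrix
    return [[(v - lo) / span for v in row] for row in matrix]


cells = [[1.0, 10.0], [3.0, 10.0], [5.0, 0.0], [7.0, 0.0]]  # toy cells x markers
labels = [0, 0, 1, 1]
scaled = global_scale(cluster_median_matrix(cells, labels, 2))
hypothetical_d_tcl_indices = [1]
d_tcl_rows = [scaled[k] for k in hypothetical_d_tcl_indices]  # subset AFTER scaling
```

Scaling the full matrix first and then selecting rows preserves the shared scale; rescaling only the extracted rows would change their colors.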
Figure 7
Comparative analysis of key T cell clusters in rheumatoid arthritis subtypes and healthy controls. (A) Non-weighted distribution of key T cell clusters (TCLs). The panel presents a combination of violin and dot plots illustrating the percentage distribution of selected T cell clusters within CD3+ T cells across three groups: seropositive RA (SP-RA), seronegative RA (SN-RA), and healthy controls (HCs). The plot features eight key T cell clusters, including the six discriminative T cell clusters (D-TCLs), TCL02, 21, 24, 31, 32, and 35, in addition to TCL10 and TCL29, which were identified through correlation analysis as having significant negative associations with ACPA positivity. The plots provide a visual comparison of the frequencies of these TCLs, highlighting variations between HCs and the combined RA groups (SN-RA and SP-RA) as well as directly between the SP-RA and SN-RA groups. Statistical significance of the differences was assessed using the Mann–Whitney U test, with symbols indicating levels of significance: * p < 0.05, ** p < 0.01, and p < 0.005. (B) Weighted scatter plot of key T cell clusters in patients with RA. The panel features a weighted scatter plot showing the distribution of the same eight key TCLs among patients with RA, divided into the SP-RA and SN-RA groups. The sizes of the points are proportional to inverse probability weights (IPW), which adjust for patient background factors such as age, sex, symptom duration, DAS28-CRP, and NSAID usage. Weighted median values for each TCL are depicted with horizontal bars. The significance of the differences, assessed using the weighted Mann–Whitney test adjusted for IPW, is marked by * p < 0.05, ** p < 0.01, and p < 0.005, providing a detailed visualization of intercluster variability.
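A weighted median, as used for the horizontal bars in panel B, is the smallest value at which the cumulative weight reaches half of the total weight. Conventions differ when that point falls exactly between two values (some implementations interpolate); the sketch below returns the lower value, which is an assumption about the convention rather than the authors' code. The helper name is hypothetical.

```python
# Illustrative sketch; the "lower weighted median" convention is an assumption.


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total (lower weighted median)."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("need at least one value with a positive total weight")


print(weighted_median([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # 2.0
print(weighted_median([1.0, 2.0, 3.0], [1.0, 1.0, 5.0]))  # 3.0 (heavy weight pulls the median up)
```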
Figure 8
Internal performance assessment of discriminative T cell clusters (D-TCLs) for distinguishing seropositive and seronegative rheumatoid arthritis (SP-RA and SN-RA) using bootstrap-supported SVM modeling. Bootstrap validation (n = 1000 iterations) was performed by randomly dividing the samples into training and test sets in each iteration, with SVM hyperparameters (cost and gamma) optimized via grid search. All validation and performance assessment were conducted internally; external validation using independent data remains necessary to fully establish generalizability. Performance metrics, including accuracy, area under the receiver operating characteristic curve (AUC-ROC), F1 score, negative predictive value (NPV), positive predictive value (PPV), sensitivity, and specificity, were computed for each bootstrap iteration. (A) Mean ROC curve and 95% confidence interval. The mean ROC curve (blue line) and its 95% confidence interval (shaded area) are shown (mean AUC-ROC = 0.960, 95% CI: 0.746–1.000). (B) Distribution of classification performance metrics. Violin and box plots summarize the distributions of accuracy, F1 score, sensitivity, specificity, PPV, and NPV across bootstrap samples; mean values are indicated by red dots. The results support the high discriminative power of D-TCLs, with an average accuracy of 86.2% (95% CI: 62%–100%), sensitivity of 85.7%, specificity of 80.9%, PPV of 82.3%, NPV of 87.4%, and F1 score of 0.823.
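The per-iteration metrics listed above all derive from a 2 × 2 confusion matrix, and a percentile interval over the 1000 bootstrap values gives a 95% CI. The sketch below assumes SP-RA is coded as the positive class (1) and uses the percentile method via `statistics.quantiles`; the SVM fitting and grid search are not reproduced, and the helper names are hypothetical.

```python
from statistics import quantiles

# Illustrative sketch; positive class = SP-RA (1) is an assumption, and the
# helper names are hypothetical.


def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics with SP-RA coded as the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(a, b):
        # Undefined (NaN) when the denominator is 0, e.g. a test split with one class.
        return a / b if b else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    }


def percentile_ci_95(values):
    """95% percentile interval: the 2.5th and 97.5th percentiles of bootstrap values."""
    cuts = quantiles(values, n=40, method="inclusive")  # cut points at 2.5% steps
    return cuts[0], cuts[-1]


m = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])  # toy labels
```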
