NPJ Digit Med. 2025 Sep 1;8(1):563.
doi: 10.1038/s41746-025-01962-y.

Deep hierarchical subtyping of multi-organ systemic sclerosis trajectories - a EUSTAR study

Cécile Trottet et al. NPJ Digit Med. 2025.

Abstract

Systemic sclerosis (SSc) is a chronic autoimmune disease with multi-organ involvement. Historically, SSc classification has focused on the type of skin involvement (limited versus diffuse); however, growing evidence of organ-specific variability suggests the presence of more than two distinct subtypes. We propose a semi-supervised generative deep learning framework that leverages expert-driven definitions of organ-specific involvement and severity. We model SSc disease trajectories in the European Scleroderma Trials and Research (EUSTAR) database, which contains 14,000 patients across 67,000 medical visits, and identify clinically meaningful subtypes to enhance patient stratification and prognosis. We systematically evaluate the model's predictive accuracy, robustness to missing data, and clinical interpretability. We identify five patient clusters that separate patients by degree of organ involvement. Notably, a subset with limited skin involvement still showed high risks of lung and heart complications, underscoring the importance of data-driven methods and multi-organ models to complement established insights from clinical practice.

Conflict of interest statement

Competing interests: A.H. has/had a consultancy relationship with and/or has received research funding from or has served as a speaker for the following companies in the area of potential treatments for systemic sclerosis and its complications in the last 36 months: Abbvie, Avalyn, CallunaPharma, BMS, Boehringer Ingelheim, Genentech, Janssen, Merck Sharp & Dohme, Medscape, Novartis, Pliant Therapeutics, Roche and Werfen. A.H. is a CTD-ILD ERS/EULAR convenor and a EULAR study group leader on the lung in rheumatic and musculoskeletal diseases. O.D. has/had a consultancy relationship with and/or has received research funding from or has served as a speaker for the following companies in the area of potential treatments for systemic sclerosis and its complications in the last two years: 4P-Pharma, Abbvie, Acepodia, Aera, AnaMar, Anaveon AG, Argenx, Boehringer Ingelheim, BMS, Calluna, Cantargia AB, Citus AG, CSL Behring, Galderma, Galapagos, Hemetron AG, Innovaderm, Lilly, MSD Merck, Mitsubishi Tanabe, Nkarta Inc., Orion, Pilan, Quell, Scleroderma Research Foundation, EMD Serono, Topadur and UCB. Patent issued “mir-29 for the treatment of systemic sclerosis” (US8247389, EP2331143). O.D. is a co-founder of CITUS AG. All other authors declare no competing interests.

Figures

Fig. 1. Overview of the study pipeline.
1. Variable selection process and database preprocessing.
1A. We first screened the medical literature to identify clinical definitions of involvement and severity for each studied organ, and extracted the relevant variables X from the database.
1B. Next, a steering committee of 10 rheumatologists reached an expert consensus to select the most relevant clinical definitions, yielding a more restricted subset of variables G ⊂ X.
1C. Patient data are collected from various EUSTAR-affiliated centers and aggregated by the EUSTAR group. The database is preprocessed and randomly split into an 85% training set, used for model development and hyperparameter tuning, and a 15% test set for hold-out evaluation.
2. Semi-supervised model architecture. The encoder network processes longitudinal clinical measurements x_{1:t} up to a time point t, concatenated with the corresponding missingness indicator mask m_{1:t} and static patient demographic information s. It learns the distribution of the full latent trajectory z_{1:T}, where T is the time of the last available visit in the registry.
2A. The guidance decoders, each assigned to a specific variable in G, take as input a predefined allocated subset of the dimensions of a sampled z_{1:T} (one allocated subset per organ) and predict the distribution of the corresponding medical variables.
2B. The unsupervised decoder takes a sampled z_{1:T} (all dimensions) and is trained to reconstruct the input x_{1:t}.
3. Hierarchical clustering for disease subtyping in the learned latent space. Our method first divides the cohort into two main clusters (mild and severe trajectories), then further subdivides the mild cluster into two subtypes and the severe cluster into three subtypes.
Abbreviations: Long Short-Term Memory Network (LSTM), Multilayer Perceptron (MLP).
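The preprocessing and input construction described in panels 1C, 2 and 2A can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the per-patient split, the zero fill for missing entries, the mask convention (1.0 = observed) and the organ-to-dimension layout are all assumptions made for the example.

```python
import random


def split_patients(patient_ids, test_fraction=0.15, seed=0):
    """Randomly split patient IDs into (train, test) lists.

    Assumption: the split is made per patient, so all visits of one
    patient end up on the same side.
    """
    ids = sorted(patient_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


def encoder_input(visits, static):
    """Build one input row per visit: values, missingness mask, static features.

    `visits` is a list of per-visit measurement lists, with None for a
    missing value. Assumed conventions: missing values are zero-filled
    and the mask is 1.0 where a value was observed.
    """
    rows = []
    for measurements in visits:
        values = [0.0 if v is None else float(v) for v in measurements]
        mask = [0.0 if v is None else 1.0 for v in measurements]
        rows.append(values + mask + list(static))
    return rows


def allocate_latent(z, allocation):
    """Hand each guidance decoder its allocated latent dimensions (hypothetical layout)."""
    return {organ: [z[i] for i in dims] for organ, dims in allocation.items()}


train_ids, test_ids = split_patients(range(100))
rows = encoder_input([[1.2, None], [None, 3.0]], static=[0.5])
per_organ = allocate_latent([0.1, 0.2, 0.3, 0.4], {"lung": [0, 1], "skin": [2, 3]})
```

Keeping the mask next to the filled values lets a downstream network tell a true zero from a missing measurement.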
Fig. 2. Ground truth versus reconstructed data.
UMAP decompositions of the latent space are overlaid, respectively, with ground truth values (left) and model-reconstructed values (right) for lung fibrosis features. Plotted data points correspond to values that were masked (not provided to the model), demonstrating its ability to impute missing information.
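The masked-value comparison in this figure follows a common evaluation pattern: hide values that were actually observed, reconstruct them, and score only the hidden positions. The sketch below assumes a mean-fill function as a stand-in for the trained model and mean absolute error as the score; neither choice is taken from the study.

```python
def mask_observed(values, hide_idx):
    """Return a copy of `values` with the chosen positions hidden (None)."""
    return [None if i in hide_idx else v for i, v in enumerate(values)]


def mean_impute(values):
    """Stand-in for the model: fill hidden entries with the mean of the rest."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def masked_mae(truth, reconstructed, hide_idx):
    """Mean absolute error computed only on the positions that were hidden."""
    return sum(abs(truth[i] - reconstructed[i]) for i in hide_idx) / len(hide_idx)


truth = [1.0, 2.0, 3.0, 4.0]
hidden = {3}
reconstructed = mean_impute(mask_observed(truth, hidden))
error = masked_mae(truth, reconstructed, hidden)
```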
Fig. 3. Regions of the latent space.
Latent space UMAP decomposition overlaid with reconstructed feature values.
Fig. 4. First hierarchy of clusters.
The cohort is divided into mild (green) and severe (purple) disease trajectories. Below the UMAPs, we show the average label values over time in each cluster. The label trajectories highlight lung and skin involvement as key differentiators.
Fig. 5. Second hierarchy of clusters.
A Mild Disease Subtypes. Patients with milder disease trajectories are further divided into two subtypes. The dark green cluster shows slightly higher probabilities of skin, heart, and GT involvement compared to the pale green cluster. B Severe Disease Subtypes. Patients with severe disease trajectories are subdivided into three subtypes: pale blue, dark blue, and red. The pale blue cluster is marked by severe skin involvement; the dark blue cluster by pronounced heart and lung involvement; and the red cluster by combined skin, GT, lung, and heart involvement.
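The two-level structure (mild versus severe, then two and three subtypes respectively) can be illustrated with a small standard-library sketch. The clustering algorithm is not specified in this record, so a simple k-means with deterministic farthest-point initialisation is used here purely as a stand-in; the per-patient `severity` score used to decide which top-level cluster is "mild" is likewise a hypothetical input.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, k, iters=100):
    """Plain k-means; returns one cluster label per point."""
    centers = [points[0]]
    while len(centers) < k:  # farthest-point initialisation
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(centroid(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels


def hierarchical_subtypes(latents, severity):
    """Split into mild/severe, then into 2 mild and 3 severe subtypes."""
    top = kmeans(latents, 2)
    mean_sev = []
    for j in range(2):
        scores = [s for s, lab in zip(severity, top) if lab == j]
        mean_sev.append(sum(scores) / len(scores))
    mild = 0 if mean_sev[0] <= mean_sev[1] else 1
    result = [None] * len(latents)
    for name, top_label, k in (("mild", mild, 2), ("severe", 1 - mild, 3)):
        idx = [i for i, lab in enumerate(top) if lab == top_label]
        sub = kmeans([latents[i] for i in idx], k)
        for i, s in zip(idx, sub):
            result[i] = (name, s)
    return result


# Toy latent points: two nearby "mild" groups and three "severe" groups.
blobs = [(0, 0), (1, 0), (10, 0), (10, 2), (12, 1)]
latents = [(x + dx, y + dy) for x, y in blobs for dx, dy in ((0, 0), (0.1, 0), (0, 0.1))]
severity = [0.0] * 6 + [1.0] * 9
subtypes = hierarchical_subtypes(latents, severity)
```

On this toy data the first split separates the two groups near the origin from the three distant ones, and the second level recovers five subtypes in total.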
Fig. 6. Top eight features ranked by their variation across the five final clusters.
Larger bars indicate greater feature variability, and the error bars show differences across the five cross-validated models. A Standard deviation of continuous and ordinal feature values across clusters. B Standard deviation of empirical class probabilities for binary features across clusters.
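A ranking of this kind can be sketched by computing each feature's mean within every cluster and then taking the standard deviation of those cluster means. For a binary 0/1 feature the cluster mean is the empirical class probability, which matches panel B. The feature names, the use of the population standard deviation, and the function name are assumptions for illustration; the error bars across cross-validated models are not reproduced.

```python
from statistics import mean, pstdev


def rank_by_cluster_variation(rows, clusters, features, top_k=8):
    """Rank features by the standard deviation of their per-cluster means.

    `rows` is a list of dicts (one per patient, None for missing values)
    and `clusters` the matching list of cluster labels.
    """
    scores = {}
    for name in features:
        cluster_means = []
        for c in sorted(set(clusters)):
            vals = [r[name] for r, lab in zip(rows, clusters)
                    if lab == c and r[name] is not None]
            if vals:
                cluster_means.append(mean(vals))
        scores[name] = pstdev(cluster_means) if len(cluster_means) > 1 else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical toy cohort: one ordinal and one binary feature, two clusters.
rows = [
    {"skin_score": 2, "lung_fibrosis": 0},
    {"skin_score": 4, "lung_fibrosis": 0},
    {"skin_score": 20, "lung_fibrosis": 1},
    {"skin_score": 24, "lung_fibrosis": None},
]
clusters = [0, 0, 1, 1]
ordinal_ranking = rank_by_cluster_variation(rows, clusters, ["skin_score"])
binary_ranking = rank_by_cluster_variation(rows, clusters, ["lung_fibrosis"])
```

Continuous or ordinal features and binary features are ranked in separate calls because their scales are not comparable, mirroring the split between panels A and B.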
Fig. 7. Clinical decision support system.
A The model predicts future latent trajectories and assigns patients to likely severity subtypes. For an index patient, it visualizes their latent trajectory and predicted disease progression (start at the X). B Similar trajectories to the index patient can be identified using k-nearest neighbors (start at the X). C Medical feature trajectories of the retrieved similar patients can be visualized and compared. D Organ involvement trajectories of these similar patients can also be visualized and compared.
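The retrieval step in panel B can be sketched as a k-nearest-neighbour search over latent trajectories. The distance used here (mean Euclidean distance over the time steps two trajectories share) and the dictionary layout of the cohort are assumptions for illustration; the record does not state which metric the system uses.

```python
import math


def trajectory_distance(a, b):
    """Mean Euclidean distance over the time steps both trajectories share."""
    steps = min(len(a), len(b))
    return sum(math.dist(a[t], b[t]) for t in range(steps)) / steps


def similar_patients(index_trajectory, cohort, k=3):
    """Return the IDs of the k patients whose trajectories are closest."""
    ranked = sorted(
        cohort, key=lambda pid: trajectory_distance(index_trajectory, cohort[pid])
    )
    return ranked[:k]


# Hypothetical two-dimensional latent trajectories, one point per visit.
index_trajectory = [(0.0, 0.0), (1.0, 1.0)]
cohort = {
    "p1": [(0.1, 0.0), (1.0, 1.1)],
    "p2": [(5.0, 5.0), (6.0, 6.0)],
    "p3": [(1.0, 0.0), (2.0, 1.0), (3.0, 2.0)],
}
neighbours = similar_patients(index_trajectory, cohort, k=2)
```

The feature and organ involvement trajectories of the returned patients could then be looked up by ID for the side-by-side comparison described in panels C and D.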
