Data-driven grading of acute graft-versus-host disease

Evren Bayraktar et al. Nat Commun. 2023 Nov 28;14(1):7799.
doi: 10.1038/s41467-023-43372-2.
Multicenter Study

Abstract

Despite advances in allogeneic hematopoietic cell transplantation, acute graft-versus-host disease (aGVHD) remains its leading complication, yet with heterogeneous outcomes. Here, we analyzed aGVHD phenotypes and clinical classifications in depth in large, multicenter cohorts involving 3019 patients and addressed prevailing gaps by developing data-driven models. We compared, tested and verified these along with all conventional classifications in independent cohorts and found that data-driven grading outperformed conventional grading in Akaike information criterion and concordance index metrics. Data-driven classifications refined aGVHD assessment with up to 12 severity grades, which were associated with distinct nonrelapse mortality (NRM) and confirmed the key role of intestinal aGVHD. We developed an online calculator for physicians to implement principal component-derived grading (PC1). These results provide substantial insight into the evaluation of aGVHD phenotypes and multiorgan involvement, which relegates the exclusive reporting of overall aGVHD severity grades in transplant registries and clinical trials. Data-driven aGVHD grading provides an expandable platform to refine classification and transplant risk assessment.

Conflict of interest statement

The authors of this manuscript have potential competing interests to disclose. A.T.T. Consultancy for CSL Behring, Maat Pharma, Biomarin and Onkowissen. E.B. is an employee of Bayer AG at the time of publication of this manuscript, research was conducted before his engagement at Bayer and without the involvement of Bayer. G.B. reported no conflicts directly related to this work. G.B. collaborates with Jazz Pharmaceuticals, Shionogi, and Medac in clinical trials. O.P. has received honoraria or travel support from Gilead, Jazz, MSD, Novartis, Pfizer, and Therakos. He has received research support from Incyte and Priothera. He is a member of advisory boards to Equillium Bio, Jazz, Gilead, Novartis, M.S.D., Omeros, Priothera, Sanofi, Shionogi, and SOBI. D.W.B. received travel subsidies from Medac, all outside the submitted work. H.C.R. received consulting and lecture fees from Abbvie, AstraZeneca, Vertex, Novartis, and Merck. H.C.R. received research funding from Gilead Pharmaceuticals and AstraZeneca. H.C.R. is a co-founder and shareholder of CDL Therapeutics GmbH. O.P. has no conflicts directly related to this work. The other authors report no competing interests.

Figures

Fig. 1. Overview of data-driven aGVHD grading development, validation, external test and verification in comparison to conventional grading.
a Data preparation: Data were assembled from a multicenter cohort (Berlin, Essen, Hamburg, Heidelberg, Hannover) with HCT between 2008 and 2018 and split into independent training (n = 2319) and test cohorts (n = 700). b Data-driven aGVHD classification. Input data from the aGVHD target organ involvement (skin, GI and liver) were organized in a 3D space and the following data-driven methods were applied: principal component analysis (PCA) for linear mapping of PC1 and severity indexing, as well as hierarchical and k-means clustering. For comparability with conventional grading, the number of clusters was set to 4. The nonlinear methods Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (tSNE) were used to visualize grading in 2-dimensional space. Nonlinear methods and their results are detailed in the supplement. c Evaluation, validation, external test, and verification of data-driven aGVHD grading. PCA was internally validated via 500-fold bootstrapping of 1546 randomly selected data points (2/3 of training cohort). During k-means clustering, the optimal cluster number was determined using the elbow method on the sum of squared distances (SSD) and silhouette index. All grading systems were externally tested on independent multicenter data. Akaike information criterion (AIC) as well as the concordance index (Ci) were calculated to verify and compare data-driven and conventional grading. Time-dependent area under the receiver operating characteristic curve (AUROC) plots were generated to visualize specificity and sensitivity for 12-month OS and NRM. Distribution plots and Cohen’s kappa analysis compared the distribution of the different phenotypes (organ-stage combinations) and intergrading agreement. Kaplan–Meier OS and cumulative incidence NRM curves were computed with 95% confidence intervals (CI) to compare associations of different grading systems with outcome. P-values were calculated using a two-sided log-rank test (Kaplan–Meier OS) or two-sided Gray test (NRM curves). Created with BioRender.com. The organs image in panel b is adapted from https://pixabay.com/illustrations/offal-marking-medical-colon-liver-1463369/ via Elionas2 under the Content License.
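To illustrate the linear mapping onto PC1 described in panel b, the following is a minimal sketch, not the authors' implementation. It assumes each patient is a (skin, liver, GI) stage triple and extracts the first principal component by power iteration on the sample covariance matrix. The function name `first_principal_component` and the six toy patients are hypothetical and chosen only for illustration.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (loading vector, PC1 scores) for equal-length numeric rows."""
    n = len(rows)
    dim = len(rows[0])
    # Center each organ-stage column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    # Sample covariance matrix (dim x dim).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Power iteration converges to the dominant eigenvector, i.e. PC1.
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered patient onto the PC1 loading vector.
    scores = [sum(c[j] * v[j] for j in range(dim)) for c in centered]
    return v, scores

# Hypothetical (skin, liver, GI) stages, not data from the study.
toy_stages = [(1, 0, 0), (2, 0, 1), (3, 1, 2), (4, 2, 3), (0, 0, 1), (2, 1, 1)]
loadings, scores = first_principal_component(toy_stages)
```

In this toy example a higher PC1 score corresponds to more severe multiorgan involvement. How the study converted PC1 into stages 1–12 is not specified in the caption and is not reproduced here.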
Fig. 2. Exploratory data analysis of aGVHD training (n = 2319) and test cohorts (n = 700) shows adequate cohort coverage.
a Pair plot from kernel density estimation of the training cohort (n = 2319) plotting the clinical target organ stages of skin, liver and GI involvement (stages 0–4, labeled to the left of and below the boxes). The target organ stage correlations are presented as density plots. Patient numbers (n) of each subgroup are indicated on the right in each box. A higher n in each subgroup is shown by greater surface coverage. Density of aGVHD target organ combinations is indicated from light green to dark blue. b Target organ stage correlation matrix (Spearman) of the training cohort shows the distribution of the single variables skin, liver and GI and their respective interactions. Range from −1.0 to +1.0; dark blue indicates full overlap. c Pair plot and d target organ stage correlation matrix (Spearman) of the test cohort (n = 700). Analysis, labels and colors as in (a and b).
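The Spearman correlation used in panels b and d can be sketched as follows. This is an illustrative stdlib-only version that assigns average ranks to tied stages and then computes the Pearson correlation of the ranks. The helper names and the toy stage vectors are hypothetical, not taken from the study.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def spearman(x, y):
    """Spearman rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical skin and GI stages (0-4) for six patients.
skin = [0, 1, 2, 3, 4, 2]
gi = [0, 0, 1, 2, 4, 1]
rho_skin_gi = spearman(skin, gi)
```

Organ stages are ordinal with many ties, which is why a rank-based coefficient with tie handling is a natural fit for this kind of matrix.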
Fig. 3. Principal component analysis of the training cohort (n = 2319) and transformation of principal component 1 into PC-grading of aGVHD.
a Biplot of principal components 1 (PC1) and 2 (PC2) on each axis displays the scores and loading vectors of principal component analysis (PCA). Arrows indicate the importance of each target organ involvement for PC1 and PC2, respectively. b Scree plot of PC1, PC2 and PC3. The proportion of variance explained by PC1 is the greatest with 0.47. c Explorative plotting of PC1 against overall survival (OS, days from HCT, censoring has not been considered in this representation) indicates lower long-term OS with increasing PC1. Each dot represents one patient with aGVHD. Colors representing MAGIC aGVHD grade I–IV (I = yellow, II = green, III = blue, IV = violet) indicate the overlap of different MAGIC grades. d Transformation of PC1 results into an aGVHD classification (ranging from PC1-stage 1–12), results plotted against OS, as in (c). Lighter colors (yellow) indicate shorter observation, darker colors (blue) higher long-term OS. e Kaplan-Meier estimate OS curve with 95% CI of 4 PC-aGVHD grades (I–IV) consolidated from PC-aGVHD-stages 1–12. The colors indicate lower (yellow) to higher (blue) OS. Strata are compared with the two-sided log-rank test. f Plotting of PC1 stages against aGVHD organ involvement (combinations: Skin: only skin; liver: only liver; GI: only GI; skin and liver; skin and GI; liver and GI; skin, liver and GI). The circle size corresponds to the n of patients in each category.
Fig. 4. Hierarchical and partitional clustering of the training cohort (n = 2319) as alternative data-driven approaches to aGVHD grading and multivariate competing-risk-regression of the validation cohort (n = 700) with the PC1 grading.
a Agglomerative hierarchical clustering (HClust) dendrogram of the training cohort on the basis of their target organ involvements. An HClust distance threshold of 30 split the cohort into 4 clusters, which were numerically ordered. Grade I: n = 763; II: n = 1149; III: n = 191; IV: n = 216. Red dashed line indicates cutoff level for four grades. b Kaplan–Meier OS curve with 95% CI of 4 HClust-aGVHD grades (I–IV). Strata are compared with the two-sided log-rank test. c K-means partitional clustering performance indicators SSD (sum of squared distances, green dashed) and silhouette coefficient (green), labels on each figure side. Red dashed line indicates cutoff level with four grades; gray dashed line shows cutoff level with 8 grades, the optimal number determined by both methods (n = 8, Sil = 0.62). We evaluated a further cutoff point with 14 clusters in the supplementary notes. d Kaplan–Meier OS curve with 95% CI of K-means-4 grades (I–IV). Strata are compared with the two-sided log-rank test. e Multivariate competing risk regression analysis for 12 months NRM on the test cohort (n = 541 evaluable for all covariates) using the PC1 aGVHD grades as a time-dependent variable. The multivariate model was adjusted for potentially confounding variables, covariates as listed in e. Horizontal bars represent 95% CI. P-values are computed based on the Wald test. The hazard ratio (HR) is a measure of the ratio of the hazard between two groups. A value of 1 is the reference, HR < 1 corresponds to lower risk and HR > 1 to higher risk of NRM than the reference. The HR of PC1 grade II was 2.12 (95% confidence interval, CI, 1.17–3.83), grade III HR 7.2 (95% CI 4.72–10.99) and grade IV HR 16.30 (95% CI 8.12–32.75). Significant covariates in this NRM model were diagnoses (acute lymphoblastic leukemia (ALL) HR 2.3 (95% CI 1.17–4.66), myelodysplastic syndromes (MDS) HR 1.74 (95% CI 1.06–2.84), other diagnoses HR 3.8 (95% CI 1.27–11.88)), year of HCT (HR 0.91, 95% CI 0.85–0.97), and EBMT risk score (HR 1.40, 95% CI 1.21–1.63). The covariates donor age, donor sex, donor type and Karnofsky performance index ≥80 were not significant in univariate regression analysis and hence not included in the multivariate model. Source data are provided as a source data file.
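As a reading aid for the hazard ratios in panel e, the sketch below assumes the reported 95% intervals are Wald intervals, symmetric on the log-hazard scale (the caption states only that the P-values are Wald-based, so this is an assumption). Under that assumption the point estimate is the geometric mean of the interval bounds, and the standard error of the log-HR can be recovered from the interval width. The function names are hypothetical; the numbers are those quoted in the caption.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile

def hr_implied_by_ci(lower, upper):
    """Geometric mean of the bounds: the HR implied by a log-symmetric CI."""
    return math.sqrt(lower * upper)

def log_hr_standard_error(lower, upper, z=Z_95):
    """Standard error of log(HR) recovered from the CI width on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# PC1 grade: (reported HR, lower bound, upper bound), as quoted in the caption.
reported = {
    "II": (2.12, 1.17, 3.83),
    "III": (7.2, 4.72, 10.99),
    "IV": (16.30, 8.12, 32.75),
}
implied = {g: hr_implied_by_ci(lo, hi) for g, (_, lo, hi) in reported.items()}
std_errors = {g: log_hr_standard_error(lo, hi) for g, (_, lo, hi) in reported.items()}
```

For all three grades the implied value agrees with the reported HR to within rounding, which is consistent with log-symmetric intervals.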
Fig. 5. Comparative visualization of data-driven and conventional grading methods on the independent test cohorts (n = 700).
Comparison of OS between aGVHD classifications using four grades (both conventional and data-driven grading), Cohen’s analysis of intergrading agreement and assessment of the respective predictive values using AUROC. a–f Kaplan–Meier OS curves with 95% CI of patients in the independent test cohorts (n = 700). OS is stratified by aGVHD grading severity according to the relevant grading system from I to IV and strata are compared with the two-sided log-rank test. a PC1-aGVHD grading, b hierarchical clustering grading (Hclust), c K-means clustering grading with 4 grades, d MAGIC grading, e Consensus grading, f IBMTR grading. g Comparison of AUROC for 12 months OS between grading systems. AUROC values range from 0.5 to 1.0. h Comparison of AUROC curves for 12 months NRM. i Cohen’s kappa comparing the intergrading agreement of different grading systems (PC1, Hclust, K-means, MAGIC, Consensus, IBMTR, and Minnesota). Ranges from 0 (no agreement, light green) to 1 (full agreement, dark blue).
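The intergrading agreement in panel i is based on Cohen's kappa, which corrects the observed agreement between two grading systems for the agreement expected by chance. A minimal unweighted sketch follows; whether the study used a weighted variant is not stated in the caption, and the two toy grade lists are hypothetical.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of patients given the same grade by both systems.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from the marginal grade frequencies of each system.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both systems put every patient in one identical grade
    return (observed - expected) / (1 - expected)

# Hypothetical grades assigned to six patients by two grading systems.
system_a = ["I", "I", "II", "II", "III", "IV"]
system_b = ["I", "II", "II", "II", "III", "III"]
kappa = cohens_kappa(system_a, system_b)
```

Here four of six patients receive the same grade (observed agreement 24/36) against a chance agreement of 10/36, giving kappa = 14/26, roughly 0.54.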
Fig. 6. Comparative distribution analysis of aGVHD grading methods on the independent test cohort (n = 700).
Data-driven aGVHD grading methods are compared to MAGIC conventional grading to reveal differences in patient proportions, organ combinations involved in each grade and ability to dissect into cohorts with significantly distinct NRM. a Cumulative incidence NRM curves according to MAGIC grades. Separation of curves is tested by the two-sided Gray test. Error bands represent 95% CI. b Pie chart of aGVHD grades in MAGIC grading. The angle of each slice is proportional to the number of organ stage combinations in the respective grade. The radius of each slice represents the number of patients within this grade. MAGIC grade I: 2 combinations and 265 patients, II: 11 combinations and 183 patients, III: 32 combinations and 143 patients, IV: 26 combinations and 109 patients. c Patient phenotype distribution within each grade. Stacked bar chart of MAGIC aGVHD grades showing the proportion of patients in each organ stage combination. For each bar, one color represents one combination, with no crossover between grades. All phenotypes are detailed in Supplementary Data 1. d Cumulative incidence NRM curves according to PC1 grading with 4 grades. Separation of curves is tested by the two-sided Gray test. Error bands represent 95% CI. e Pie chart of PC1 grades. f Stacked bar chart of PC1 grades. The phenotypes are detailed in Supplementary Data 2. g NRM of Hclust grading with four grades. h Pie chart of Hclust grades. i Stacked bar chart of Hclust grades. The phenotypes are detailed in Supplementary Data 3. j Cumulative incidence NRM curves according to K-means grading using 4 grades. Separation of curves is tested by the two-sided Gray test. Error bands represent 95% CI. k Pie chart of K-means grades. l Stacked bar chart of K-means grades. The phenotypes are detailed in Supplementary Data 4. Source data for b, c, e, f, h, i, k and l are provided as a source data file.
Fig. 7. Clinical outcome analysis of re-distributed patients between data-driven and conventional grading systems.
Redistributed patients from one severity category to another between different grading systems are compared to the remaining patients in the original category. a Kaplan–Meier OS curves with 95% CI of redistributed patients from MAGIC grade III to PC1-grade ≤II (light green) compared to intersection grade III patients in both grades (dark green). Strata are compared with the two-sided log-rank test. The phenotypes are detailed in Supplementary Data 8. b Comparison of Kaplan-Meier OS curves with 95% CI of redistributed patients from Consensus grade III to PC1-grade ≤ II (light green) to intersection of grade III patients in both consensus and PC1 (dark-green). Strata are compared with the two-sided log-rank test. c Kaplan–Meier OS curves with 95% CI of redistributed patients from MAGIC grade III to PC1-grade I (light green) and PC1-grade II (green) are compared to grade III patients in both MAGIC and PC1 (blue-green). d Kaplan–Meier OS curves with 95% CI of redistributed patients from Consensus grade III to PC1-grade I (light green), to PC1-grade II (green) are compared to the intersection of grade III patients (blue-green). e Cumulative incidence curves of NRM are compared for the same strata as in c. Error bands show 95% CI. f Cumulative incidence curves of NRM are compared for redistributed Consensus grade III patients to PC1 including PC1-grade IV (dark blue). Error bands show 95% CI. Strata for NRM are compared with the two-sided Gray test.
Fig. 8. Outcome analysis according to aGVHD target organ severity in the test cohort (n = 700).
Patients in the test cohort (n = 700) were stratified according to target organ severity staging. a and b Kaplan–Meier OS and cumulative incidence of NRM of patients stratified by aGVHD skin stage 0–4. c and d Kaplan–Meier OS and cumulative incidence of NRM of patients stratified by aGVHD liver stage 0–4. e and f Kaplan–Meier OS and cumulative incidence of NRM of patients stratified by aGVHD GI stage 0–4. In all panels, error bands represent 95% CI and separation of curves is tested by the two-sided log-rank test (OS) or two-sided Gray test (NRM).
Fig. 9. Comparative distribution analysis of aGVHD grading systems refined beyond four grades on the independent test cohort (n = 700).
Additional data-driven aGVHD gradings with more than 4 grades are compared to reveal differences in patient proportions, organ combinations involved in each grade and their ability to dissect into cohorts with significantly distinct NRM. a Cumulative incidence NRM curves of aGVHD grades in PC1 grading using all 12 PC1 stages as distinct severity grades. Colors represent PC1 aGVHD grades I–XII. Separation of curves is tested by the two-sided Gray test. b Pie chart according to PC1 with 12 grades. The angle of each slice represents the number of organ stage combinations in the respective grade. The radius of each slice represents the number of patients within this grade. c Stacked bar chart of PC1 with 12 grades showing the proportion of patients in each organ stage combination. For each bar, one color represents one combination, with no crossover between grades. The phenotypes are detailed in Supplementary Data 9. d Cumulative incidence NRM curves according to PC1 grading with 6 grades. Colors represent PC1 aGVHD grades I–VI. Separation of curves is tested by the two-sided Gray test. e Pie chart of PC1 with six grades. f Stacked bar chart of PC1 with six grades. The phenotypes are detailed in Supplementary Data 10. g Cumulative incidence NRM curves according to K-means grading with eight grades, using the optimal number of clusters as determined by the elbow method on the development cohort. Colors represent K-means aGVHD grades I–VIII. Separation of curves is tested by the two-sided Gray test. h Pie chart of K-means-8 grades. i Stacked bar chart of K-means-8 grades. The phenotypes are detailed in Supplementary Data 11. Source data for b, c, e, f, h and i are provided as a source data file.
Fig. 10. Comparison of classification performances using Akaike information criterion and concordance index.
a Bar plot visualization of the Akaike information criterion (AIC) of all aGVHD grading combinations in decreasing order. If not otherwise mentioned, patients are categorized into four grades. Lower AIC results are preferable. b Bar plot visualization of the concordance index (c-index/CI) of all aGVHD grading combinations in increasing order. Higher c-index values are preferable. c AIC plotted versus CI for all analyzed aGVHD classification methods. Correlation (r) calculated via linear regression. 95% CI: −0.98 to 0.81 ****p < 0.0001 (two-tailed). As the clustering-based grading systems did not cover the phenotype constellation of n = 6 patients in the test cohort, the comparison between systems was performed among the remaining 694 patients. Source data are provided as a source data file.
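The two metrics compared in this figure can be sketched in a few lines. The example below is illustrative only: it uses Harrell's concordance index for right-censored data without competing risks, and the textbook definition AIC = 2k − 2 ln L. The study's exact estimators (for example, any adaptation to competing risks) are not given in the caption, and the toy follow-up data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ordered correctly by risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

def aic(log_likelihood, n_parameters):
    """Akaike information criterion; lower values are preferable."""
    return 2 * n_parameters - 2 * log_likelihood

# Hypothetical follow-up (months), event flags (1 = event, 0 = censored), grades.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
grades = [4, 3, 3, 1, 2]
c_index = concordance_index(times, events, grades)
```

In this toy data set, 6.5 of 8 comparable pairs are ordered correctly, so `c_index` is 0.8125. A grading system with more grades adds parameters, which AIC penalizes through the 2k term while the c-index does not.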

