Multicenter Study
Biol Blood Marrow Transplant. 2011 Nov;17(11):1619-29.
doi: 10.1016/j.bbmt.2011.04.002. Epub 2011 Apr 12.

A multicenter pilot evaluation of the National Institutes of Health chronic graft-versus-host disease (cGVHD) therapeutic response measures: feasibility, interrater reliability, and minimum detectable change

Sandra A Mitchell et al. Biol Blood Marrow Transplant. 2011 Nov.

Abstract

The lack of standardized criteria for measuring therapeutic response is a major obstacle to the development of new therapeutic agents for chronic graft-versus-host disease (cGVHD). National Institutes of Health (NIH) consensus criteria for evaluating therapeutic response were published in 2006. We report the results of 4 consecutive pilot trials evaluating the feasibility and estimating the interrater reliability and minimum detectable change of these response criteria. Hematology-oncology clinicians with limited experience in applying the NIH cGVHD response criteria (n = 34) participated in a 2.5-hour training session on response evaluation in cGVHD. Feasibility and interrater reliability between subspecialty cGVHD experts and this panel of clinician raters were examined in a sample of 25 children and adults with cGVHD. The minimum detectable change was calculated using the standard error of measurement. Clinicians' impressions of the brief training session, the photo atlas, and the response criteria documentation tools were generally favorable. Performing and documenting the full set of response evaluations required a median of 21 minutes (range: 12-60 minutes) per rater. The Schirmer tear test required the greatest time of any single test (median: 9 minutes). Overall, interrater agreement for skin and oral manifestations was modest; however, in the third and fourth trials, the agreement between clinicians and experts for all dimensions except movable sclerosis approached satisfactory values. In the final 2 trials, the threshold for defining change exceeding measurement error was 19% to 22% body surface area (BSA) for erythema, 18% to 26% BSA for movable sclerosis, 17% to 21% BSA for nonmovable sclerosis, and 2.1 to 2.6 points on the 15-point NIH Oral cGVHD scale. Agreement between clinician-expert pairs was moderate to substantial for the measures of functional capacity and for the gastrointestinal and global cGVHD rating scales.
These results suggest that the NIH response criteria are feasible for use, and these reliability estimates are encouraging, because they were observed following a single 2.5-hour training session given at multiple transplant centers, with no opportunity for iterative training and calibration. Research is needed to evaluate inter- and intrarater reliability in larger samples, and to evaluate these response criteria as predictors of outcomes in clinical trials.
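The abstract states that the minimum detectable change was calculated from the standard error of measurement but does not give the formula. A minimal sketch of one conventional approach follows, assuming SEM = SD × sqrt(1 − reliability) and MDC at 95% confidence = 1.96 × sqrt(2) × SEM. The function names and the input values are hypothetical illustrations, not the study's data or necessarily the authors' exact method.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM from the score standard deviation and a reliability coefficient.

    Assumes the conventional formula SEM = SD * sqrt(1 - reliability);
    the abstract does not say which variant the authors used.
    """
    return sd * math.sqrt(1.0 - reliability)


def minimum_detectable_change(sem, z=1.96):
    """Smallest change that exceeds measurement error at the given z level.

    The sqrt(2) factor reflects that a change score combines the error
    of two separate measurements.
    """
    return z * math.sqrt(2.0) * sem


# Hypothetical inputs: SD of 10 percentage points of BSA, reliability of 0.75.
sem = standard_error_of_measurement(sd=10.0, reliability=0.75)
mdc95 = minimum_detectable_change(sem)
print(f"SEM = {sem:.2f}, MDC95 = {mdc95:.2f}")
```

Under these assumed inputs, a change smaller than the computed MDC could not be distinguished from measurement error, which is how the BSA and oral-scale thresholds reported in the abstract are meant to be read.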

Figures

Figure 1
Bland-Altman plots, comparing differences (Trial 4) between clinician and expert scoring of cutaneous and oral cGVHD manifestations (difference=clinician score minus expert score) plotted against expert scores. Negative differences reflect clinician underestimation of the extent of involvement, while positive differences reflect clinician overestimation of the extent of involvement, relative to the expert’s assessment. A change in the magnitude of the difference between clinician and expert assessments with increasing extent of cGVHD involvement, as assessed by the expert, is determined by looking for patterns along the x-axis.
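The quantities behind a plot of this kind can be sketched as follows, using the sign convention given in the caption (clinician score minus expert score). The 95% limits of agreement (bias ± 1.96 × SD of the differences) are the usual Bland-Altman summary and are an assumption here, since the caption does not say whether they were drawn. The function name and the scores are hypothetical, not study data.

```python
from statistics import mean, stdev


def bland_altman(clinician, expert):
    """Return per-patient differences, mean bias, and 95% limits of agreement.

    Differences follow the caption's convention: clinician minus expert,
    so negative values indicate clinician underestimation.
    """
    diffs = [c - e for c, e in zip(clinician, expert)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical % BSA scores for four patients.
expert = [10.0, 20.0, 30.0, 40.0]
clinician = [12.0, 18.0, 33.0, 37.0]

diffs, bias, limits = bland_altman(clinician, expert)

# As in the figure, each difference is paired with the expert score (x-axis)
# rather than with the mean of the two ratings.
for x, d in zip(expert, diffs):
    print(f"expert={x:5.1f}  difference={d:+5.1f}")
print(f"bias={bias:.2f}, limits of agreement=({limits[0]:.2f}, {limits[1]:.2f})")
```

A trend in the differences as the expert score increases would correspond to the pattern along the x-axis that the caption describes.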
Figure 2
Inter-rater Agreement Between Clinicians and Experts for Evaluation of Gastrointestinal Symptoms, Functional Performance, and cGVHD Global Scores. In the final two trials, substantial inter-rater agreement was observed in evaluating gastrointestinal symptoms (68% to 94% of pair-wise comparisons in perfect agreement), while moderate to substantial agreement was noted in evaluating the two-minute walk (75% to 80% of pair-wise assessments were concordant), grip strength (60% to 77% of pair-wise assessments were concordant), cGVHD global severity (clinician-expert pairs in perfect agreement 50% to 75% of the time), and cGVHD evolution over the past month (clinician-expert pairs in perfect agreement 40% to 63% of the time). Note: Numbers were rounded so that values add to 100%.
