JMIR Dermatol. 2023 Dec 26;6:e48589.
doi: 10.2196/48589.

Crowdsourcing Skin Demarcations of Chronic Graft-Versus-Host Disease in Patient Photographs: Training Versus Performance Study

Andrew J McNeil et al. JMIR Dermatol.

Abstract

Background: Chronic graft-versus-host disease (cGVHD) is a significant cause of long-term morbidity and mortality in patients after allogeneic hematopoietic cell transplantation. Skin is the most commonly affected organ, and visual assessment of cGVHD can have low reliability. Crowdsourcing data from nonexpert participants has been used for numerous medical applications, including image labeling and segmentation tasks.

Objective: This study aimed to assess the ability of crowds of nonexpert raters (individuals without any prior training in identifying or marking cGVHD) to demarcate photos of cGVHD-affected skin. We also studied the effect of training and feedback on crowd performance.

Methods: A total of 360 photographs of the skin of 36 patients with cGVHD were taken using a Canfield Vectra H1 3D camera. Ground truth demarcations were provided in 3D by a trained expert and reviewed by a board-certified dermatologist. In total, 3000 2D images (projections from various angles) were created for crowd demarcation through the DiagnosUs mobile app. Raters were split into high and low feedback groups. The performance of 4 different crowds of nonexperts was analyzed: 17 raters per image from the high feedback group, 17 raters per image from the low feedback group, 32 to 35 raters per image from the low feedback group, and the top 5 performers for each image from the low feedback group.

Results: Across 8 demarcation competitions, 130 raters were recruited to the high feedback group and 161 to the low feedback group. This resulted in a total of 54,887 individual demarcations from the high feedback group and 78,967 from the low feedback group. With minimal training, the nonexpert crowds achieved good overall performance in segmenting cGVHD-affected skin, with a median surface area error of less than 12% of skin pixels for all crowds in both the high and low feedback groups. The low feedback crowds performed slightly worse than the high feedback crowd, even when a larger crowd was used. Tracking the 5 most reliable raters from the low feedback group for each image recovered performance similar to that of the high feedback crowd. Higher variability between raters for a given image was not found to correlate with lower performance of the crowd consensus demarcation and therefore cannot be used as a measure of reliability. No significant learning was observed during the task as raters saw more photos and feedback.

Conclusions: Crowds of nonexpert raters can demarcate cGVHD images with good overall performance. Tracking the 5 most reliable raters provided optimal results, obtaining the best performance with the lowest number of expert demarcations required for adequate training. However, agreement among individual nonexperts does not help predict whether the crowd has provided an accurate result. Future work should explore the performance of crowdsourcing on standard clinical photos and further methods for estimating the reliability of consensus demarcations.
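The abstract does not specify how rater reliability was scored, so the following is only a sketch of the "top 5 raters per image" idea under a stated assumption: each rater already has a single reliability score (for example, mean Dice against expert ground truth on previously scored images), and for each image the crowd is restricted to the highest-scoring raters who marked it. The function name `top_k_raters` and the rater IDs are hypothetical.

```python
from typing import Dict, List


def top_k_raters(reliability: Dict[str, float], raters_on_image: List[str], k: int = 5) -> List[str]:
    """Return up to k raters for one image, ranked by a precomputed reliability score.

    `reliability` maps a hypothetical rater ID to a score where higher is better
    (assumed here to be something like mean Dice on expert-labeled images).
    Raters without a score yet are ranked last; ties keep their input order.
    """
    ranked = sorted(
        raters_on_image,
        key=lambda r: reliability.get(r, float("-inf")),
        reverse=True,
    )
    return ranked[:k]


# Usage: pick the 5 most reliable raters among the 7 who marked an image.
scores = {"a": 0.81, "b": 0.65, "c": 0.90, "d": 0.72, "e": 0.55, "f": 0.78}
print(top_k_raters(scores, ["a", "b", "c", "d", "e", "f", "g"]))  # → ['c', 'a', 'f', 'd', 'b']
```

The selected raters' masks would then be combined into a consensus (for example, by majority vote) exactly as for a full crowd; only the membership of the crowd changes.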

Keywords: artificial intelligence; cGVHD; crowdsourcing; dermatology; feasibility; graft-versus-host disease; imaging; labeling; medical image; segmentation; skin.


Conflict of interest statement

Conflicts of Interest: EPD is the chief executive officer of Centaur Labs and holds shares in the company. KM is an employee of Centaur Labs.

Figures

Figure 1
Flowchart of the study design. cGVHD: chronic graft-versus-host disease; r: rater.
Figure 2
Annotation interfaces used for (A) ground truth demarcations using the Vectra Analysis Module and (B) crowd demarcations using the DiagnosUs app, including ground truth feedback during training.
Figure 3
The 8 images used for training the crowd during study enrollment. Ground truth demarcations of cGVHD-affected skin are shown in green. The corresponding text descriptions of each disease presentation are given in Table 3. cGVHD: chronic graft-versus-host disease.
Figure 4
Performance of crowd groups for demarcating images with cGVHD-affected skin per ground truth using the (A) Dice coefficient and (B) surface area error. Each point represents the majority vote mask for a single image (711 images in total). Whiskers indicate 1.5 × IQR. Mean values are indicated by the dashed red line. cGVHD: chronic graft-versus-host disease; r: rater.
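The two metrics and the majority vote mask named in Figure 4 can be sketched as follows. This is an illustration, not the authors' code: masks are assumed to be flattened lists of 0/1 pixels of equal length, a pixel enters the majority vote mask when strictly more than half of the raters marked it, and surface area error is assumed to be the absolute difference between the marked and ground truth areas divided by the number of visible skin pixels (consistent with the abstract's "% of skin pixels", but the exact definition is an assumption).

```python
from typing import List, Sequence

Mask = Sequence[int]  # flattened binary mask: 1 = marked as cGVHD-affected, 0 = not


def majority_vote(masks: List[Mask]) -> List[int]:
    """Pixel-wise majority vote: keep a pixel if strictly more than half of the raters marked it."""
    n = len(masks)
    return [1 if sum(px) * 2 > n else 0 for px in zip(*masks)]


def dice(pred: Mask, truth: Mask) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    inter = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2 * inter / total


def surface_area_error(pred: Mask, truth: Mask, skin: Mask) -> float:
    """|marked area - true area| as a fraction of visible skin pixels (assumed definition).

    Assumes `skin` marks at least one pixel.
    """
    pred_area = sum(p & s for p, s in zip(pred, skin))
    truth_area = sum(t & s for t, s in zip(truth, skin))
    return abs(pred_area - truth_area) / sum(skin)


# Usage: three hypothetical raters on a 6-pixel image.
consensus = majority_vote([[1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1], [1, 1, 1, 0, 0, 0]])
truth = [1, 1, 1, 1, 0, 0]
skin = [1, 1, 1, 1, 1, 0]
print(consensus)                                   # → [1, 1, 0, 0, 1, 0]
print(round(dice(consensus, truth), 3))            # → 0.571
print(surface_area_error(consensus, truth, skin))  # → 0.2
```

Dice rewards overlap in location, while this form of surface area error only compares how much skin was marked, so the two can disagree on the same image.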
Figure 5
Example demarcations from the crowd (blue) versus ground truth (green). (A) Consistent demarcation of highly affected areas by crowds assembled from both the high and low feedback groups, although both missed areas of subtle surface changes. (B) The high feedback crowd failed to identify abnormal changes, while the low feedback top 5 crowd identified 75% of abnormal skin areas. (C) High variability between images of the same skin region viewed from different angles by the low feedback top 5 crowd. cGVHD: chronic graft-versus-host disease; r: rater.
Figure 6
Per-photo surface area error for the low feedback group (top 5 raters). 3D photo IDs are ordered by decreasing median error. The shaded area shows the range of error between 2D projections for each 3D photo.
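Figure 6 summarizes errors of several 2D projections per 3D photo. A minimal sketch of that aggregation, assuming the input is a list of hypothetical `(photo_id, error)` pairs with one pair per 2D projection, groups by 3D photo, reports the median and the min/max range, and orders photos by decreasing median error as in the figure.

```python
from statistics import median
from typing import Dict, List, Tuple


def per_photo_summary(proj_errors: List[Tuple[str, float]]) -> List[Tuple[str, float, float, float]]:
    """Group 2D-projection errors by 3D photo ID.

    Returns (photo_id, median_error, min_error, max_error) rows,
    ordered by decreasing median error.
    """
    groups: Dict[str, List[float]] = {}
    for photo_id, err in proj_errors:
        groups.setdefault(photo_id, []).append(err)
    rows = [(pid, median(errs), min(errs), max(errs)) for pid, errs in groups.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Usage with hypothetical errors for two 3D photos.
for row in per_photo_summary([("p1", 0.10), ("p1", 0.30), ("p1", 0.20), ("p2", 0.05), ("p2", 0.15)]):
    print(row)
```

The min/max columns correspond to the shaded band in the figure; a wide band for one photo indicates that the crowd's accuracy depended strongly on the viewing angle.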
Figure 7
Surface area error of the majority vote mask versus the SD of surface area estimates for each photo. Slope and coefficient of determination (R²) for the linear regression fit (red dashed line) are also given.
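The reliability check in Figure 7 regresses consensus error on inter-rater spread. The sketch below shows an ordinary least-squares line with its R², plus a helper for the per-photo spread. Using the population SD (`statistics.pstdev`) of the raters' marked-area fractions is an assumption; the abstract does not say which SD estimator was used. The names `area_sd` and `fit_line` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def area_sd(rater_areas: List[float]) -> float:
    """Spread of the raters' surface area estimates for one photo (population SD, assumed)."""
    return pstdev(rater_areas)


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept, R^2).

    R^2 is returned as NaN when y has zero variance, since it is undefined there.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r2


# Usage: x would be area_sd(...) per photo, y the majority vote mask's surface area error.
print(fit_line([0, 1, 2], [0, 1, 0]))  # slope 0 and R^2 0: spread carries no information about error
```

An R² near 0, as in this toy case, is what "higher variability was not found to correlate with lower performance" would look like in such a fit.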
Figure 8
Surface area error of individual raters in successive groups of affected images. Error is displayed for all 37 raters who marked at least 100 affected images. Each point is the mean error for a given user for the group of 20 photos in the stated image range.
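The learning analysis in Figure 8 can be approximated as follows, under stated assumptions: each rater's per-image surface area errors on affected images are available in the order the images were seen, raters with fewer than 100 such images are excluded, and each consecutive block of 20 images is reduced to its mean. A trailing partial block is dropped here; how the study handled it is not stated. The function name and input format are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def binned_errors(
    errors_by_rater: Dict[str, List[float]],
    bin_size: int = 20,
    min_images: int = 100,
) -> Dict[str, List[float]]:
    """Mean error per consecutive block of `bin_size` affected images, per rater.

    `errors_by_rater` maps a rater ID to that rater's per-image errors in viewing order.
    Raters with fewer than `min_images` images are skipped; a trailing partial block is dropped.
    """
    out: Dict[str, List[float]] = {}
    for rater, errs in errors_by_rater.items():
        if len(errs) < min_images:
            continue
        n_full = len(errs) // bin_size
        out[rater] = [mean(errs[i * bin_size:(i + 1) * bin_size]) for i in range(n_full)]
    return out


# Usage: "r2" has only 50 images and is excluded; "r1" yields 6 blocks of 20.
curves = binned_errors({"r1": [0.1] * 100 + [0.05] * 20, "r2": [0.2] * 50})
print(sorted(curves))  # → ['r1']
```

A flat sequence of block means for most raters is consistent with the abstract's observation that no significant learning occurred as more photos and feedback were seen.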
