Observational Study
Radiology. 2023 Feb;306(2):e220266.
doi: 10.1148/radiol.220266. Epub 2022 Oct 4.

Understanding Reader Variability: A 25-Radiologist Study on Liver Metastasis Detection at CT

Scott S Hsieh et al. Radiology. 2023 Feb.

Abstract

Background: Substantial interreader variability exists for common tasks in CT imaging, such as detection of hepatic metastases. This variability can undermine patient care by leading to misdiagnosis.

Purpose: To determine the impact of interreader variability associated with (a) reader experience, (b) image navigation patterns (eg, eye movements, workstation interactions), and (c) eye gaze time at missed liver metastases on contrast-enhanced abdominal CT images.

Materials and Methods: In a single-center prospective observational trial at an academic institution between December 2020 and February 2021, readers were recruited to examine 40 contrast-enhanced abdominal CT studies (eight normal, 32 containing 91 liver metastases). Readers circumscribed hepatic metastases and reported confidence. The workstation tracked image navigation and eye movements. Performance was quantified by using the area under the jackknife alternative free-response receiver operator characteristic (JAFROC-1) curve and per-metastasis sensitivity and was associated with reader experience and image navigation variables. Differences in area under JAFROC curve were assessed with the Kruskal-Wallis test followed by the Dunn test, and effects of image navigation were assessed by using the Wilcoxon signed-rank test.

Results: Twenty-five readers (median age, 38 years; IQR, 31-45 years; 19 men) were recruited and included nine subspecialized abdominal radiologists, five nonabdominal staff radiologists, and 11 senior residents or fellows. Reader experience explained differences in area under the JAFROC curve, with abdominal radiologists demonstrating greater area under the JAFROC curve (mean, 0.77; 95% CI: 0.75, 0.79) than trainees (mean, 0.71; 95% CI: 0.69, 0.73) (P = .02) or nonabdominal subspecialists (mean, 0.69; 95% CI: 0.60, 0.78) (P = .03). Sensitivity was similar within the reader experience groups (P = .96). Image navigation variables that were associated with higher sensitivity included longer interpretation time (P = .003) and greater use of coronal images (P < .001). The eye gaze time was at least 0.5 and 2.0 seconds for 71% (266 of 377) and 40% (149 of 377) of missed metastases, respectively.

Conclusion: Abdominal radiologists demonstrated better discrimination for the detection of liver metastases on abdominal contrast-enhanced CT images. Missed metastases frequently received at least a brief eye gaze. Higher sensitivity was associated with longer interpretation time and greater use of liver display windows and coronal images.

© RSNA, 2022. Online supplemental material is available for this article.
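The gaze-time percentages in the Results follow directly from the reported counts. The short Python sketch below reproduces that arithmetic using only the numbers stated in the abstract; the helper name `percent` and the variable names are chosen here for illustration and do not come from the study.

```python
def percent(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


missed_total = 377   # missed metastases (count reported in the abstract)
gazed_half_s = 266   # missed metastases with gaze time of at least 0.5 second
gazed_two_s = 149    # missed metastases with gaze time of at least 2.0 seconds

print(percent(gazed_half_s, missed_total))  # 71
print(percent(gazed_two_s, missed_total))   # 40
```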

Conflict of interest statement

Disclosures of conflicts of interest: S.S.H. No relevant relationships. D.A.C. No relevant relationships. A.I. No relevant relationships. H.G. No relevant relationships. P.S.P. No relevant relationships. M.P.J. No relevant relationships. S.L. No relevant relationships. L.Y. No relevant relationships. J.L.F. No relevant relationships. D.R.H. No relevant relationships. R.E.C. No relevant relationships. C.H.M. International Society of Computed Tomography board member. J.G.F. No relevant relationships.

Figures

Figure 1:
(A) Graphical user interface of the workstation software. A metastasis (arrow) has been circumscribed in the axial stack (left) using liver window settings and can also be seen in the coronal stack (right). (B) Eye-tracking data for an example metastasis for 24 of the 25 readers. Each of the 24 subpanels shows gaze in a cyan overlay for a reader, the confidence score of the circumscription or a comment that the metastasis was not circumscribed, and the duration of gaze near the metastasis. Five of 24 readers did not circumscribe this metastasis, including one who gazed at this metastasis for 20 seconds. The central subpanel was replaced to show the metastasis itself (arrow) without the overlay.
Figure 2:
Confidence scores for true- and false-positive circumscriptions by reader experience. Missed lesions (false-negative findings) do not have a confidence score and are not indicated on these histograms. Abdominal subspecialists indicated greater mean confidence for true-positive markings than did the other readers (P < .001), and trainees indicated less mean confidence for false-positive markings than did the other readers (P < .001). FP = false-positive, TP = true-positive.
Figure 3:
Graph shows longer interpretation time is associated with higher sensitivity. abd = abdominal.
Figure 4:
Graphs show gaze time distributions. Frequency is normalized so that the sum of all bars is 100%. Insets in B and C show a modified x-axis range to capture gaze times longer than 10 seconds. All histograms use bar widths of 0.25 second, except for the insets, which use 2 seconds. (A) Gaze for a stereologic grid of points in the liver, indicating that most of the liver was examined, at least briefly. (B) Gaze time for missed metastases (false-negative findings). Inset shows a modified x-axis range to capture gaze times longer than 10 seconds. (C) Gaze time for detected metastases (true-positive findings).
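The normalization described in this caption (fixed-width bars whose heights sum to 100%) can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function name `normalized_histogram` and the example gaze times are hypothetical, and only the 0.25-second bar width is taken from the caption.

```python
import math
from collections import Counter


def normalized_histogram(gaze_times, bin_width=0.25):
    """Bin gaze times (seconds) and return {bin_start: percent of samples}."""
    if not gaze_times:
        return {}
    # Index of the fixed-width bin that each gaze time falls into.
    counts = Counter(math.floor(t / bin_width) for t in gaze_times)
    n = len(gaze_times)
    return {i * bin_width: 100.0 * c / n for i, c in sorted(counts.items())}


# Hypothetical gaze times in seconds; NOT data from the study.
example_times = [0.1, 0.2, 0.3, 0.6, 0.7, 1.4, 2.0, 2.1]
hist = normalized_histogram(example_times)
for start, pct in hist.items():
    print(f"{start:.2f}-{start + 0.25:.2f} s: {pct:.1f}%")
```

Because each bar is a share of all samples, the bar heights always sum to 100% regardless of how many gaze measurements are binned.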
Figure 5:
Detection rates and confidence scores for each metastasis. Each column represents a different metastasis (n = 91). Dark colors indicate missed metastases (false-negative findings), and different shades correspond to the eye gaze time. Tan shades indicate confidence for circumscribed metastases (1 = low confidence, 100 = high confidence). Metastases are sorted according to the number of false-negative errors. (A–E) Five selected metastases are marked in the plot and are shown for illustrative purposes; arrows indicate metastasis. Eye gaze histograms are shown below the images. An eye gaze longer than 20 seconds was placed into the 20-second bin. (A) Metastasis was frequently missed (23 of 25 readers) and was associated with short gaze times, implying visual search errors. (B) Metastasis also was frequently missed (21 of 25 readers) and was associated with longer gaze times, implying classification errors. (C) Metastasis was missed by only five readers, usually with long gaze times; when it was circumscribed, readers indicated low confidence. (D) Metastasis was missed by three readers, all with short gaze times. (E) Metastasis was circumscribed by all but one reader. CS = confidence score.
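The overflow rule stated in this caption (any gaze longer than 20 seconds is counted in the 20-second bin) amounts to clipping values before binning. A minimal sketch, assuming gaze times are held as a plain list of seconds; the function name `clip_to_overflow_bin` and the sample values are hypothetical and are not data from the study.

```python
def clip_to_overflow_bin(gaze_times, cap=20.0):
    """Replace any gaze time above `cap` seconds with `cap` itself."""
    return [min(t, cap) for t in gaze_times]


# Hypothetical gaze times in seconds; NOT data from the study.
times = [0.4, 3.2, 20.0, 27.5, 61.0]
print(clip_to_overflow_bin(times))  # [0.4, 3.2, 20.0, 20.0, 20.0]
```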
Figure 6:
Graphs show sensitivity as a function of gaze time in liver segments grouped by location, along with a smoothed trend line. In most cases, longer gaze time in segments indicated higher sensitivity in those segments. Linear associations were significant for segments II and III (P = .002), V and VI (P = .04), and VII and VIII (P = .02) but not for segments I and IV (P = .27).
Figure 7:
Clustering of metastasis features and reader confidence. Top: Clustergram of the reader confidence matrix. Columns correspond to metastases (n = 91), rows correspond to readers (n = 25), and brightness corresponds to reader confidence. Columns and rows are permuted to bring clusters together, with phylogenetic trees on the top and left to show empirically discovered relationships between similar metastases or readers. Reader data are shown on the right (abd = abdominal subspecialist, non-abd = nonabdominal subspecialist), including the jackknife alternative free-response receiver operator characteristic curve score (JAF) and sensitivity (SENS). Boxes A, B, and C show three areas of interest. Box A encompasses a group of metastases that were found by nearly all readers (ie, easy, nondiscriminatory). Box B encompasses a group of metastases that were scored with lower confidence for five trainees and one nonabdominal subspecialist. Box C encompasses a group of metastases that were challenging to detect: approximately half of the readers were able to detect these lesions, with no clear connection between reader experience and detection rate. Bottom: Close-up images of six metastases randomly selected from each of the corresponding boxes in the top panel.
