Review
. 2024 Aug 7;15(1):199.
doi: 10.1186/s13244-024-01774-w.

RECIST 1.1 assessments variability: a systematic pictorial review of blinded double reads

Antoine Iannessi et al. Insights Imaging.

Abstract

Reader variability is intrinsic to radiologic oncology assessments, necessitating measures to enhance consistency and accuracy. The RECIST 1.1 criteria play a crucial role in mitigating this variability by standardizing evaluations, aiming to establish an accepted "truth" confirmed by histology or patient survival. Clinical trials manage variability with Blinded Independent Central Review (BICR), employing double reads and adjudicators to resolve inter-observer discordance. It is essential to dissect the root causes of variability in response assessments, with a specific focus on the factors influencing RECIST evaluations. We propose proactive measures for radiologists to address sources of variability such as radiologist expertise, image quality, and access to contextual information, all of which significantly impact interpretation and assessment precision. Adherence to standardization and the RECIST guidelines is pivotal in diminishing variability and ensuring uniform results across studies. Variability factors, including lesion selection, the appearance of new lesions, and confirmation bias, can profoundly affect assessment accuracy and interpretation, underscoring the importance of identifying and addressing them. Understanding the causes of variability helps enhance the accuracy and consistency of response assessments in oncology through standardized evaluation protocols and mitigation of the risk factors that contribute to variability. Access to contextual information is crucial.

Critical relevance statement: By understanding the causes of diagnostic variability, we can enhance the accuracy and consistency of response assessments in oncology, ultimately improving patient care and clinical outcomes.

Key points:
Baseline lesion selection and detection of new lesions play a major role in the occurrence of discordance.
Image interpretation is influenced by contextual information, the lack of which can lead to diagnostic uncertainty.
Radiologists must be trained in RECIST criteria to reduce errors and variability.

Keywords: Diagnostic errors; Oncology; Quality improvement; RECIST 1.1; Statistics & numerical data.


Conflict of interest statement

Antoine Iannessi & Hubert Beaumont are part-time employees of Median Technologies. Christine Ojango & Yan Liu are full-time employees of Median Technologies. Anne-Sophie Bertrand declares no competing interests.

Figures

Fig. 1
Acceptable reader variability in baseline selection under the RECIST 1.1 guidelines. The baseline selection guideline aims to standardize how the disease is best represented, by selecting the largest measurable lesions (a). For practical use, however, RECIST 1.1 leaves room for variability at baseline by limiting the number of target lesions (in red) to two per organ and five in total. In patients with multiple lesions, inter-reader variability can therefore remain fully compliant with the criteria (b, f, g). Beyond the maximum number of lesions, a justifiable reason, such as an equivocal finding, the robustness of the measurement, or knowledge of a previously irradiated lesion, must explain the variability, but such reasons also challenge the representativeness goal of the selection task (c, d, e)
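The selection rule described above (at most two target lesions per organ, five in total, favoring the largest measurable lesions) can be sketched as a short function. This is a simplified illustration with hypothetical lesion records, not an implementation of any clinical software; real selection also weighs measurability, reproducibility, and prior irradiation, which is exactly where compliant inter-reader variability arises.

```python
from collections import defaultdict

def select_targets(lesions, max_per_organ=2, max_total=5):
    """Pick target lesions per the RECIST 1.1 caps: up to `max_per_organ`
    lesions per organ and `max_total` overall, largest diameter first.
    `lesions` is an iterable of (organ, diameter_mm) tuples, assumed
    already screened for measurability."""
    per_organ = defaultdict(int)
    targets = []
    # Sort by diameter, largest first, so selection favors dominant disease.
    for organ, diameter in sorted(lesions, key=lambda l: -l[1]):
        if len(targets) == max_total:
            break
        if per_organ[organ] < max_per_organ:
            per_organ[organ] += 1
            targets.append((organ, diameter))
    return targets

lesions = [("liver", 42), ("liver", 31), ("liver", 25),
           ("lung", 18), ("lung", 15), ("node", 22)]
# The third liver lesion is skipped (organ cap), so smaller lesions
# elsewhere fill the remaining slots.
print(select_targets(lesions))
```

Note that two readers applying the same caps to the same patient can still return different, equally compliant target sets whenever more candidates exist than slots, which is the variability panels b, f, and g illustrate.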
Fig. 2
Baseline discrepancy caused by a lack of training. Lack of knowledge is not a frequent source of variability between readers; however, when the reader does not master the criteria, it often results in guideline errors, such as selecting blastic bone lesions as measurable lesions (a) or cavitary lesions when more robust lesions are available for measurement (b). Rarely, the selection wrongly includes an unequivocally benign lesion because of the radiologist's lack of knowledge, such as a fat-containing adrenal myelolipoma (c)
Fig. 3
Baseline discrepancy caused by equivocality about the malignancy of the disease or the class of the finding. Equivocality at baseline is frequent because previous examinations are lacking or information is censored for a blinded evaluation. In such conditions, small lesions, e.g., adrenal nodules, juxta-centimetric lymph nodes, and lung micronodules, may wrongly be considered disease-related findings at baseline, although they remain stable over time and prove to be benign (a, b, c). In some cases, readers may select the same target but classify it under a different organ (d, e). Under RECIST 1.1, whether the same lesion is considered a lymph node or a mass determines the method of measurement and the nodal size threshold, which can impact the overall assessment
Fig. 4
False alarm for progression. From left to right: to qualify as a new lesion, a finding should be absent from the baseline (a), considered pathologic (≥ 1 cm for lymph nodes) (b), unequivocally related to spreading of the cancer, unlike a micronodule possibly linked to intercurrent inflammation (c), and not related to a blastic bone healing phenomenon (d) or a treatment-induced adverse event such as osteonecrosis of the jaw (e)
Fig. 5
False negative for progression. The most frequent metastatic sites depend on the primary tumor; for lung cancer they include the liver and adrenal gland (a, b). While these lesions can be missed, new lesions are more often missed in infrequent locations such as bone, because of their low conspicuity in the axial view (c) compared with sagittal reformats (d). Infrequent locations, such as the brain or soft tissue, are unexpected and more susceptible to attention bias (e, f)
Fig. 6
Delayed detection. The two radiologists identified the same new lesion (a, b), but radiologist 2 detected it one visit later, leading to a 6-week discrepancy in the date of progression
Fig. 7
Inter-observer measurement variability. At baseline, the two radiologists selected the same targets, a mediastinal node (a, b) and a lung nodule (c, d), with variable measurements at the lesion level that compensated each other when the index lesions were summed (identical sum of diameters). During follow-up, measurement variability led to a discrepant patient response: progressive disease (≥ 20% increase) for reader 2 versus non-progressive disease for reader 1, although the difference between the readers' sums of diameters is only 10 mm at visit 1 and 7 mm at visit 2
Fig. 8
Measurement variability factors (inter- and intra-observer). The discrepancy in measurements can lie within the finding itself, such as cavitation or spiculation that makes measurement difficult (a, b, f, g); in the method of measurement, such as whether the rim is included (c, h); or in the selection of a different injection phase, even by the same reader during follow-up (d, e)
Fig. 9
Anchoring bias in measurement. The radiologist chose two targets in the liver and concluded stable disease after nine evaluations (a, b). In retrospect, however, a significant increase in the same measured target can be observed visually (c). We suspect a confirmation bias in the timepoint-after-timepoint measurement workflow
Fig. 10
Discrepancy on findings that remain visible after treatment. The same lung nodule target was chosen by both radiologists (a, b). Radiologist 1 considers the remaining visible finding a scar, while radiologist 2 continues to measure the remaining disease (18 mm), preventing a complete response for this patient. The same adrenal gland finding was chosen as a target lesion by reader 1 and a non-target lesion by reader 2 (c, d). Unlike reader 1, reader 2 considers that the lesion has disappeared, leaving only calcified scars, although it is still visible and measured by reader 1
Fig. 11
Paradoxical response of non-targeted disease. The measured disease is stable (double arrowhead) while the non-targeted disease (circle) increases unequivocally. A small new lesion can also be identified (arrow)
Fig. 12
Image quality-related discrepancy. A lung para-mediastinal target is measured with mediastinal windowing and good visualization of the adjacent vascular structures (a, c). During follow-up, one CT evaluation is performed without contrast injection (intercurrent contraindication). The higher inter-reader measurement variability caused by the lack of contrast resulted in a progression discrepancy (b, d)
