J Digit Imaging. 2012 Jun;25(3):423-36. doi: 10.1007/s10278-011-9445-3.

Consensus versus disagreement in imaging research: a case study using the LIDC database


Dmitriy Zinovev et al. J Digit Imaging. 2012 Jun.

Abstract

Traditionally, imaging studies evaluating the effectiveness of computer-aided diagnosis (CAD) compare a single label from a medical expert with a single label produced by CAD. The purpose of this research is to present a CAD system based on the Belief Decision Tree classification algorithm, capable of learning from probabilistic input (based on intra-reader variability) and providing probabilistic output. We compared our approach against a traditional decision tree approach with respect to a traditional performance metric (accuracy) and a probabilistic one (area under the distance-threshold curve, AuC(dt)). The probabilistic classification technique showed notable performance improvement over the traditional one with respect to both evaluation metrics. Specifically, when applying a cross-validation technique on the training subset of instances, boosts of 28.26% and 30.28% were noted for the probabilistic approach with respect to accuracy and AuC(dt), respectively. Furthermore, on the validation subset of instances, boosts of 20.64% and 23.21% were again noted for the probabilistic approach with respect to the same two metrics. In addition, we compared our CAD system's results with diagnostic data available for a small subset of the Lung Image Database Consortium (LIDC) database. We discovered that when our CAD system errs, it generally does so with low confidence. Predictions produced by the system also agree with diagnoses of truly benign nodules more often than radiologists do, offering the possibility of reducing false positives.
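The probabilistic input described above can be illustrated with a small sketch. Assuming, as the Fig. 1 example suggests, that the radiologists' malignancy ratings for a nodule are encoded as a frequency distribution over the five rating classes (the paper's exact encoding may differ), the conversion looks like:

```python
from collections import Counter

def ratings_to_distribution(ratings, classes=(1, 2, 3, 4, 5)):
    """Convert per-radiologist malignancy ratings (1-5) into a
    probability distribution over the rating classes."""
    counts = Counter(ratings)
    n = len(ratings)
    return [counts.get(c, 0) / n for c in classes]

# Fig. 1 example: two radiologists rated 2, one rated 3, one rated 5
print(ratings_to_distribution([2, 2, 3, 5]))  # [0.0, 0.5, 0.25, 0.0, 0.25]
```

Such a distribution, rather than a single consensus label, is what a belief decision tree can consume as a training target and emit as a prediction.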


Figures

Fig. 1
The probabilistic multi-class space in which one nodule was interpreted by four radiologists: two assigned rating 2, one rating 3, and the fourth rating 5 (white points). The dark points represent the predicted probabilistic ratings for the same nodule. The explored area represents those cases that take agreement/consensus into account when predicting malignancy; the gray area represents those cases for which the nodules are not clearly benign or malignant
Fig. 2
Visual representation of the LIDC data structure; one nodule is exemplified through the differences in the nodule’s outlines and malignancy ratings
Fig. 3
Upper three levels of the belief decision tree constructed for the malignancy semantic characteristic
Fig. 4
Sample distance–threshold curve for the training dataset (first iteration of cross-validation). BDT stands for belief decision trees; TDT stands for traditional decision trees
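The distance-threshold curve behind AuC(dt) can be sketched as follows. This is a hypothetical reconstruction: it assumes the curve plots, for each distance threshold d, the fraction of cases whose predicted probability vector lies within Euclidean distance d of the radiologists' reference vector, with AuC(dt) taken as the normalized area under that curve; the paper's exact distance measure and normalization may differ.

```python
import numpy as np

def distance_threshold_curve(pred, ref, thresholds):
    """Fraction of cases whose predicted probability vector lies
    within distance d of the reference vector, for each threshold d."""
    dists = np.linalg.norm(pred - ref, axis=1)  # per-case Euclidean distance
    return np.array([(dists <= d).mean() for d in thresholds])

def auc_dt(pred, ref, thresholds):
    """Normalized area under the distance-threshold curve
    (trapezoidal rule), so a perfect classifier scores 1.0."""
    curve = distance_threshold_curve(pred, ref, thresholds)
    area = np.sum((curve[1:] + curve[:-1]) / 2.0 * np.diff(thresholds))
    return area / (thresholds[-1] - thresholds[0])
```

Under these assumptions, a classifier whose predicted distributions match the reference exactly yields a curve equal to 1 everywhere and AuC(dt) = 1, while poorer predictions shift mass of the curve toward larger thresholds and lower the area.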
Fig. 5
Nodules found in both the LIDC dataset and the diagnosis dataset, corresponding to the analysis of Tables 5, 6, and 7. Each row shows the nodules on which there was disagreement between radiologists, computer, and diagnosis, as identified in those tables
Fig. 6
Probabilistic interpretation of eight malignant nodules by the computer and radiologists; dark bars denote the computer-based ratings and the light ones represent the radiologists’ ratings; the y-axis represents the probability and the x-axis represents the malignancy class
Fig. 7
Probabilistic interpretation of nine benign and one indeterminate nodules by the computer and radiologists; dark bars denote the computer-based ratings and the light ones represent the radiologists’ ratings; the y-axis represents the probability and the x-axis represents the malignancy class
