Review

Needs assessment for next generation computer-aided mammography reference image databases and evaluation studies

Alexander Horsch et al. Int J Comput Assist Radiol Surg. 2011 Nov;6(6):749-67. doi: 10.1007/s11548-011-0553-9. Epub 2011 Mar 30.

Abstract

Introduction: Breast cancer is globally a major threat to women's health. Screening and adequate follow-up can significantly reduce mortality from breast cancer. Human second reading of screening mammograms can increase breast cancer detection rates, whereas an equivalent benefit has not been proven for current computer-aided detection systems used as a "second reader". Critical factors include the detection accuracy of the systems and the screening experience and training of the radiologist with the system. When assessing the performance of systems and system components, the choice of evaluation methods is particularly critical. Core assets here are reference image databases and statistical methods.

Methods: We analyzed the characteristics and usage of the currently largest publicly available mammography database, the Digital Database for Screening Mammography (DDSM) from the University of South Florida, in literature indexed in Medline, IEEE Xplore, SpringerLink, and SPIE, with respect to the type of computer-aided diagnosis (CAD) addressed (detection, CADe, or diagnosis, CADx), the selection of database subsets, the choice of evaluation method, and the quality of descriptions.

Results: 59 publications presenting 106 evaluation studies met our selection criteria. In 54 studies (50.9%), the selection of test items (cases, images, regions of interest) extracted from the DDSM was not reproducible. Only two CADx studies, and no CADe studies, used the entire DDSM. The number of test items varies from 100 to 6000. The statistical evaluation methods chosen also differ; the most common are train/test (34.9% of the studies), leave-one-out (23.6%), and N-fold cross-validation (18.9%). Database-related terminology tends to be imprecise or ambiguous, especially regarding the term "case".
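
As a brief illustration (not part of the original abstract), the three evaluation schemes named above can be sketched as follows, assuming a generic binary classifier and synthetic feature/label arrays in place of DDSM-derived test items; the scikit-learn helpers train_test_split, LeaveOneOut, and KFold are used purely for demonstration.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    KFold, LeaveOneOut, cross_val_score, train_test_split
)

# Synthetic stand-ins for ROI feature vectors and benign/malignant labels
# (placeholders only; not DDSM data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)
clf = LogisticRegression(max_iter=1000)

# 1. Fixed train/test split (here 70/30, stratified by class label).
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
print("train/test accuracy:", clf.fit(X_tr, y_tr).score(X_te, y_te))

# 2. Leave-one-out: one test item per fold, as many folds as items.
print("leave-one-out accuracy:",
      cross_val_score(clf, X, y, cv=LeaveOneOut()).mean())

# 3. N-fold cross-validation (N = 10), shuffled once for reproducibility.
print("10-fold accuracy:",
      cross_val_score(clf, X, y, cv=KFold(n_splits=10, shuffle=True,
                                          random_state=0)).mean())

Which scheme is applied, and to which subset of the database, strongly affects the reported figures; this is why reproducible test-item selection matters for comparability across studies.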

Discussion: Overall, both the use of the DDSM as a data source for evaluating mammography CAD systems and the application of statistical evaluation methods were found to be highly diverse. Results reported from different studies are therefore hardly comparable. Drawbacks of the DDSM (e.g. the varying quality of lesion annotations) may contribute to this, but larger bias appears to be introduced by the authors' own study design decisions.

Recommendations/Conclusion: For future evaluation studies, we derive a set of 13 recommendations concerning the construction and usage of a test database, as well as the application of statistical evaluation methods.


