Med Image Anal. 2023 Oct;89:102918. doi: 10.1016/j.media.2023.102918. Epub 2023 Aug 2.

Segment anything model for medical image analysis: An experimental study

Maciej A Mazurowski et al. Med Image Anal. 2023 Oct.

Abstract

Training segmentation models for medical images continues to be challenging due to the limited availability of data annotations. Segment Anything Model (SAM) is a foundation model trained on over 1 billion annotations, predominantly for natural images, that is intended to segment user-defined objects of interest in an interactive manner. While the model's performance on natural images is impressive, medical image domains pose their own set of challenges. Here, we perform an extensive evaluation of SAM's ability to segment medical images on a collection of 19 medical imaging datasets from various modalities and anatomies. In our experiments, we generated point and box prompts for SAM using a standard method that simulates interactive segmentation. We report the following findings: (1) SAM's performance based on single prompts varies widely depending on the dataset and the task, from IoU=0.1135 for spine MRI to IoU=0.8650 for hip X-ray. (2) Segmentation performance appears to be better for well-circumscribed objects with less ambiguous prompts, such as organ segmentation in computed tomography, and poorer in various other scenarios, such as brain tumor segmentation. (3) SAM performs notably better with box prompts than with point prompts. (4) SAM outperforms similar methods RITM, SimpleClick, and FocalClick in almost all single-point prompt settings. (5) When multiple point prompts are provided iteratively, SAM's performance generally improves only slightly, while the other methods improve to a level that surpasses SAM's point-based performance. We also provide several illustrations of SAM's performance on all tested datasets, of iterative segmentation, and of SAM's behavior given prompt ambiguity. We conclude that SAM shows impressive zero-shot segmentation performance for certain medical imaging datasets, but moderate to poor performance for others. SAM has the potential to make a significant impact on automated segmentation in medical imaging, but appropriate care needs to be applied when using it. Code for evaluating SAM is publicly available at https://github.com/mazurowski-lab/segment-anything-medical-evaluation.
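
To make the evaluation setup concrete, the sketch below shows how SAM can be queried with a single point or box prompt derived from a ground-truth mask and scored with IoU. It assumes the public segment_anything package (SamPredictor API) and a downloaded ViT-B checkpoint; the prompt placement used here (mask centroid for the point, tight bounding box for the box) is an illustration, not necessarily the exact protocol of the paper.

import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def iou(pred, gt):
    # Intersection over union of two binary masks.
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 0.0

def prompts_from_mask(gt):
    # Illustrative prompt placement: mask centroid as the point prompt and the
    # tight bounding box as the box prompt, both in (x, y) pixel coordinates.
    ys, xs = np.nonzero(gt)
    point = np.array([[xs.mean(), ys.mean()]])                # shape (1, 2)
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])  # x1, y1, x2, y2
    return point, box

# Placeholders: substitute a real RGB-converted image slice and its annotation.
image = np.zeros((256, 256, 3), dtype=np.uint8)
gt_mask = np.zeros((256, 256), dtype=np.uint8)
gt_mask[80:160, 96:192] = 1

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)
predictor.set_image(image)
point, box = prompts_from_mask(gt_mask)

# Single point prompt: keep the most confident of SAM's multimask outputs.
masks, scores, _ = predictor.predict(point_coords=point,
                                     point_labels=np.array([1]),
                                     multimask_output=True)
print("point-prompt IoU:", iou(masks[np.argmax(scores)], gt_mask))

# Single box prompt: one unambiguous output is sufficient.
masks, _, _ = predictor.predict(box=box, multimask_output=False)
print("box-prompt IoU:", iou(masks[0], gt_mask))

A box prompt gives SAM both the location and the extent of the object, which is consistent with finding (3) above that box prompts outperform point prompts.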

Keywords: Deep learning; Foundation models; Segmentation.


Conflict of interest statement

Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1. Examples of the prompt(s) generated by each of the five modes. Green contours show the ground-truth masks, and blue stars and boxes indicate the prompts.
Fig. 2. Performance of SAM under the five modes of use. Left: performance of SAM across 28 segmentation tasks, ranked in descending order based on Mode 4; the oracle performance for each mode is indicated by an inverted triangle. Right: a summary comparison of all five modes across all tasks, presented as a box-and-whisker plot.
Fig. 3. Visualization of SAM’s segmentation results in two different modes. Each dataset is shown in two consecutive rows, with its name along the left side. For each dataset, we display three examples from left to right, corresponding to the 25th, 50th, and 75th percentiles of IoU across all images in that dataset. For each example, we visualize (top left) the raw image; (bottom left) a zoomed-in view of the area of interest; (top right) the segmentation result for Mode 2 (one point at each object region); and (bottom right) the segmentation result for Mode 4 (one box at each object region). The IoU is shown above each segmentation result. Examples for all datasets are shown in Appendix Figures 1–5.
Fig. 4. Comparison of SAM with three competing methods, RITM, SimpleClick, and FocalClick, under the 1-point prompt setting. Results are presented as the difference between SAM and each other method (ΔIoU) and ranked in descending order of the largest ΔIoU for each task.
Fig. 5. Comparison of SAM and other methods under an interactive prompt setting. Left: the average performance of SAM and the other methods across all tasks as the number of prompts changes. Right: the detailed performance of SAM on each task.
Fig. 6. Examples of SAM’s predictions under the interactive prompt setting. For each dataset, we display results for 1-point through 9-point prompts. Positive prompts are shown as green stars and negative prompts as red stars. (A sketch of one common click-placement rule follows the figure captions.)
Fig. 7. Visualizations of examples with prompt ambiguity; SAM’s first, second, and third most confident predictions are shown in order.
Fig. 8. Performance of SAM when prompts are placed randomly within certain regions.
Fig. 9. (Top) The relative size of the objects in each dataset. (Bottom) Object size vs. detection performance for Mode 2 and Mode 4 separately; a fitted regression curve is shown for each.
Fig. 10. Examples of the segment-everything mode. For each example, we sampled a different number of grid points per side: 2^5, 2^6, and 2^7.
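
As a concrete illustration of the interactive setting in Figs. 5 and 6, below is a hedged sketch of one common click-simulation rule from the interactive-segmentation literature: the next click is placed at the pixel deepest inside the dominant error region, labeled positive for missed foreground or negative for spurious foreground. This is an assumption about the general protocol, not necessarily the exact rule used in the paper.

import numpy as np
from scipy import ndimage

def next_click(pred, gt):
    # Return ((x, y), label) for the next simulated click, or None when the
    # prediction already matches the ground truth.
    pred, gt = pred.astype(bool), gt.astype(bool)
    false_neg = gt & ~pred   # missed foreground   -> candidate positive clicks
    false_pos = pred & ~gt   # spurious foreground -> candidate negative clicks
    error, label = (false_neg, 1) if false_neg.sum() >= false_pos.sum() else (false_pos, 0)
    if not error.any():
        return None
    # The maximum of the Euclidean distance transform is the pixel farthest
    # from the boundary of the error region, i.e. its most central point.
    dist = ndimage.distance_transform_edt(error)
    y, x = np.unravel_index(np.argmax(dist), dist.shape)
    return (int(x), int(y)), label

Each simulated click is appended to the running point_coords and point_labels arrays and the predictor is queried again, which is how 1-point through 9-point curves such as those in Fig. 5 are typically produced.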
