Front Artif Intell. 2025 Jul 23:8:1618607.
doi: 10.3389/frai.2025.1618607. eCollection 2025.

AI-assisted anatomical structure recognition and segmentation via mamba-transformer architecture in abdominal ultrasound images

Shih-Fang Chang et al. Front Artif Intell.

Abstract

Background: Abdominal ultrasonography is a primary diagnostic tool for evaluating medical conditions within the abdominal cavity. Accurately determining the relative locations of intra-abdominal organs and lesions from anatomical features in ultrasound images is essential in diagnostic sonography. Recognizing and extracting anatomical landmarks facilitates lesion evaluation and enhances diagnostic interpretation. Recent artificial intelligence (AI) segmentation methods based on deep neural networks (DNNs) and transformers struggle to preserve feature-dependency information while remaining computationally efficient, which limits their clinical applicability.

Methods: The anatomical structure recognition framework, MaskHybrid, was developed using a private dataset comprising 34,711 abdominal ultrasound images of 2,063 patients from CSMUH. The dataset included abdominal organs and vascular structures (hepatic vein, inferior vena cava, portal vein, gallbladder, kidney, pancreas, spleen) and liver lesions (hepatic cyst, tumor). MaskHybrid adopted a Mamba-transformer hybrid architecture consisting of an evolved backbone network, an encoder, and a corresponding decoder. This design captured long-range spatial dependencies and contextual information effectively, improving image segmentation while mitigating the computational burden of the transformer attention mechanism.
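
To illustrate why a state space (Mamba-style) layer can be cheaper than attention, the following minimal sketch contrasts a linear-time recurrent scan with the quadratic number of pairwise scores that full self-attention computes. It is not the MaskHybrid implementation: the scalar parameters `a`, `b`, `c` and both function names are hypothetical, and real Mamba layers use input-dependent (selective) parameters and a parallel scan.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t
    One state update per token, so cost grows linearly with length.
    (a, b, c are illustrative constants, not values from the paper.)
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def attention_pair_count(n):
    """Number of query-key scores full self-attention computes for n tokens."""
    return n * n


# An impulse at position 0 decays geometrically through the hidden state.
tokens = [1.0, 0.0, 0.0, 0.0]
outputs = ssm_scan(tokens, a=0.5)
print(outputs)                     # [1.0, 0.5, 0.25, 0.125]
print(attention_pair_count(1024))  # 1048576 scores versus 1024 scan steps
```

The scan carries long-range context in a fixed-size state, whereas attention compares every token with every other token. A hybrid design keeps some attention for global interactions while replacing part of it with the cheaper scan.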

Results: Experiments on the retrospective dataset achieved a mean average precision (mAP) of 74.13% for anatomical landmark segmentation in abdominal ultrasound images. The proposed framework outperformed the baselines across most organ and lesion types and effectively segmented challenging anatomical structures. Moreover, MaskHybrid exhibited a significantly shorter inference time (0.120 ± 0.013 s), making it 2.5 times faster than large AI models of similar size. By combining Mamba and transformer architectures, this hybrid design is well suited to the timely segmentation of complex anatomical structures in abdominal ultrasonography, where accuracy and efficiency are both critical in clinical practice.
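
For readers unfamiliar with the metric, the sketch below shows one common way to compute average precision (AP) for a single class from predicted and ground-truth masks; mAP is then the mean of AP over classes (and, in COCO-style evaluation, over several IoU thresholds). The abstract does not state the exact evaluation protocol, so the IoU threshold of 0.5, the greedy matching rule, and the function names here are assumptions for illustration only.

```python
def mask_iou(a, b):
    """Intersection over union of two binary masks given as sets of (row, col)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(preds, gts, iou_thr=0.5):
    """AP for one class at one IoU threshold (assumed protocol).

    preds: list of (score, mask) pairs; gts: list of masks.
    Returns the uninterpolated area under the precision-recall curve.
    """
    if not gts:
        return 0.0
    matched = set()
    tp = 0
    ap = 0.0
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    for rank, (_, mask) in enumerate(ranked, start=1):
        # Greedily match the prediction to the best unmatched ground truth.
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(gts):
            if idx in matched:
                continue
            iou = mask_iou(mask, gt)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= iou_thr:
            matched.add(best_idx)
            tp += 1
            # Each true positive raises recall by 1 / len(gts).
            ap += (tp / rank) / len(gts)
    return ap


# Toy data: one ground-truth structure, one good and one spurious prediction.
gt = {(0, 0), (0, 1), (1, 0), (1, 1)}
good = {(0, 0), (0, 1), (1, 0)}
spurious = {(5, 5)}
print(mask_iou(gt, good))                                    # 0.75
print(average_precision([(0.9, spurious), (0.8, good)], [gt]))  # 0.5
```

In the toy example the spurious mask is ranked first, so precision is 1/2 when the true structure is finally recovered, giving an AP of 0.5.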

Conclusion: The proposed Mamba-transformer hybrid recognition framework simultaneously detects and segments multiple abdominal organs and lesions in ultrasound images. It achieves superior segmentation accuracy, visualization quality, and inference efficiency, thereby facilitating improved medical image interpretation and near real-time diagnostic sonography that meets clinical needs.

Keywords: abdominal ultrasound; anatomical structure; artificial intelligence; deep learning; image segmentation; sonography; state space models; transformer.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
(A) Line segments and (B) polygon-based contouring foregrounds were created with our own interactive labeling mechanism. Intersecting line segments annotated by physicians are used to indicate potential organ regions, and the GrabCut segmentation algorithm is then used to generate the ground truth automatically.
Figure 2
Number of patients and ultrasound images, and the distribution of anatomical structure annotations in the private dataset.
Figure 3
Architecture and components of the proposed framework, MaskHybrid, which is based on MaskDINO and extended (gray-shaded area) to improve segmentation accuracy and inference efficiency.
Figure 4
Visual comparison of the baselines and our anatomical recognition model. MaskHybrid produced the results closest to the ground truth in both annotation type and number of recognized structures. (A) The MaskDINO baselines missed the hepatic vein or erroneously identified a tumor. (B) The MaskDINO baselines missed the portal vein.
Figure 5
Visualization comparison of MaskHybrid models with and without the hybrid encoder. Both (A) and (B) are segmentations of hepatic veins and tumors. MaskHybrid with the hybrid encoder correctly identified the hepatic vein in (A) and showed a more comprehensive tumor distribution in (B).
Figure 6
Recognition of anatomical structures that were not annotated in the ground truth. Despite training data limitations, MaskHybrid still identified such structures: the missing inferior vena cava in (A) and the missing hepatic veins in (B) and (C).
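
The Figure 1 caption describes physicians drawing intersecting line segments that GrabCut then expands into a ground-truth region. The sketch below shows one plausible way to turn such segments into a GrabCut initialization mask. The authors' actual labeling tool is not described here, so the seeding rule (segment pixels as definite foreground, their bounding box as probable foreground, the rest as probable background) and the helper names `line_pixels` and `seed_mask` are assumptions.

```python
# Label values mirror OpenCV's GrabCut constants
# (cv2.GC_BGD, cv2.GC_FGD, cv2.GC_PR_BGD, cv2.GC_PR_FGD).
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3


def line_pixels(p0, p1):
    """Integer (x, y) pixels on the segment p0 -> p1 (Bresenham's algorithm)."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    pixels = []
    while True:
        pixels.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return pixels
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def seed_mask(height, width, segments):
    """Build a GrabCut initialization mask from annotated line segments.

    Assumed rule: pixels on a segment are definite foreground, the segments'
    bounding box is probable foreground, everything else probable background.
    """
    mask = [[GC_PR_BGD] * width for _ in range(height)]
    xs = [x for seg in segments for x, _ in seg]
    ys = [y for seg in segments for _, y in seg]
    for y in range(min(ys), max(ys) + 1):
        for x in range(min(xs), max(xs) + 1):
            mask[y][x] = GC_PR_FGD
    for p0, p1 in segments:
        for x, y in line_pixels(p0, p1):
            mask[y][x] = GC_FGD
    return mask


# Two intersecting segments, like a cross drawn over an organ.
cross = [((0, 2), (4, 2)), ((2, 0), (2, 4))]
mask = seed_mask(6, 6, cross)
for row in mask:
    print(row)
```

With OpenCV installed, such a mask could be converted to a `uint8` NumPy array and passed to `cv2.grabCut` in `cv2.GC_INIT_WITH_MASK` mode, which refines the probable labels into a final foreground region.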
