Are Vision-xLSTM-embedded U-Nets better at segmenting medical images?

Pallabi Dutta¹, Soham Bose², Swalpa Kumar Roy³, Sushmita Mitra⁴

Affiliations

¹ Machine Intelligence Unit, Indian Statistical Institute, 203, B.T. Road, Kolkata, 700108, West Bengal, India. Electronic address: duttapallabi_r@isical.ac.in.
² Department of Computer Science and Engineering, Jadavpur University, 188, Raja Subodh Chandra Mallick Rd, Kolkata, 700032, West Bengal, India.
³ Department of Computer Science and Engineering, Alipurduar Government Engineering and Management College, Alipurduar, 736206, West Bengal, India.
⁴ Machine Intelligence Unit, Indian Statistical Institute, 203, B.T. Road, Kolkata, 700108, West Bengal, India.

PMID: 40773779
DOI: 10.1016/j.neunet.2025.107925

Neural Netw. 2025 Dec:192:107925. Epub 2025 Aug 5.

Abstract

Segmentation strategies for medical images have evolved from an initial dependence on Convolutional Neural Networks (CNNs) to the current investigation of hybrid models that combine CNNs with Vision Transformers (ViTs). There is an increasing focus on architectures that are both high-performing and computationally efficient, so that they can be deployed on remote systems with limited resources. Although transformers can capture global dependencies in the input space, they incur high computational and storage costs. This research proposes Vision Extended Long Short-Term Memory (Vision-xLSTM) as an appropriate backbone for medical image segmentation, offering strong performance at reduced computational cost. The study investigates the integration of CNNs with Vision-xLSTM by introducing the novel U-VixLSTM. The Vision-xLSTM blocks capture the temporal and global relationships within the patches extracted from the CNN feature maps. The convolutional feature reconstruction path then upsamples the output volume from the Vision-xLSTM blocks to produce the segmentation output. U-VixLSTM outperforms state-of-the-art networks on the publicly available Synapse, ISIC and ACDC datasets. The findings suggest that U-VixLSTM is a promising alternative to ViTs for medical image segmentation, delivering effective performance without a substantial computational burden. This makes it feasible to deploy in healthcare environments with limited resources for faster diagnosis. Code provided: https://github.com/duttapallabi2907/U-VixLSTM.
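The abstract describes patches being extracted from CNN feature maps, processed as a sequence, and then handed to a convolutional reconstruction path. The sketch below illustrates only that reshaping step, under stated assumptions: a single-channel 2D feature map, non-overlapping square patches, and row-major ordering. The function names `to_patch_sequence` and `to_feature_map` and the patch size are hypothetical and are not taken from the paper or its repository; the Vision-xLSTM blocks themselves are not modelled here.

```python
from typing import List

Grid = List[List[float]]


def to_patch_sequence(feature_map: Grid, patch: int) -> List[List[float]]:
    """Split an H x W map into non-overlapping patch x patch tiles
    (row-major order) and flatten each tile into one token."""
    height, width = len(feature_map), len(feature_map[0])
    if height % patch or width % patch:
        raise ValueError("feature map size must be divisible by patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([
                feature_map[top + dy][left + dx]
                for dy in range(patch)
                for dx in range(patch)
            ])
    return tokens


def to_feature_map(tokens: List[List[float]], height: int, width: int,
                   patch: int) -> Grid:
    """Inverse of to_patch_sequence: put each token back at its tile position."""
    grid = [[0.0] * width for _ in range(height)]
    tiles_per_row = width // patch
    for index, token in enumerate(tokens):
        top = (index // tiles_per_row) * patch
        left = (index % tiles_per_row) * patch
        for offset, value in enumerate(token):
            grid[top + offset // patch][left + offset % patch] = value
    return grid


# Toy 4 x 4 "feature map" standing in for a CNN encoder output (assumption).
feature_map = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tokens = to_patch_sequence(feature_map, patch=2)

# A sequence model (Vision-xLSTM in the paper) would transform `tokens` here.

restored = to_feature_map(tokens, height=4, width=4, patch=2)
```

In this toy setting the sequence has four tokens of four values each, and mapping the untouched tokens back reproduces the original grid, which is the shape contract an upsampling decoder would rely on.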

Keywords: CNN; Medical image segmentation; Vision-LSTM.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
