Colorectal cancer unmasked: A synergistic AI framework for Hyper-granular image dissection, precision segmentation, and automated diagnosis
- PMID: 40665235
- PMCID: PMC12265324
- DOI: 10.1186/s12880-025-01826-7
Abstract
Colorectal cancer (CRC) is the second most common cause of cancer-related mortality worldwide, underscoring the need for computer-aided diagnosis (CADx) systems that are interpretable, accurate, and robust. This study presents a practical CADx system that combines Vision Transformers (ViTs) and DeepLabV3+ to identify and segment colorectal lesions in colonoscopy images. The system addresses class imbalance and real-world complexity through PCA-based dimensionality reduction, data augmentation, and targeted preprocessing, using the recently curated CKHK-22 dataset, which comprises more than 14,000 annotated images from CVC-ClinicDB, Kvasir-2, and Hyper-Kvasir. Classification performance was compared across ViT, ResNet-50, DenseNet-201, and VGG-16. ViT achieved the highest accuracy (97%), F1-score (0.95), and AUC (92%) on the test data. DeepLabV3+ achieved state-of-the-art segmentation performance for lesion localisation, with a Dice coefficient of 0.88 and an Intersection over Union (IoU) of 0.71, giving sharp delineation of malignant regions. The CADx system supports real-time inference and is served through Google Cloud, which allows scalable clinical deployment. Image-level segmentation quality is demonstrated by visual overlays compared against masks manually delineated by experts, and the system's performance is quantified by precision, recall, F1-score, and AUC. The hybrid strategy not only outperforms traditional CNN approaches but also addresses important clinical needs such as early detection, handling of highly imbalanced classes, and clear explanation. By using self-attention and learning context at multiple scales, the proposed ViT-DeepLabV3+ system establishes a basis for advanced AI support in colorectal diagnosis. The system offers a high-capacity, reproducible, computerised solution for colorectal cancer screening and monitoring that is well suited to clinical deployment, particularly in resource-limited settings.
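The two segmentation metrics reported above, the Dice coefficient and IoU, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `dice_and_iou`, the smoothing term `eps`, and the toy masks are assumptions introduced here, and the masks are taken to be binary arrays of equal shape (1 = lesion, 0 = background).

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """Return (Dice, IoU) for two binary masks of the same shape.

    eps is a small smoothing term (an assumption of this sketch) that
    avoids division by zero when both masks are empty.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()

    dice = (2.0 * intersection + eps) / (total + eps)  # 2|A∩B| / (|A| + |B|)
    iou = (intersection + eps) / (union + eps)         # |A∩B| / |A∪B|
    return float(dice), float(iou)

# Toy example: a predicted mask and an expert mask (hypothetical values).
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])

dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

In this toy case the masks share two lesion pixels out of four in their union, so the Dice value is higher than the IoU value, as is always the case for partially overlapping masks.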
Keywords: Automatic segmentation; DeepLabV3+; Diagnosis of colorectal cancer; Hyper-granular image analysis; Vision transformers.
© 2025. The Author(s).
Conflict of interest statement
Declarations. Ethics approval and consent to participate: Not applicable. This study used publicly available colonoscopy image datasets (CVC-ClinicDB, Kvasir-SEG, and Hyper-Kvasir), which were merged to form the CKHK-22 dataset. These datasets are anonymized and involve no direct interaction with human subjects or use of live human data. Therefore, ethical approval and participant consent were not required. Conflicts of interest: The authors declare no conflict of interest. Competing interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.