Visual language model-assisted CT denoising via text-guided diffusion and fidelity maintenance
- PMID: 41816883
- DOI: 10.1177/08953996251372739
Abstract
Reducing radiation dose in computed tomography (CT) and photon-counting CT (PCCT) is crucial for patient safety, but lower doses introduce noise that degrades image quality. Existing denoising methods often rely on supervised learning from paired data or on specific noise assumptions, which poses challenges in clinical practice. A novel Visual-Language Model-assisted CT Denoising (VLD) framework is proposed to suppress CT image noise while preserving diagnostic fidelity through semantic guidance. Our method leverages the human-level knowledge embedded in multimodal visual-language models and applies it to CT image denoising, enabling the diffusion model to perform restoration guided by semantic understanding. In addition, a tri-domain consistency framework further enhances image quality by progressively refining details while preserving structural integrity. Extensive experiments on both simulated CT and real PCCT data demonstrate that VLD produces high-quality reconstructed images and generalizes robustly to new scenarios. In simulation experiments, VLD achieves average peak signal-to-noise ratio improvements of 0.95 dB and 1.21 dB under the 5000-photon condition over the WGAN and FBPConvNet methods, respectively, both of which require paired data.
Keywords: computed tomography; diffusion models; image denoising; prompt engineering; visual language model.
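
The abstract describes a diffusion model whose restoration is steered by semantic guidance from a visual-language model. As a rough illustration only, and not the paper's implementation, the sketch below shows a DDPM-style reverse denoising step conditioned on a text/prompt embedding; the toy network `TextConditionedEpsNet`, the embedding dimension, and the noise schedule are all hypothetical placeholders standing in for the VLD denoiser and its VLM-derived prompt features.

```python
# Minimal sketch (assumptions, not the paper's method): one DDPM reverse step
# where the noise predictor is conditioned on a text embedding.
import torch
import torch.nn as nn

class TextConditionedEpsNet(nn.Module):
    """Toy noise predictor: noisy image + text embedding -> predicted noise."""
    def __init__(self, channels=1, text_dim=512):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, channels)  # inject text as a per-channel bias
        self.body = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x_t, text_emb):
        bias = self.text_proj(text_emb)[:, :, None, None]  # (B, C, 1, 1)
        return self.body(x_t + bias)

@torch.no_grad()
def reverse_step(eps_net, x_t, t, text_emb, betas):
    """One DDPM reverse step x_t -> x_{t-1}, guided by the text embedding."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    eps = eps_net(x_t, text_emb)
    mean = (x_t - betas[t] / torch.sqrt(1.0 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + torch.sqrt(betas[t]) * noise

# Usage: denoise a simulated low-dose CT patch with a stand-in prompt embedding.
betas = torch.linspace(1e-4, 2e-2, 1000)       # hypothetical linear noise schedule
eps_net = TextConditionedEpsNet()
x_t = torch.randn(1, 1, 64, 64)                # noisy CT patch (placeholder data)
text_emb = torch.randn(1, 512)                 # stand-in for a VLM prompt embedding
for t in reversed(range(1000)):
    x_t = reverse_step(eps_net, x_t, t, text_emb, betas)
```

In this sketch the conditioning simply adds a projected embedding as a per-channel bias before the convolutional body; the actual VLD guidance mechanism and the tri-domain consistency refinement described in the abstract are more elaborate and are not reproduced here.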