Med Phys. 2025 Jul;52(7):e17918.
doi: 10.1002/mp.17918.

Denoising pediatric cardiac photon-counting CT data with sparse coding and data-adaptive, self-supervised deep learning

Darin P Clark et al. Med Phys. 2025 Jul.

Abstract

Background: The judicious use of CT in pediatric cardiac applications is warranted because young patients face the need for repeated imaging and increased lifetime cancer risk after ionizing radiation exposure. The quality of pediatric cardiac CT scans is variable because of limited protocol optimization for pediatric patients, the common presence of metallic implants following treatment, and disparities in denoising algorithm performance between adult and pediatric scans. Two recent technological developments promise to improve the average quality of pediatric CT scans at fixed or reduced dose: clinical photon-counting CT (PCCT) and deep learning (DL) algorithms for CT image denoising. Given advancements to accommodate variable image quality, these technologies will deliver improved spatial resolution, noise performance, and contrast resolution for pediatric cardiac CT imaging.

Purpose: To advance self-supervised DL denoising methods to accommodate variable image quality in pediatric cardiac CT data.

Methods: Starting with the popular Vision Transformer (ViT) DL architecture, two targeted architectural changes were made: (1) the multi-layer perceptrons (MLPs) were modified to allow cross-token recombination of encoded image data following attention computations (parallels patch-wise weighting and averaging in non-local means [NLM]), and (2) the network head was replaced with the equivalent of an overcomplete dictionary to perform dictionary sparse coding (SC). This modified, 3D ViT (mViT) was then trained in a dynamic fashion: the balance between data fidelity and representation sparsity was adjusted during training such that the average fidelity error remained consistent with localized estimates of image noise. To demonstrate the newly proposed method, the mViT was trained with pediatric cardiac photon-counting x-ray CT data with variable levels of image noise (NAEOTOM Alpha PCCT scanner; retrospective data from 20 patients scanned at Duke University; ages: 1-18 years; iterative reconstruction noise level in the left ventricle: 20-55 HU). Data from one patient with the highest levels of noise was reserved for validation. Testing data included Alpha data from three additional Duke patients (2 < 1 year old) and a murine cardiac PCCT data set acquired on a preclinical system.
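For reference, the patch-wise weighting and averaging of classical NLM mentioned in (1) can be sketched as below. This is a minimal illustration of NLM itself, not the mViT layer: the function names, the flat-list patch representation, and the smoothing parameter h are assumptions made for the example. The weights are a softmax over negative squared patch distances, which is the structural resemblance to attention-style weighted averaging.

```python
import math

def nlm_weights(patches, i, h):
    """Normalized NLM weights of every patch relative to patch i.

    patches: list of equal-length lists of intensities (hypothetical layout).
    h: smoothing parameter controlling how fast weights decay with distance.
    """
    # Score = negative squared patch distance scaled by h**2.
    scores = [-sum((a - b) ** 2 for a, b in zip(patches[i], p)) / h ** 2
              for p in patches]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def nlm_denoise_patch(patches, i, h):
    """Replace patch i by the weighted average of all patches."""
    w = nlm_weights(patches, i, h)
    n = len(patches[i])
    return [sum(w[j] * patches[j][k] for j in range(len(patches)))
            for k in range(n)]

# Two similar patches and one very different patch.
patches = [[1.0, 1.0], [1.1, 0.9], [10.0, 10.0]]
weights = nlm_weights(patches, 0, h=1.0)
denoised = nlm_denoise_patch(patches, 0, h=1.0)
```

In this toy input the dissimilar third patch receives a negligible weight, so the average is dominated by the two similar patches.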

Results: The validation denoising results demonstrate that SC with the mViT preserves anatomic structures relevant to the diagnosis and treatment of congenital heart defects (coronary artery origins; valve leaflets; left ventricle boundaries) while achieving intensity bias similar to, and intensity variance lower than, competing denoising methods (bilateral filtration [BF], NLM, dictionary SC, block matching 4D, orthogonal matching pursuit, Noise2Void). Applying the trained mViT network to preclinical PCCT demonstrated robust generalization performance to high levels of image noise (∼230 HU) and differing image contrast; however, applying the network to clinical PCCT data in younger patients (< 1 year old) demonstrated some smoothing of image details in data already heavily denoised during reconstruction.
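One plausible way to quantify intensity bias and intensity variance in a region of interest is sketched below. The abstract does not define these metrics, so the definitions here are assumptions: bias is taken as the change in ROI mean between the input and the denoised image, and variance as the population variance of the denoised ROI. The function names and the rectangular ROI layout are hypothetical.

```python
from statistics import mean, pvariance

def roi_values(image, r0, r1, c0, c1):
    """Flatten a rectangular ROI (rows r0..r1-1, columns c0..c1-1) of a 2D list."""
    return [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]

def bias_and_variance(reference, denoised, roi):
    """Return (bias, variance) in an ROI; both images are 2D lists in HU.

    bias: mean(denoised ROI) - mean(reference ROI)  (assumed definition)
    variance: population variance of the denoised ROI (assumed definition)
    """
    ref = roi_values(reference, *roi)
    den = roi_values(denoised, *roi)
    return mean(den) - mean(ref), pvariance(den)

# Toy 2x2 images in HU.
reference = [[100.0, 110.0], [90.0, 100.0]]
denoised = [[101.0, 101.0], [99.0, 103.0]]
bias, variance = bias_and_variance(reference, denoised, (0, 2, 0, 2))
```

A bias near zero with a reduced variance would indicate that noise was suppressed without shifting the mean intensity of the region.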

Conclusions: This work demonstrates robust, self-supervised denoising of pediatric cardiac PCCT data through data adaptation during network training based on local noise estimates. The trained network generalizes to data sets with high levels of noise and differing image contrast relative to the training data, suggesting that self-supervised fine tuning may allow the trained network to address related CT denoising problems.
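The idea of adapting the fidelity/sparsity balance to a noise estimate can be illustrated with a toy example. The sketch below is not the authors' training procedure: it swaps the network for soft-thresholding with an identity dictionary, uses a single global noise level sigma instead of localized estimates, and uses a hypothetical multiplicative update rule. It only shows the feedback principle: when the fidelity error falls below the noise estimate, the regularization weight is raised, and vice versa.

```python
import math
import random

def soft_threshold(x, t):
    """Shrink x toward zero by t (the closed-form L1 sparse coding step)."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def rmse(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

def adapt_lambda(noisy, sigma, lam=1.0, steps=300, rate=0.1):
    """Adjust lam until the fidelity RMSE matches the noise estimate sigma.

    The update rule and its parameters are assumptions for illustration.
    """
    for _ in range(steps):
        denoised = [soft_threshold(v, lam) for v in noisy]
        err = rmse(denoised, noisy)
        # err < sigma: too little was removed, so increase lam; err > sigma: decrease.
        lam *= math.exp(rate * (sigma - err) / sigma)
    denoised = [soft_threshold(v, lam) for v in noisy]
    return lam, rmse(denoised, noisy)

# Sparse signal (a spike at every 10th sample) plus Gaussian noise of 20 units.
random.seed(0)
noisy = [(200.0 if n % 10 == 0 else 0.0) + random.gauss(0.0, 20.0)
         for n in range(2000)]
lam, fidelity_error = adapt_lambda(noisy, sigma=20.0)
```

After the loop, the fidelity error sits close to the supplied noise level, which is the sense in which the regularization strength is data-adaptive rather than fixed in advance.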

Keywords: deep learning; image denoising; x‐ray CT.

Conflict of interest statement

Joseph Cao has received speaker honoraria from Siemens Healthineers.

Figures

Figure 1:
Sparse coding network architecture. (A) Block diagram of mViT layers designed to perform sparse coding. (B) A series of eight residual transformer blocks convert tokenized and embedded image patches to non-local sparse codes. (C) Non-standard MLP which allows direct data processing within and between tokens. (D) Overcomplete dictionary block which converts non-local sparse codes to output volume patches. Data dimensions are labeled for each step of network evaluation: volume patches, (batch, x, y, z, channels); tokens, (batch, token #, embedding dimension). Note: “Patch to Tokens” (A) reshapes and permutes the spatial dimensions (x, y, z) to subdivide the input patch into spatially coherent, non-overlapping tokens of 6³ voxels each (inverse operation: “Patch to Tokens⁻¹”, (D)).
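The "Patch to Tokens" step and its inverse can be sketched in plain Python as follows. This is an illustrative reading of the caption, not the authors' code: the nested-list layout, the function names, the token edge length t, and the voxel ordering within each token are assumptions, and the batch and channel dimensions are omitted.

```python
def patch_to_tokens(patch, t):
    """Split a 3D patch (nested lists indexed [x][y][z]) into non-overlapping
    t*t*t blocks; each block becomes one flat token of t**3 voxels."""
    nx, ny, nz = len(patch), len(patch[0]), len(patch[0][0])
    assert nx % t == 0 and ny % t == 0 and nz % t == 0
    tokens = []
    for bx in range(0, nx, t):
        for by in range(0, ny, t):
            for bz in range(0, nz, t):
                tokens.append([patch[bx + i][by + j][bz + k]
                               for i in range(t)
                               for j in range(t)
                               for k in range(t)])
    return tokens

def tokens_to_patch(tokens, shape, t):
    """Inverse operation: place each token back at its block position."""
    nx, ny, nz = shape
    patch = [[[None] * nz for _ in range(ny)] for _ in range(nx)]
    it = iter(tokens)
    for bx in range(0, nx, t):
        for by in range(0, ny, t):
            for bz in range(0, nz, t):
                tok = next(it)
                n = 0
                for i in range(t):
                    for j in range(t):
                        for k in range(t):
                            patch[bx + i][by + j][bz + k] = tok[n]
                            n += 1
    return patch

# A 4x4x4 patch split into eight tokens of 2**3 voxels each.
patch = [[[x * 16 + y * 4 + z for z in range(4)] for y in range(4)]
         for x in range(4)]
tokens = patch_to_tokens(patch, 2)
restored = tokens_to_patch(tokens, (4, 4, 4), 2)
```

Because the blocks do not overlap, the inverse is an exact rearrangement and no averaging is needed when tokens are mapped back to the output patch.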
Figure 2:
SC-mViT training and validation metrics by training epoch. (A) RMSE in HU averaged over 4876 minibatches per training epoch (batch size: 32 3D patches). (B) Average absolute value of the sparse codes (α). (C) Average value of the dynamically adjusted regularization factor (λ). (D) Average training cost function value (Eq. 1) computed from (A), (B), and (C). (E) RMSE value in HU for the validation data set averaged over the entire 3D volume and computed at the end of each epoch.
Figure 3:
Sparse coding denoising results for cardiac anatomy (SC-mViT; validation data; 11 years old, female). The original data (column 1) is compared with the denoised data (column 2). Residuals (column 1 – column 2) highlight where denoising results may compromise image structure (column 3).
Figure 4:
Comparison of denoising methods. The SC-mViT denoising results are compared with the results of several reference denoising algorithms for an obliquely resliced view of the pulmonic valve leaflets. Residual maps show the difference between the input image (top, left) and each denoised image. Matching arrows and ovals highlight notable differences in performance.
Figure 5:
The impact of denoising on noise power (A) and spatial resolution (B, C) as measured in the validation data set (11 years old, female).
Figure 6:
Clinical PCCT testing data (17 years old, female). Axial (row 1) and coronal (row 2) slices are shown through the reconstruction before (column 1) and after (column 2) denoising with the SC-mViT network. Residual images (column 3) and yellow boxes highlight potential changes in image structure. Red boxes and text denote mean and standard deviation measurements in regions of interest.
Figure 7:
Clinical PCCT testing data from two patients less than one year old. Clinical reconstructions are compared before (column 1) and after (column 2) denoising with the SC-mViT network. Residual images (column 3, differences across rows) and regions of interest (red boxes with mean ± standard deviation; yellow boxes) detail noise levels and denoising performance. Note differences in the display window by row to accommodate different reconstruction monoenergies.
Figure 8:
Preclinical testing data. (A) Unregularized reconstruction of preclinical cardiac photon-counting micro-CT data at ventricular diastole (iodine region of interest, red circle and text; mean ± standard deviation). (B) Regularized, single-channel iterative reconstruction performed with the MCR Toolkit. (C) Result of applying the trained SC-mViT model to (A). (D) Average of all cardiac phases, providing a higher dose reference image (~6x higher than (A); 190 mGy) where the scan is static over time (outside of the red circle).
