Radiol Artif Intell. 2020 Sep 30;2(5):e190183. doi: 10.1148/ryai.2020190183. eCollection 2020 Sep.

Fully Automated Segmentation of Head CT Neuroanatomy Using Deep Learning


Jason C Cai et al. Radiol Artif Intell. 2020.

Abstract

Purpose: To develop a deep learning model that segments intracranial structures on head CT scans.

Materials and methods: In this retrospective study, a primary dataset containing 62 normal noncontrast head CT scans from 62 patients (mean age, 73 years; age range, 27-95 years) acquired between August and December 2018 was used for model development. Eleven intracranial structures were manually annotated on the axial oblique series. The dataset was split into 40 scans for training, 10 for validation, and 12 for testing. After initial training, eight model configurations were evaluated on the validation dataset and the highest performing model was evaluated on the test dataset. Interobserver variability was reported using multirater consensus labels obtained from the test dataset. To ensure that the model learned generalizable features, it was further evaluated on two secondary datasets containing 12 volumes with idiopathic normal pressure hydrocephalus (iNPH) and 30 normal volumes from a publicly available source. Statistical significance was determined using categorical linear regression with P < .05.

Results: Overall Dice coefficient on the primary test dataset was 0.84 ± 0.05 (standard deviation). Performance ranged from 0.96 ± 0.01 (brainstem and cerebrum) to 0.74 ± 0.06 (internal capsule). Dice coefficients were comparable to expert annotations and exceeded those of existing segmentation methods. The model remained robust on external CT scans and scans demonstrating ventricular enlargement. The use of within-network normalization and class weighting facilitated learning of underrepresented classes.

Conclusion: Automated segmentation of CT neuroanatomy is feasible with a high degree of accuracy. The model generalized to external CT scans as well as scans demonstrating iNPH. Supplemental material is available for this article. © RSNA, 2020.
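The Dice coefficient reported throughout the abstract measures overlap between a predicted mask A and a reference mask B as 2|A∩B| / (|A| + |B|). A minimal NumPy sketch of this metric (an illustration, not the authors' implementation):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * intersection / total

# Toy example: two overlapping 2D masks
a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice_coefficient(a, b), 3))
```

A Dice of 1.0 means the masks coincide exactly; the study's overall test value of 0.84 ± 0.05 indicates strong but imperfect overlap, with larger structures (brainstem and cerebrum, 0.96) easier to match than thin ones (internal capsule, 0.74).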


Conflict of interest statement

Disclosures of Conflicts of Interest: J.C.C., Z.A., K.A.P., A.B., S.H., A.D.W., P.R., G.M.C., A.Z., D.C.V., Q.H., and B.J.E. disclosed no relevant relationships.

Figures

Figure 1: Model evaluation and test dataset workflow. AB = Arunnit Boonrod, MD; DICE = Dice coefficients; iNPH = idiopathic normal pressure hydrocephalus; JC = Jason Cai, MD; RSNA = Radiological Society of North America; SH = Safa Hoodeshenas, MD; VOL = differences in structure volume.
Figure 2: Training and validation performance of various model configurations. (a) Batch normalization (BN) demonstrated fluctuations in validation performance. (b) This was alleviated in part by increasing the batch size (BS) from three to 12. (c) Per-class Dice coefficients averaged over all examinations in the validation dataset. Error bars represent 1 standard deviation. Model 5 predicted only the brainstem and cerebrum, cerebellum, ventricular system, and subarachnoid space. RN = batch renormalization, LN = layer normalization.
Figure 3: Confusion matrices for (a) model 3 (cross-entropy loss with attenuated weighting) and (b) model 5 (Dice loss) on the validation dataset. Numbers represent the percentage of voxels in each category.
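Class weighting in a cross-entropy loss, one of the configurations compared above, scales each voxel's loss by a per-class weight so that small structures contribute more to the gradient. A generic sketch of the idea (the paper's exact weighting schemes are described in its methods; this is not the authors' code):

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Voxelwise cross-entropy, weighted by each voxel's true class.

    probs         : (N, C) softmax class probabilities
    labels        : (N,)   integer ground truth labels
    class_weights : (C,)   larger weights emphasize underrepresented classes
    """
    n = probs.shape[0]
    picked = probs[np.arange(n), labels]   # probability assigned to the true class
    weights = class_weights[labels]        # weight of the true class per voxel
    return float(-(weights * np.log(picked)).sum() / weights.sum())

# Sanity check: with uniform weights this reduces to the mean cross-entropy
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
print(round(weighted_cross_entropy(probs, labels, np.ones(2)), 4))
```

Upweighting rare classes (e.g., septum pellucidum) speeds their learning but, as Figure 4 illustrates, can also encourage spurious labels along class boundaries.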
Figure 4: A, Without class weights, the model quickly learned large classes (brainstem and cerebrum) at the expense of small classes (central sulcus and septum pellucidum). However, it eventually converged when given sufficient training time. The y-axis represents soft Dice coefficients, which are the Dice coefficients of predicted softmax class probabilities before thresholding. B, Manual segmentation, C, predictions from model trained with attenuated weighting (model 3), D, predictions from model trained with balanced weighting (model 2), E, enlarged section from B (yellow dotted lines), and, F, enlarged section from D (yellow dotted lines). The red arrowheads in F indicate a thin layer of labeled voxels over the brain-cranium boundary. This thin layer was not seen in the unweighted model. See the section on “Effect of Class Weighting” for details.
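The "soft" Dice coefficient mentioned in the caption evaluates the Dice formula directly on softmax probabilities rather than on thresholded binary masks, which keeps it differentiable during training. A hypothetical NumPy sketch, assuming voxels flattened to shape (N, C):

```python
import numpy as np

def soft_dice(probs: np.ndarray, onehot: np.ndarray, eps: float = 1e-7) -> float:
    """Soft Dice over softmax probabilities (no thresholding).

    probs  : (N, C) predicted class probabilities
    onehot : (N, C) one-hot ground truth
    eps    : small constant to avoid division by zero for absent classes
    """
    intersection = (probs * onehot).sum(axis=0)
    denom = probs.sum(axis=0) + onehot.sum(axis=0)
    per_class = (2.0 * intersection + eps) / (denom + eps)
    return float(per_class.mean())

# Two voxels, two classes
probs = np.array([[0.8, 0.2], [0.3, 0.7]])
truth = np.array([[1.0, 0.0], [0.0, 1.0]])
print(soft_dice(probs, truth))
```

One minus this quantity is a common "Dice loss", the objective used by model 5 in Figure 3.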
Figure 5: Sample image from the primary test dataset with corresponding model prediction, expert annotations, and ground truth mask. Images shown are, A, original image, B, observer 1 segmentation, C, observer 2 segmentation, D, observer 3 segmentation, E, ground truth mask generated by majority voting (unlabeled voxels indicate areas where all three observers disagree and are excluded from performance metrics), F, model prediction at basal ganglia level, G, model prediction at cerebellar level, and, H, model prediction at central sulcus level. I, Box and whisker plot of Dice coefficients using ground truth masks as reference (n = 59 slices). Asterisks indicate statistically significant results as compared with the model (P < .05). Dice coefficients are also presented in Table 3. The comparison methodology is outlined in Figure 1.
Figure 6: Top: Sample images from the idiopathic normal pressure hydrocephalus (iNPH) dataset. The model was not trained on iNPH scans. Coloring legend can be found in Figure 5. Bottom: Box and whisker plot comparing model performance between primary and secondary test datasets. Asterisks indicate statistically significant results (P < .05). Dice coefficients are also presented in Table 4. The comparison methodology is outlined in Figure 1.
