Agreement and Reliability between Clinically Available Software Programs in Measuring Volumes and Normative Percentiles of Segmented Brain Regions
- PMID: 36175000
- PMCID: PMC9523231
- DOI: 10.3348/kjr.2022.0067
Erratum in
- Erratum: Agreement and Reliability between Clinically Available Software Programs in Measuring Volumes and Normative Percentiles of Segmented Brain Regions. Korean J Radiol. 2023 Sep;24(9):926-927. doi: 10.3348/kjr.2023.0748. PMID: 37634647.
Abstract
Objective: To investigate the agreement and reliability of estimating the volumes and normative percentiles (N%) of segmented brain regions among NeuroQuant (NQ), DeepBrain (DB), and FreeSurfer (FS) software programs, focusing on the comparison between NQ and DB.
Materials and methods: Three-dimensional T1-weighted images of 145 participants (48 healthy participants, 50 patients with mild cognitive impairment, and 47 patients with Alzheimer's disease) from a single medical center (SMC) dataset and 130 participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset were included in this retrospective study. All images were analyzed with DB, NQ, and FS software to obtain volume estimates and N% of various segmented brain regions. We used Bland-Altman analysis, repeated measures ANOVA, reproducibility coefficient, effect size, and intraclass correlation coefficient (ICC) to evaluate inter-method agreement and reliability.
Results: Among the three software programs, the Bland-Altman plot showed a substantial bias, the ICC showed a broad range of reliability (0.004 to 0.97), and repeated-measures ANOVA revealed significant mean volume differences in all brain regions. Similarly, the volume differences of the three software programs had large effect sizes in most regions (0.73 to 5.51). The effect size was largest in the pallidum in both datasets and smallest in the thalamus and cerebral white matter in the SMC and ADNI datasets, respectively. N% of NQ and DB showed unacceptably broad Bland-Altman limits of agreement in all brain regions and a very wide range of ICC values (-0.142 to 0.844) in most brain regions.
Conclusion: NQ and DB showed significant differences in the measured volume and N%, with limited agreement and reliability for most brain regions. Therefore, users should be aware of the lack of interchangeability between these software programs when they are applied in clinical practice.
Keywords: DeepBrain; FreeSurfer; Intermethod validation; MR volumetry; NeuroQuant; Normative percentile.
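The abstract names Bland-Altman analysis as one of the agreement measures but does not spell out the computation. As a minimal sketch, assuming the conventional definition (bias as the mean paired difference, limits of agreement as bias ± 1.96 sample standard deviations), the calculation for two tools measuring the same region might look like the following. The function name `bland_altman` and the volumes are hypothetical and are not taken from the study.

```python
import statistics


def bland_altman(a, b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Assumes the conventional 95% limits of agreement:
    bias +/- 1.96 * sample SD of the paired differences.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)          # systematic offset between tools
    sd = statistics.stdev(diffs)           # sample SD (n - 1 denominator)
    half_width = 1.96 * sd
    return bias, bias - half_width, bias + half_width


# Invented volumes in cm^3 for one region, for illustration only
# (these are NOT data from the study).
tool_a = [3.9, 4.1, 3.5, 4.4, 3.8]
tool_b = [3.6, 3.9, 3.4, 4.0, 3.5]

bias, lower, upper = bland_altman(tool_a, tool_b)
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
```

A nonzero bias indicates that one tool reads systematically higher than the other, while wide limits indicate that individual measurements cannot be swapped between tools even after correcting for that offset.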
Copyright © 2022 The Korean Society of Radiology.
Conflict of interest statement
Seung Hong Choi, who is on the editorial board of the Korean Journal of Radiology, was not involved in the editorial evaluation or decision to publish this article. All remaining authors have declared no conflicts of interest.