Tractostorm: The what, why, and how of tractography dissection reproducibility

Francois Rheault¹, Alessandro De Benedictis², Alessandro Daducci³, Chiara Maffei⁴, Chantal M W Tax⁵, David Romascano⁶, Eduardo Caverzasi⁷, Felix C Morency⁸, Francesco Corrivetti⁹, Franco Pestilli¹⁰, Gabriel Girard⁶, Guillaume Theaud¹, Ilyess Zemmoura¹¹, Janice Hau¹², Kelly Glavin¹³, Kesshi M Jordan⁷, Kristofer Pomiecko¹³, Maxime Chamberland⁵, Muhamed Barakovic⁶, Nil Goyette⁸, Philippe Poulin¹, Quentin Chenot¹⁴, Sandip S Panesar¹⁵, Silvio Sarubbo¹⁶, Laurent Petit¹⁷, Maxime Descoteaux¹

Affiliations

¹ Sherbrooke Connectivity Imaging Laboratory (SCIL), Université de Sherbrooke, Sherbrooke, Canada.
² Neurosurgery Unit, Department of Neuroscience and Neurorehabilitation, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy.
³ Computer Science Department, University of Verona, Verona, Italy.
⁴ Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital and Harvard Medical School, Boston, MA.
⁵ Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff, UK.
⁶ Signal Processing Lab (LTS5), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
⁷ Department of Neurology, University of California, San Francisco, CA.
⁸ Imeka Solutions, Sherbrooke, Canada.
⁹ Départment de neurochirurgie, Hôpital Lariboisière, Paris, France.
¹⁰ Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN.
¹¹ UMR 1253, iBrain, Université de Tours, Inserm, Tours, France.
¹² Brain Development Imaging Laboratories, Department of Psychology, San Diego State University, San Diego, CA, USA.
¹³ Learning Research & Development Center (LRDC), University of Pittsburgh, Pittsburgh, PA, USA.
¹⁴ ISAE-SUPAERO, Toulouse, France.
¹⁵ Department of Neurosurgery, Stanford University, Stanford, CA.
¹⁶ Division of Neurosurgery, Emergency Department, "S. Chiara" Hospital, Azienda Provinciale per i Servizi Sanitari (APSS), Trento, Italy.
¹⁷ Groupe d'Imagerie Neurofonctionnelle, Institut des Maladies Neurodégénératives - UMR 5293, CNRS, CEA University of Bordeaux, Bordeaux, France.

PMID: 31925871
PMCID: PMC7267902
DOI: 10.1002/hbm.24917

Review

Francois Rheault et al. Hum Brain Mapp. 2020 May;41(7):1859-1874. doi: 10.1002/hbm.24917. Epub 2020 Jan 10.

Abstract

Investigative studies of white matter (WM) brain structures using diffusion MRI (dMRI) tractography frequently require manual WM bundle segmentation, often called "virtual dissection." Human errors and personal decisions make these manual segmentations hard to reproduce, and the extent of this problem has not yet been quantified by the dMRI community. It is our opinion that if the field of dMRI tractography wants to be taken seriously as a widespread clinical tool, it is imperative to harmonize WM bundle segmentations and develop protocols intended for use in clinical settings. The EADC-ADNI Harmonized Hippocampal Protocol achieved such standardization through a series of steps that must be reproduced for every WM bundle. This article is an observation of the problem. A specific bundle segmentation protocol was used to provide a real-life example, but the contribution of this article is to discuss the need for reproducibility and standardized protocols, as for any measurement tool. This study required the participation of 11 experts and 13 nonexperts in neuroanatomy and "virtual dissection" across various laboratories and hospitals. Intra-rater agreement (Dice score) was approximately 0.77, while inter-rater agreement was approximately 0.65. The protocol provided to participants was not necessarily optimal, but its design mimics, in essence, what will be required in future protocols. Reporting tractometry results such as average fractional anisotropy, volume, or streamline count of a particular bundle without a sufficient reproducibility score could make the analysis and interpretations more difficult. Coordinated efforts by the diffusion MRI tractography community are needed to quantify and account for reproducibility of WM bundle extraction protocols in this era of open and collaborative science.

Keywords: bundle segmentation; diffusion MRI; inter-rater; intra-rater; reproducibility; tractography; white matter.

Figures

**Figure 1**
Illustration of the dissection plan of the PyT using the MI‐Brain software (Rheault, Houde, Goyette, Morency, & Descoteaux, 2016). Three axial inclusion ROIs (pink, green, yellow), one sagittal exclusion ROI (orange), two coronal exclusion ROIs (light yellow), and a cerebellum exclusion ROI (red, optional). The whole brain tractogram was segmented to obtain the left PyT. PyT, pyramidal tract; ROIs, regions of interest

**Figure 2**
Representation of the Dice coefficient (overlap) for both the streamline and the voxel representation. For the purpose of a didactic illustration, four streamlines are shown in a 2×5 "voxel grid"; the red and blue streamlines are identical. Each streamline is converted to a binary mask (point‐based for simplicity) shown in a compact representation. Voxels with points from three different streamlines will result in voxels with three different colors; this can be seen as a spatial smoothing. The matrices on the right show values for all pairs (symmetrical). The green and yellow streamlines are not identical, which results in a streamline‐wise Dice coefficient of zero. However, in the voxel representation they have three voxels in common and the result is $\frac{2 \times 3}{5 + 3} = 0.75$
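
As a rough illustration of the two overlap measures described in this caption, the sketch below computes the Dice coefficient on sets. The voxel coordinates and streamline labels are invented for the example and are not taken from the figure; only the set sizes (5 and 3 voxels, 3 shared) match the worked value above. The helper name `dice` and the convention for two empty sets are assumptions.

```python
def dice(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets agree fully
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical voxel masks on a small grid: 5 and 3 voxels, 3 of them shared.
green_voxels = {(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)}
yellow_voxels = {(0, 1), (0, 2), (0, 3)}

# The streamline-wise comparison relies on streamline identity, so two
# non-identical streamlines have nothing in common.
green_streamlines = {"green"}
yellow_streamlines = {"yellow"}

voxel_dice = dice(green_voxels, yellow_voxels)
streamline_dice = dice(green_streamlines, yellow_streamlines)
```

With these inputs the voxel-wise score follows the caption's arithmetic, while the streamline-wise score is zero because no streamline is shared.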

**Figure 3**
Representation of the study design showing N participants, each of whom received five HCP datasets (listed and color coded), which were replicated three times (original, flipped, translated). All participants had to perform the same dissection tasks on the same anonymized datasets. Intra‐rater, inter‐rater, and gold standard reproducibility were computed using the deanonymized datasets. More details are available in the Supporting Information

**Figure 4**
Comparison of bundles and the impacts of spurious streamlines on the reproducibility measurements. Each block shows streamlines on the left and the voxel representation on the right (isosurface). Blocks 2a and 3a show the core (green/orange) and spurious (red/pink) portions of the bundle. Blocks 2b and 3b show only the core portion of the bundle. The table shows the reproducibility "score" between bundles; VOX marks voxel‐wise measures and STR marks streamline‐wise measures

**Figure 5**
Example of average segmentation, or gold standard, generation obtained from seven different segmentations. The first row shows the streamline representation and the second row shows the voxel representation as a smooth isosurface. From left to right, multiple voting ratios were used $(\frac{1}{7}, \frac{3}{7}, \frac{5}{7}, \frac{7}{7})$, each time reducing the number of streamlines and voxels considered part of the average segmentation. A minimal vote set at one out of seven (left) is equivalent to a union of all segmentations, while a vote set at seven out of seven (right) is equivalent to an intersection between all segmentations
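
The voting scheme in this caption can be sketched as a simple count over segmentations. This is a minimal illustration under stated assumptions, not the study's implementation: each segmentation is represented as a set of hypothetical element identifiers (voxels or streamlines), and the function name `vote_gold_standard` is invented for the example.

```python
from collections import Counter
from fractions import Fraction


def vote_gold_standard(segmentations, ratio):
    """Keep elements selected by at least `ratio` of the segmentations."""
    votes = Counter()
    for seg in segmentations:
        votes.update(set(seg))  # each rater votes at most once per element
    n = len(segmentations)
    return {item for item, count in votes.items() if Fraction(count, n) >= ratio}


# Seven hypothetical segmentations that partially overlap.
segmentations = [set(range(i, i + 10)) for i in range(7)]

union_like = vote_gold_standard(segmentations, Fraction(1, 7))
majority = vote_gold_standard(segmentations, Fraction(5, 7))
intersection_like = vote_gold_standard(segmentations, Fraction(7, 7))
```

Exact fractions are used for the threshold so that a ratio such as 5/7 is not affected by floating-point rounding. Raising the ratio can only shrink the result, which matches the left-to-right progression described in the caption.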

**Figure 6**
Measurements (Q₂; *IQR*) related to individual files for both groups. The average FA distributions for experts (0.49; 0.01) and nonexperts (0.47; 0.03) are not statistically different from each other. Similarly, the average lengths for experts (140.33 mm; 7.81 mm) and nonexperts (138.70 mm; 11.29 mm) cannot be distinguished. The streamline count distribution of experts (2,893; 3,564*) is significantly different from that of nonexperts (9,383; 12,368*). The same can be said of the volume distribution, at (34.00 cm³; 16.43 cm³*) for experts and (48.74 cm³; 24.57 cm³*) for nonexperts. The lower and upper fences for nonexperts are much wider, indicating more variation in results

**Figure 7**
Measurements (Q₂; *IQR*) related to pairwise comparison measures for intra‐rater segmentations. The correlation of density maps showed no statistically significant difference between the expert (0.90; 0.17) and nonexpert (0.90; 0.17) groups. Distributions showed a statistically significant difference for both Dice scores. The Dice score of streamlines shows an easily observable difference between experts (0.10; 0.39*) and nonexperts (0.37; 0.46*). The difference between the distributions of the Dice score of voxels is less noticeable, at (0.75; 0.15*) for experts and (0.79; 0.14*) for nonexperts. The trend for intra‐rater reproducibility is that raters fail to select the same streamlines, but the ones that are selected still cover approximately the same volume. IQR: interquartile range

**Figure 8**
Measurements (Q₂; *IQR*) related to pairwise comparison measures for inter‐rater segmentations. The correlation of density maps showed no statistically significant difference between the expert (0.82; 0.23*) and nonexpert (0.77; 0.29*) groups. As with the intra‐rater segmentations, distributions showed a statistically significant difference for both Dice scores. The Dice score of streamlines shows an easily observable difference between experts (0.11; 0.14*) and nonexperts (0.18; 0.32*), while the distributions of the Dice score of voxels for experts (0.63; 0.20*) and nonexperts (0.67; 0.18*) are more similar. Raters have difficulty selecting the same streamlines, but overall capture a similar volume. IQR: interquartile range

**Figure 9**
Measurements (Q₂; *IQR*) related to pairwise comparison measures against the gold standard. The correlation of density maps, reaching (0.95; 0.04*) for experts and (0.88;1 5*) for nonexperts, is statistically different between the two groups. However, the Dice scores of streamlines are not statistically different at (0.39; 0.18) and (0.34; 0.34), respectively. The Dice score of voxels is relatively high at (0.82; 0.05*) for experts and (0.76; 0.13*) for nonexperts. Despite variations between raters, overall the participants remain around the same average segmentation and obtain more agreement with the gold standard than with each other. IQR: interquartile range

**Figure 10**
Measurements (Q₂; *IQR*) related to binary classification measures against the gold standard. The Kappa score is only significantly different for voxels (0.84; 0.06 and 0.80; 0.13) and not for streamlines (0.60; 0.16* and 0.65; 0.41*). There is a high degree of variability for precision and sensitivity of streamlines (0.81; 0.19* and 0.50; 0.24* for experts) and (0.59; 0.37* and 0.82; 0.44* for nonexperts). These measures are more reliable with the voxel representation (0.92; 0.10* and 0.79; 0.17* for experts) and (0.78; 0.17* and 0.82; 0.44* for nonexperts). The streamline representation is always less reproducible than the voxel representation. Accuracy and specificity are not shown because both reach above 0.99 and do not provide useful visual insight. IQR: interquartile range
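
The binary classification measures named in this caption can be derived from a confusion matrix against the gold standard. The sketch below uses textbook definitions of precision, sensitivity, specificity, accuracy, and Cohen's kappa; the function name `classification_measures`, the element identifiers, and the universe size of 10,000 are assumptions made for illustration and do not come from the study.

```python
def classification_measures(rater, gold, universe_size):
    """Binary classification measures of a rater's selection vs. a gold standard.

    `rater` and `gold` are sets of element ids (voxels or streamlines);
    `universe_size` is the total number of candidate elements.
    """
    rater, gold = set(rater), set(gold)
    tp = len(rater & gold)           # selected and in the gold standard
    fp = len(rater - gold)           # selected but not in the gold standard
    fn = len(gold - rater)           # missed gold-standard elements
    tn = universe_size - tp - fp - fn
    n = universe_size
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals (Cohen's kappa).
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return {
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": observed,
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical example: a bundle of 100 elements inside 10,000 candidates.
gold = set(range(0, 100))
rater = set(range(20, 110))
measures = classification_measures(rater, gold, universe_size=10_000)
```

Because the bundle is a small fraction of all candidate elements, true negatives dominate in this example, so accuracy and specificity sit close to 1 even though the rater missed a fifth of the gold standard. This is consistent with the caption's remark that those two measures give little visual insight.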
