Multi-Center Fetal Brain Tissue Annotation (FeTA) Challenge 2022 Results

Kelly Payette et al. IEEE Trans Med Imaging. 2025 Mar.

Abstract

Segmentation is a critical step in analyzing the developing human fetal brain. Automatic segmentation methods have improved substantially in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single-center study, limiting real-world clinical applicability and acceptance. The multi-center FeTA Challenge 2022 focused on advancing the generalizability of fetal brain segmentation algorithms for magnetic resonance imaging (MRI). In FeTA 2022, the training dataset contained images and corresponding manually annotated multi-class labels from two imaging centers, and the testing data contained images from these two centers as well as two additional unseen centers. The multi-center data included different MR scanners, imaging parameters, and fetal brain super-resolution algorithms. Sixteen teams participated, and 17 algorithms were evaluated. Here, the challenge results are presented, focusing on the generalizability of the submissions. Both in- and out-of-domain, the white matter and ventricles were segmented with the highest accuracy (top Dice scores: 0.89 and 0.87, respectively), while the most challenging structure remains the grey matter (top Dice score: 0.75) due to its anatomical complexity. The top 5 average Dice scores ranged from 0.81 to 0.82, the top 5 average 95th percentile Hausdorff distance values ranged from 2.3 to 2.5 mm, and the top 5 volumetric similarity scores ranged from 0.90 to 0.92. The FeTA Challenge 2022 successfully evaluated and advanced the generalizability of multi-class fetal brain tissue segmentation algorithms for MRI, and it continues to benchmark new algorithms.
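As an illustration of two of the overlap metrics named in the abstract, the following is a minimal sketch of a per-label Dice score and volumetric similarity computed on integer label maps. It is not the challenge's evaluation code: the function names, the toy arrays, and the choice to return NaN when a label is absent from both maps are assumptions made for this example, and the volumetric similarity uses the common definition 1 - |Vp - Vr| / (Vp + Vr).

```python
import numpy as np


def dice(pred, ref, label):
    """Dice overlap for one tissue label in two integer label maps."""
    p = np.asarray(pred) == label
    r = np.asarray(ref) == label
    denom = int(p.sum()) + int(r.sum())
    if denom == 0:
        # Assumption: label absent from both maps is reported as NaN.
        return float("nan")
    return 2.0 * int(np.logical_and(p, r).sum()) / denom


def volume_similarity(pred, ref, label):
    """Volumetric similarity: 1 - |Vp - Vr| / (Vp + Vr), in voxel counts."""
    vp = int((np.asarray(pred) == label).sum())
    vr = int((np.asarray(ref) == label).sum())
    if vp + vr == 0:
        # Assumption: same NaN convention as in dice().
        return float("nan")
    return 1.0 - abs(vp - vr) / (vp + vr)


# Toy example (hypothetical data): label 3 fills 32 voxels in the
# reference and 16 of those voxels in the prediction.
ref = np.zeros((4, 4, 4), dtype=int)
ref[:2] = 3
pred = np.zeros((4, 4, 4), dtype=int)
pred[:1] = 3

print(dice(pred, ref, 3))               # 2 * 16 / (16 + 32)
print(volume_similarity(pred, ref, 3))  # 1 - 16 / 48
```

Volumetric similarity compares only volumes, so it can be high even when the overlap is poor; this is one reason it is typically reported alongside Dice and a distance-based metric rather than on its own.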


Figures

Fig. 1.
Sample cases from each institution in the testing dataset. Each case is a normally developing fetal brain from gestational week 22, with a super-resolution quality rating of ‘Excellent’. The histograms of the individual labels vary between each institution (green: Kispi, orange: Vienna, blue: CHUV, red: UCSF). The inset is an enlarged view of the first peak to visualize the different histograms of the three institutions.
Fig. 2.
Overview of the gestational ages (in weeks) included within the testing dataset by institution.
Fig. 3.
Each submission was evaluated separately on the institutional subsets of the testing data in order to determine if certain algorithms performed better or worse on data from specific institutions. The rankings of the participating teams for each institutional subset are shown, with each connected line corresponding to a single FeTA submission. In-domain institutions: Vienna, Kispi; Out-of-domain institutions: CHUV, UCSF.
Fig. 4.
In-domain and out-of-domain evaluation metrics by algorithm. Both in- and out-of-domain, and for all three evaluation metrics (Dice Similarity Coefficient, 95th percentile Hausdorff Distance, Volume Similarity), the results plateau for the first 10 teams, after which a drop-off is observed. The ranking of the teams differs between the in-domain and out-of-domain metrics.
Fig. 5.
Examples of the automatic labels created by the top 5 teams for each of the four institutions (T2w: T2-weighted fetal brain reconstruction; eCSF: external Cerebrospinal Fluid; GM: Grey Matter; WM: White Matter).
Fig. 6.
Rankings of participating teams for each metric from top to bottom (left to right). Left column: global DSC; middle column: HD95; right column: VS. The top row shows box plots of the evaluation data, the middle row visualizes the ranking stability based on bootstrap sampling, and the bottom row displays the significance maps for the ranking stability, where blue cells indicate no significant differences. All plots were generated with the ChallengeR Toolkit. DSC: Dice Similarity Coefficient; HD95: 95th percentile Hausdorff Distance; VS: Volume Similarity.
Fig. 7.
The submissions were evaluated and ranked based on the segmentation results from each of the seven brain tissue labels (1: external Cerebrospinal Fluid, 2: Grey Matter, 3: White Matter, 4: Ventricles, 5: Cerebellum, 6: deep Grey Matter, 7: Brainstem), with each connected line corresponding to a single team’s FeTA submission.
Fig. 8.
Examples of high and low topology scores of the grey matter segmentations in the challenge results. Left column: coronal view of a fetus at 34.6 weeks GA with no known neuropathology; right column: a fetus at 30.9 weeks GA with severe ventriculomegaly and other abnormalities. In the segmentations with high topology scores, a continuous cortical ribbon can be observed. In the cases with low scores, a gap in the grey matter can be observed in both examples, as well as holes in the segmentation in the ventriculomegaly example, leading to poor scores (higher numerical BNE values correspond to poorer results). A perfect BNE score for the cortical grey matter is (2/0/0). The corresponding Betti numbers are displayed in parentheses next to each example. GA: Gestational Age; BNE: Betti Number Error.
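The Betti numbers mentioned in the Fig. 8 caption can be partly illustrated with connected-component labeling. The sketch below is an assumption-laden simplification, not the challenge's topology metric: it computes only b0 (connected components) and b2 (enclosed cavities) of a 3D binary mask, omits b1 (tunnels), and assumes 26-connectivity for the foreground with 6-connectivity for the background. The function names and the error definition (absolute difference per Betti number) are hypothetical.

```python
import numpy as np
from scipy import ndimage


def betti_0_and_2(mask):
    """Return (b0, b2) of a 3D binary mask.

    b0: number of foreground components (26-connectivity, an assumption).
    b2: number of enclosed cavities, counted as background components
        (6-connectivity) other than the outside.
    """
    mask = np.asarray(mask, dtype=bool)

    fg_structure = ndimage.generate_binary_structure(3, 3)  # 26-connected
    _, b0 = ndimage.label(mask, structure=fg_structure)

    # Pad with background so everything outside the object is one component.
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    bg_structure = ndimage.generate_binary_structure(3, 1)  # 6-connected
    _, n_background = ndimage.label(~padded, structure=bg_structure)
    b2 = n_background - 1  # subtract the outside component

    return b0, b2


def betti_error_0_and_2(pred_mask, ref_mask):
    """Absolute per-number difference between two masks (b0 and b2 only)."""
    p0, p2 = betti_0_and_2(pred_mask)
    r0, r2 = betti_0_and_2(ref_mask)
    return abs(p0 - r0), abs(p2 - r2)


# Toy example (hypothetical data): a 3x3x3 block with its center voxel
# removed is a single component enclosing a single cavity.
shell = np.zeros((5, 5, 5), dtype=bool)
shell[1:4, 1:4, 1:4] = True
shell[2, 2, 2] = False

# Two isolated voxels: two components, no cavity.
two_points = np.zeros((5, 5, 5), dtype=bool)
two_points[0, 0, 0] = True
two_points[4, 4, 4] = True

print(betti_0_and_2(shell))
print(betti_0_and_2(two_points))
print(betti_error_0_and_2(two_points, shell))
```

Under this simplified view, a gap that splits the cortical ribbon changes the component count b0, while an enclosed void inside the segmentation changes b2; detecting tunnels would additionally require b1, for example via the Euler characteristic.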
