Systematic Analysis of Common Factors Impacting Deep Learning Model Generalizability in Liver Segmentation

Affiliation

¹ From the Department of Radiology (B.K., J.M., K.L., I.H.Z., E.B., M.C., G.J., W.F.W., M.R.B.), Department of Radiation Oncology (K.L.), and Department of Medicine, Division of Gastroenterology (M.R.B.), Duke University School of Medicine, Duke University Medical Center, Box 3808, Durham, NC 27710; Department of Electrical & Computer Engineering, Duke University Pratt School of Engineering, Durham, NC (K.L., Y.W.); Department of Radiology, Faculty of Medicine, Benha University, Benha, Egypt (I.H.Z.); Department of Radiology, College of Medicine-Tucson, University of Arizona, Tucson, AZ (E.B.); and Department of Radiology, Rutgers Health-Newark Beth Israel Medical Center, Newark, NJ (M.C.).

PMID: 37293348
PMCID: PMC10245179
DOI: 10.1148/ryai.220080

Brandon Konkel et al. Radiol Artif Intell. 2023 Feb 22;5(3):e220080. doi: 10.1148/ryai.220080. eCollection 2023 May.

Abstract

Purpose: To investigate the effect of training data type on generalizability of deep learning liver segmentation models.

Materials and methods: This Health Insurance Portability and Accountability Act-compliant retrospective study included 860 MRI and CT abdominal scans obtained between February 2013 and March 2018 and 210 volumes from public datasets. Five single-source models were trained on 100 scans each of T1-weighted fat-suppressed portal venous (dynportal), T1-weighted fat-suppressed precontrast (dynpre), proton density opposed-phase (opposed), single-shot fast spin-echo (ssfse), and T1-weighted non-fat-suppressed (t1nfs) sequence types. A sixth multisource (DeepAll) model was trained on 100 scans consisting of 20 randomly selected scans from each of the five source domains. All models were tested against 18 target domains from unseen vendors, MRI types, and modality (CT). The Dice-Sørensen coefficient (DSC) was used to quantify similarity between manual and model segmentations.
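As an illustration of the two mechanics described above, the following is a minimal sketch rather than the study's actual code. It assumes binary liver masks flattened to sequences of 0/1 and uses the standard definition DSC = 2|A ∩ B| / (|A| + |B|). The function names `dice_coefficient` and `pool_multisource`, the scan identifiers, and the treatment of two empty masks as perfect agreement are hypothetical choices made for the example.

```python
import random
from typing import Dict, List, Sequence


def dice_coefficient(manual: Sequence[int], predicted: Sequence[int]) -> float:
    """Dice-Sørensen coefficient between two flattened binary masks.

    DSC = 2 * |A ∩ B| / (|A| + |B|), where A and B are the sets of
    voxels labeled as liver in the manual and model segmentations.
    """
    if len(manual) != len(predicted):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(manual, predicted) if a and b)
    total = sum(1 for a in manual if a) + sum(1 for b in predicted if b)
    if total == 0:
        # Assumption (not from the article): two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


def pool_multisource(
    domains: Dict[str, List[str]], per_domain: int, seed: int = 0
) -> List[str]:
    """Draw `per_domain` scans at random, without replacement, from each domain."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    pooled: List[str] = []
    for name in sorted(domains):
        pooled.extend(rng.sample(domains[name], per_domain))
    return pooled


# Toy masks: 2 overlapping voxels, 3 + 3 labeled voxels, so DSC = 4/6.
manual = [0, 1, 1, 1, 0, 0]
predicted = [0, 1, 1, 0, 1, 0]
dsc = dice_coefficient(manual, predicted)

# Hypothetical scan identifiers: 100 per source domain, 20 drawn from each.
source_names = ["dynportal", "dynpre", "opposed", "ssfse", "t1nfs"]
domains = {n: [f"{n}_{i:03d}" for i in range(100)] for n in source_names}
deepall_training_set = pool_multisource(domains, per_domain=20)
print(f"DSC = {dsc:.3f}, pooled scans = {len(deepall_training_set)}")
```

Sampling without replacement within each domain keeps the pooled set the same size as each single-source training set (100 scans), so a comparison between the two reflects data diversity and not data volume.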

Results: Single-source model performance did not degrade significantly against unseen vendor data. Models trained on T1-weighted dynamic data generally performed well on other T1-weighted dynamic data (DSC = 0.848 ± 0.183 [SD]). The opposed model generalized moderately well to all unseen MRI types (DSC = 0.703 ± 0.229). The ssfse model failed to generalize well to any other MRI type (DSC = 0.089 ± 0.153). Dynamic and opposed models generalized moderately well to CT data (DSC = 0.744 ± 0.206), whereas other single-source models performed poorly (DSC = 0.181 ± 0.192). The DeepAll model generalized well across vendor, modality, and MRI type and against externally sourced data.

Conclusion: Domain shift in liver segmentation appears to be tied to variations in soft-tissue contrast and can be effectively bridged with diversification of soft-tissue representation in training data.

Keywords: CT; Convolutional Neural Network (CNN); Deep Learning Algorithms; Liver Segmentation; MRI; Machine Learning Algorithms; Supervised Learning.

Supplemental material is available for this article. © RSNA, 2023.

Conflict of interest statement

Disclosures of conflicts of interest: B.K. No relevant relationships. J.M. No relevant relationships. K.L. No relevant relationships. I.H.Z. No relevant relationships. E.B. No relevant relationships. M.C. No relevant relationships. Y.W. No relevant relationships. G.J. No relevant relationships. W.F.W. NIH 1R01-NS123275-01A1 and The Marcus Foundation (research funding not direct support for this study); consulting fees from Qure.ai; payment or honoraria for lectures, presentations, speakers bureaus, manuscript writing or educational events from Stanford AIMI Symposium 2021; participation on a Data Safety Monitoring Board or Advisory Board from University of Wisconsin-GE CT Protocols Partnership. M.R.B. Grants/contracts from Siemens Healthineers, Madrigal Pharmaceuticals, Carmot Therapeutics, Corcept, NGM Biopharmaceuticals, and Metacrine.

Figures

**Figure 1:** Single-source and DeepAll model three-dimensional performance on unseen MRI type, vendor, and modality. Box and whisker plots represent median, first and third quartiles, and minimum and maximum of Dice-Sørensen coefficient (DSC) values for single-source segmentation models trained on **(A)** T1-weighted precontrast (dynpre), **(B)** T1-weighted dynamic portal venous phase (dynportal), **(C)** T1-weighted non–fat-suppressed (t1nfs), **(D)** proton density opposed-phase (opposed), and **(E)** T2-weighted single-shot fast spin-echo (ssfse) MR image types and a multisource **(F)** DeepAll model trained on data pooled from single-source domains. Plots represent testing against source domain test sets (leftmost box) and target domains grouped by unseen MRI sequence (green), vendor (orange), and modality (blue). Unseen MRI sequence type data include holdout data from source domains, as well as T1-weighted dynamic arterial phase (dynarterial), T1-weighted dynamic delayed phase (dyndelay), T1-weighted dynamic hepatobiliary phase (dynhbp), T2-weighted single-shot fast spin-echo with fat suppression (ssfsefs), and T2-weighted fast spin-echo with fat suppression (t2fse). Data from non-Siemens (ns) vendors include the “ns-” prefix. Data from unseen modality include arterial phase CT (ct-arterial), portal phase CT (ct-portal), and delayed phase CT (ct-delay).

**Figure 2:** Model three-dimensional performance on unseen MRI types (holdout sets). Heat map represents mean Dice-Sørensen coefficient (DSC) of single-source and DeepAll model performance across MRI types held out from source domains. Source domain data include T1-weighted precontrast (dynpre), T1-weighted dynamic portal venous phase (dynportal), T1-weighted non–fat-suppressed (t1nfs), proton density opposed-phase (opposed), and T2-weighted single-shot fast spin-echo (ssfse). Model results demonstrating performance on holdout data from their respective domains are outlined in blue. P values indicate no statistically significant difference between performance of single-source models tested against intradomain holdout data and DeepAll model performance on that same set.

**Figure 3:** Model three-dimensional performance on unseen MRI types. Heat map represents mean Dice-Sørensen coefficient (DSC) of single-source and DeepAll model performance across unseen MRI type domains. Source domain data include T1-weighted precontrast (dynpre), T1-weighted dynamic portal venous phase (dynportal), T1-weighted non–fat-suppressed (t1nfs), proton density opposed-phase (opposed), and T2-weighted single-shot fast spin-echo (ssfse). Target domain data include T1-weighted dynamic arterial phase (dynarterial), T1-weighted dynamic delayed phase (dyndelay), T1-weighted dynamic hepatobiliary phase (dynhbp), T2-weighted single-shot fast spin-echo with fat suppression (ssfsefs), and T2-weighted fast spin-echo with fat suppression (t2fse).

**Figure 4:** Model three-dimensional performance on unseen vendor (GE or Philips) domains. Heat map represents mean Dice-Sørensen coefficient (DSC) of single-source and DeepAll model performance across unseen vendor domains. Model results demonstrating performance on cross-vendor data of the same MRI type are outlined in blue. P values indicate no statistically significant difference between performance of single-source models tested against cross-vendor data of the same MRI type and DeepAll model performance on that same set (cross-vendor vs DeepAll). Source domain data include T1-weighted precontrast (dynpre), T1-weighted dynamic portal venous phase (dynportal), T1-weighted non–fat-suppressed (t1nfs), proton density opposed-phase (opposed), and T2-weighted single-shot fast spin-echo (ssfse). Data from non-Siemens (ns) vendors include the “ns-” prefix.

**Figure 5:** Model three-dimensional performance on unseen vendor (GE or Philips) and source domains. Heat map represents mean Dice-Sørensen coefficient (DSC) of DeepAll model performance across unseen vendor domains (cross-vendor) and holdout sets from source domain data (intra-domain). P values indicate no statistically significant difference between performance of the DeepAll model tested against cross-vendor data and intradomain holdout sets. Target domain MRI types include T1-weighted precontrast (ns-dynpre), T1-weighted dynamic portal venous phase (ns-dynportal), T1-weighted non–fat-suppressed (ns-t1nfs), proton density opposed-phase (ns-opposed), and T2-weighted single-shot fast spin-echo (ns-ssfse). Data from non-Siemens (ns) vendors include the “ns-” prefix.

**Figure 6:** Model three-dimensional performance on unseen modality (CT) domains. Heat map represents mean Dice-Sørensen coefficient (DSC) of single-source and DeepAll model performance across unseen modality domains (CT). Source domain data include T1-weighted precontrast (dynpre), T1-weighted dynamic portal venous phase (dynportal), T1-weighted non–fat-suppressed (t1nfs), proton density opposed-phase (opposed), and T2-weighted single-shot fast spin-echo (ssfse). Data from unseen modality include arterial phase CT (ct-arterial), portal phase CT (ct-portal), and delayed phase CT (ct-delay).
