Multicenter Study
Sci Rep. 2025 Jan 15;15(1):2074.
doi: 10.1038/s41598-025-86536-4.

Clinical validation of explainable AI for fetal growth scans through multi-level, cross-institutional prospective end-user evaluation

Zahra Bashir et al. Sci Rep.

Abstract

We aimed to develop and evaluate Explainable Artificial Intelligence (XAI) for fetal ultrasound that gives end-users feedback in the form of actionable concepts, using a prospective, cross-center, multi-level approach. We developed, implemented, and tested a deep-learning model for fetal growth scans using both retrospective and prospective data. We used a modified Progressive Concept Bottleneck Model with pre-established clinical concepts as explanations (feedback on image optimization and presence of anatomical landmarks) as well as segmentations (outlining anatomical landmarks). The model was evaluated prospectively on four points: its ability to assess standard plane quality, the correctness of its explanations, the clinical usefulness of its explanations, and its ability to discriminate between different levels of expertise among clinicians. We used 9352 annotated images for model development and 100 videos for prospective evaluation. Overall classification accuracy was 96.3%. The model's performance in assessing standard plane quality was on par with that of clinicians. Model segmentations and explanations agreed with those provided by expert clinicians in 83.3% and 74.2% of cases, respectively. A panel of clinicians evaluated segmentations as useful in 72.4% of cases and explanations as useful in 75.0% of cases. Finally, the model reliably discriminated between the performances of clinicians with different levels of experience (p-values < 0.01 for all measures). Our study successfully developed an Explainable AI model for real-time feedback to clinicians performing fetal growth scans. This work contributes to the existing literature by addressing the gap in the clinical validation of Explainable AI models within fetal medicine, emphasizing the importance of multi-level, cross-institutional, and prospective evaluation with clinician end-users. The prospective clinical validation uncovered challenges and opportunities that could not have been anticipated had we focused only on retrospective development and validation, such as leveraging AI to gauge operator competence in fetal ultrasound.
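
As a rough illustration of the concept-bottleneck idea named in the abstract, the sketch below derives a plane-quality decision only from per-concept scores, so that every failed concept can be reported back as feedback. It is a toy example under stated assumptions, not the authors' model: the concept names, the 0.5 threshold, and the helper `assess_plane` are hypothetical, and the scores would in practice come from a trained network rather than a hand-written dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical concept names, loosely modeled on the two kinds of feedback
# mentioned in the abstract (landmark presence, image optimization).
CONCEPTS = ("landmark_a_visible", "landmark_b_visible", "magnification_ok")


@dataclass
class PlaneAssessment:
    is_standard_plane: bool
    feedback: List[str]


def assess_plane(concept_scores: Dict[str, float],
                 threshold: float = 0.5) -> PlaneAssessment:
    """Decide plane quality from concept scores alone (the 'bottleneck').

    The threshold of 0.5 is an assumption made for this sketch.
    """
    # A concept counts as unmet if its score is missing or below the threshold.
    unmet = [name for name in CONCEPTS
             if concept_scores.get(name, 0.0) < threshold]
    # Each unmet concept doubles as an explanation shown to the operator.
    feedback = [f"Concept not satisfied: {name}" for name in unmet]
    return PlaneAssessment(is_standard_plane=not unmet, feedback=feedback)


if __name__ == "__main__":
    scores = {"landmark_a_visible": 0.9,
              "landmark_b_visible": 0.2,
              "magnification_ok": 0.8}
    result = assess_plane(scores)
    print(result.is_standard_plane)
    for line in result.feedback:
        print(line)
```

Because the final decision depends only on the concept scores, the explanation and the quality verdict cannot disagree in this sketch, which is the property that makes concept-based feedback actionable for the person holding the probe.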

Keywords: Artificial intelligence, Fetal growth scans, Explainable AI, Human-AI collaboration.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests. Disclosure: We used AI-assisted technology (GPT, Microsoft Word, Grammarly) to enhance the linguistic clarity of the paper. Our research group bears full responsibility for the content and ensures its accuracy in this paper.

Figures

Fig. 1
A Flowchart Overview: Visualizing the model development and prospective validation. HC = Head circumference, AC = Abdominal circumference, FL = Femur length and GA = Gestational age. “Other” class = images not belonging to the anatomy of interest.
Fig. 2
Example of model output. The left image is the raw ultrasound image with segmentations in the middle and concept explanations to the right.
