Hum Reprod Open. 2023 Aug 15;2023(3):hoad031.
doi: 10.1093/hropen/hoad031. eCollection 2023.

Embryo selection through artificial intelligence versus embryologists: a systematic review

M Salih et al. Hum Reprod Open.

Abstract

Study question: What is the present performance of artificial intelligence (AI) decision support during embryo selection compared to the standard embryo selection by embryologists?

Summary answer: AI consistently outperformed the clinical teams in all the studies focused on embryo morphology and clinical outcome prediction during embryo selection assessment.

What is known already: The ART success rate is ∼30%, with a worrying trend of increasing female age correlating with considerably worse results. As such, there have been ongoing efforts to address this low success rate through the development of new technologies. With the advent of AI, machine learning could be applied so that areas limited by human subjectivity, such as embryo selection, are enhanced through increased objectivity. Given the potential of AI to improve IVF success rates, it remains crucial to compare the performance of AI and embryologists during embryo selection.

Study design, size, duration: The search was conducted across PubMed, EMBASE, Ovid Medline, and IEEE Xplore from 1 June 2005 up to and including 7 January 2022. Included articles were restricted to those written in English. Search terms used across all databases were: ('Artificial intelligence' OR 'Machine Learning' OR 'Deep learning' OR 'Neural network') AND ('IVF' OR 'in vitro fertili*' OR 'assisted reproductive techn*' OR 'embryo'), where the character '*' instructs the search engine to match any completion of the search term.
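The Boolean query above can be sketched in a few lines of Python. This is only an illustration: the helper names `or_group` and `matches_truncated` are hypothetical, and `fnmatch` merely approximates truncation, since each database applies its own wildcard rules.

```python
from fnmatch import fnmatch

# Search terms exactly as listed in the abstract.
AI_TERMS = ["Artificial intelligence", "Machine Learning", "Deep learning", "Neural network"]
IVF_TERMS = ["IVF", "in vitro fertili*", "assisted reproductive techn*", "embryo"]


def or_group(terms):
    """Join quoted terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f"'{t}'" for t in terms) + ")"


# Both groups must match, so they are combined with AND.
QUERY = f"{or_group(AI_TERMS)} AND {or_group(IVF_TERMS)}"


def matches_truncated(pattern, phrase):
    """Approximate '*' truncation: '*' stands for any run of characters.

    Assumption: case-insensitive whole-phrase matching, which real
    search engines may not follow exactly.
    """
    return fnmatch(phrase.lower(), pattern.lower())
```

Under this approximation, `matches_truncated("in vitro fertili*", "in vitro fertilisation")` and the American spelling "in vitro fertilization" both match, which is the reason for truncating at "fertili".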

Participants/materials, setting, methods: A literature search was conducted for studies on AI applications in IVF. Primary outcomes of interest were the accuracy, sensitivity, and specificity of embryo morphology grade assessments and of predictions of clinical outcomes, such as clinical pregnancy after IVF treatment. Risk of bias was assessed using the modified Downs and Black checklist.
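For readers less familiar with the three primary outcome measures, the sketch below computes them from a binary confusion matrix using their standard definitions. The class name `ConfusionCounts` and the example counts are illustrative assumptions and are not taken from any study in the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing predictions with the ground truth."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def accuracy(c: ConfusionCounts) -> float:
    """Share of all predictions that were correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    """Share of actual positives that were predicted positive."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of actual negatives that were predicted negative."""
    return c.tn / (c.tn + c.fp)


# Hypothetical counts, for illustration only (not data from the review).
example = ConfusionCounts(tp=40, fp=10, tn=35, fn=15)
print(accuracy(example), sensitivity(example), specificity(example))
```

Accuracy alone can hide an imbalance between the two error types, which is why sensitivity and specificity are reported alongside it.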

Main results and the role of chance: Twenty articles were included in this review. No single embryo assessment day was used across the studies; embryo development from Day 1 to Day 5/6 was investigated. The types of input for training AI algorithms were images and time-lapse (10/20), clinical information (6/20), and both images and clinical information (4/20). Each AI model demonstrated promise when compared to an embryologist's visual assessment. On average, the models predicted the likelihood of successful clinical pregnancy with greater accuracy than clinical embryologists, signifying greater reliability when compared to human prediction. The AI models performed at a median accuracy of 75.5% (range 59-94%) on predicting embryo morphology grade. The correct prediction (ground truth) was defined from embryo images according to the embryologists' assessment following the respective local guidelines. Using blind test datasets, the embryologists' prediction accuracy was 65.4% (range 47-75%) against the same ground truth provided by the original local assessment. Similarly, AI models had a median accuracy of 77.8% (range 68-90%) in predicting clinical pregnancy from patient clinical treatment information, compared to 64% (range 58-76%) for embryologists. When images/time-lapse and clinical information inputs were combined, the median accuracy of the AI models was higher at 81.5% (range 67-98%), while clinical embryologists had a median accuracy of 51% (range 43-59%).
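The size of the gap between AI models and embryologists follows from simple subtraction of the figures reported above. The sketch below does that arithmetic in percentage points; the dictionary keys and the function name `gap_in_points` are hypothetical labels, and the 65.4% embryologist figure is used as reported, although the abstract does not state whether it is a median.

```python
# (AI accuracy %, embryologist accuracy %) as reported in the abstract.
REPORTED_ACCURACY = {
    "images/time-lapse (embryo morphology grade)": (75.5, 65.4),
    "clinical information (clinical pregnancy)": (77.8, 64.0),
    "images/time-lapse plus clinical information": (81.5, 51.0),
}


def gap_in_points(ai: float, embryologist: float) -> float:
    """Difference in percentage points, rounded to one decimal place."""
    return round(ai - embryologist, 1)


GAPS = {
    input_type: gap_in_points(ai, embryologist)
    for input_type, (ai, embryologist) in REPORTED_ACCURACY.items()
}

for input_type, gap in GAPS.items():
    print(f"{input_type}: AI higher by {gap} percentage points")
```

These differences compare summary figures across heterogeneous studies, so they describe the reported numbers rather than a pooled effect estimate.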

Limitations, reasons for caution: The findings of this review are based on studies that have not been prospectively evaluated in a clinical setting. Additionally, a fair comparison of all the studies was deemed unfeasible owing to heterogeneity in the studies, the development of the AI models, the databases employed, and the study design and quality.

Wider implications of the findings: AI provides considerable promise to the IVF field and embryo selection. However, there needs to be a shift in developers' perception of the clinical outcome from successful implantation towards ongoing pregnancy or live birth. Additionally, existing models focus on locally generated databases and many lack external validation.

Study funding/competing interests: This study was funded by Monash Data Future Institute. All authors have no conflicts of interest to declare.

Registration number: CRD42021256333.

Keywords: ART; IVF; artificial intelligence; embryo; embryo selection; embryology; machine learning.

Figures

Figure 1.
PRISMA flow diagram of study selection for a systematic review of embryo selection through artificial intelligence versus embryologists. The literature search included studies published since 2005. AI, artificial intelligence.
Figure 2.
Accuracy of the AI model defined by the data input utilized for model training. Images input: studies including still embryo images and embryo images from time-lapse videos. Clinical information input: studies including patient information features, demographics, and treatment information. Images and clinical information input: studies including the use of both embryo images and clinical information. The graph shows the accuracy output of the prediction defined by each study’s input sample type. Embryo morphology prediction: embryo grade through images and their quality assessments. Clinical pregnancy prediction: prediction of possible successful clinical pregnancy. Embryo aneuploidy prediction: embryo aneuploidy prediction through embryo images. Live birth prediction: prediction of possible successful healthy live birth delivery. AI, artificial intelligence.
Figure 3.
AI model accuracy compared to prediction accuracy of embryologists according to type of data input. Blue: AI model accuracies among studies. Red: embryologists’ accuracies among studies. Images input: studies including still embryo images and embryo images from time-lapse videos. Clinical information input: studies including patient information features, demographics, and treatment information. Images and clinical information input: studies including the use of both embryo images and clinical information. AI, artificial intelligence.
Figure 4.
Median accuracy and interquartile range for AI models compared to embryologists’ predictions. Images input: studies including still embryo images and embryo images from time-lapse videos. Clinical information input: studies including patient information features, demographics, and treatment information. Images and clinical information input: studies including the use of both embryo images and clinical information. Sample size (n) refers to the number of studies in each analysis. AI, artificial intelligence.
