NPJ Digit Med. 2019 Apr 4:2:21.
doi: 10.1038/s41746-019-0096-y. eCollection 2019.

Deep learning enables robust assessment and selection of human blastocysts after in vitro fertilization

Pegah Khosravi et al. NPJ Digit Med.

Abstract

Visual morphology assessment is routinely used to evaluate embryo quality and to select human blastocysts for transfer after in vitro fertilization (IVF). However, assessments vary between embryologists and, as a result, the success rate of IVF remains low. To overcome uncertainties in embryo quality, multiple embryos are often implanted, resulting in undesired multiple pregnancies and complications. Unlike other imaging fields, human embryology and IVF have not yet leveraged artificial intelligence (AI) for unbiased, automated embryo assessment. We postulated that an AI approach trained on thousands of embryos can reliably predict embryo quality without human intervention. We implemented an AI approach based on deep neural networks (DNNs) to select the highest-quality embryos using a large collection of human embryo time-lapse images (about 50,000 images) from a high-volume fertility center in the United States. We developed a framework (STORK) based on Google's Inception model. STORK predicts blastocyst quality with an AUC of >0.98, generalizes well to images from other clinics outside the US, and outperforms individual embryologists. Using clinical data for 2182 embryos, we created a decision tree that integrates embryo quality and patient age to identify scenarios associated with pregnancy likelihood. Our analysis shows that the chance of pregnancy based on individual embryos varies from 13.8% (age ≥41 and poor-quality) to 66.3% (age <37 and good-quality), depending on automated blastocyst quality assessment and patient age. In conclusion, our AI-driven approach provides a reproducible way to assess embryo quality and uncovers new, potentially personalized strategies to select embryos.
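
For readers unfamiliar with the AUC figure quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed for a binary good-quality versus poor-quality classifier. It is not the authors' evaluation code. The function name `roc_auc` and the example scores are hypothetical; the sketch uses the rank-based (Mann-Whitney) definition, in which the AUC is the fraction of positive/negative pairs where the positive example receives the higher score, with ties counted as half.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: labels are 1 (good-quality) or 0 (poor-quality).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    # Count pairs where the positive outranks the negative; ties count 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores, purely illustrative (not data from the study).
example_labels = [1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.3, 0.1]
example_auc = roc_auc(example_labels, example_scores)
```

An AUC of 1.0 means every good-quality image is scored above every poor-quality image, while 0.5 corresponds to chance-level ranking.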

Keywords: Image processing; Machine learning.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
The STORK flowchart: This flowchart illustrates the design and assessment of STORK. First, human embryo images are provided by the embryology lab and labeled by embryologists as good-quality or poor-quality based on their pregnancy likelihood. Then, the labels and clinical information from the extracted images are integrated, and the Inception-V1 algorithm is trained on the good-quality and poor-quality classes. Next, STORK is evaluated on a blind test set to assess its performance in predicting embryo quality. Finally, the CHAID decision tree is used to investigate the interaction between patient age and embryo quality
Fig. 2
Embryologists’ evaluation: a Three examples of Veeck and Zaninovic grades and their corresponding quality labels across seven focal depths. b Embryologists evaluate embryo quality using an internal scoring system and subsequently classify the embryos into three major groups (good-quality, fair-quality, poor-quality) based on the pregnancy rate
Fig. 3
Deep neural network results: a Inception-V1 results (fine-tuning the parameters for all layers) for three datasets. b Inception-V1 results for two training methods (fine-tuning the parameters for all layers and training from scratch) on the good-quality versus poor-quality embryo discrimination dataset. WCM-NY: data from the Center for Reproductive Medicine and Infertility at Weill Cornell Medicine of New York; IRDB-IC: data from the Institute of Reproduction and Developmental Biology of Imperial College; Universidad de Valencia: data from the Institute Valenciano de Infertilidad, Universidad de Valencia
Fig. 4
STORK vs. embryologists’ classification: STORK classifies the fair-quality images into the existing good-quality and poor-quality classes. For example, panels “a” and “b” are labeled 3A-B (fair-quality) according to the Veeck and Zaninovic grading system, while STORK classified them as poor-quality and good-quality, respectively. Likewise, panels “c” and “d” are both labeled 3BB (fair-quality), yet the algorithm correctly classified panel “c” as poor-quality and panel “d” as good-quality. As the figure shows, the embryos in “b” and “d” resulted in a live birth, whereas those in “a” and “c” did not
Fig. 5
Assessment comparison of STORK with five embryologists: This circular heatmap shows the predictions of STORK and five embryologists for the same images from 394 embryos. STORK outputs good and poor grades. The heatmap compares STORK’s result with the majority vote of the embryologists for the 239 embryos in which the majority (i.e., at least three out of five embryologists) gives good or poor. The embryologists assess embryo quality using the Gardner grading system. Then, they convert the grades to three quality scores, good-quality (orange), fair-quality (gray), and poor-quality (navy), based on the pregnancy rate. For a few embryos, an embryologist added a “?” to the grade (e.g. 3A?) to indicate low certainty about the exact label (red). From the outer circle to the inner ones, the heatmap shows the results of STORK, Majority vote, Embryologist-V, Embryologist-IV, Embryologist-III, Embryologist-II, and Embryologist-I. Orange: embryos with good-quality; navy: embryos with poor-quality; gray: embryos with fair-quality; red: embryos that are not labeled due to uncertainty
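
The majority-vote rule described in the Fig. 5 caption (at least three of five embryologists agreeing on good or poor) can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function names `majority_label` and `agreement_with_majority`, the label strings, and the example votes are all hypothetical.

```python
from collections import Counter


def majority_label(votes, min_votes=3):
    """Return 'good' or 'poor' if at least min_votes raters chose it, else None.

    Votes such as 'fair' or '?' (low certainty) never form a usable majority,
    mirroring the caption's restriction to good/poor majorities.
    """
    counts = Counter(votes)
    for label in ("good", "poor"):
        if counts[label] >= min_votes:
            return label
    return None


def agreement_with_majority(model_labels, rater_votes):
    """Fraction of embryos where the model matches the good/poor majority.

    Embryos without a good/poor majority are skipped; returns None if no
    embryo has one.
    """
    matches = []
    for model_label, votes in zip(model_labels, rater_votes):
        consensus = majority_label(votes)
        if consensus is not None:
            matches.append(model_label == consensus)
    return sum(matches) / len(matches) if matches else None


# Made-up votes for three embryos (not data from the study).
votes = [
    ["good", "good", "good", "fair", "poor"],   # majority: good
    ["good", "good", "fair", "fair", "poor"],   # no good/poor majority
    ["poor", "poor", "poor", "?", "fair"],      # majority: poor
]
rate = agreement_with_majority(["good", "poor", "good"], votes)
```

With five raters and a threshold of three, good and poor can never both reach the threshold, so the order in which the two labels are checked does not affect the result.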
Fig. 6
Interactions between age and embryo quality: The decision tree shows the interactions between IVF patient age and embryo quality using CHAID
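
CHAID (chi-square automatic interaction detection) grows a tree by choosing, at each node, the predictor whose categories are most strongly associated with the outcome according to a chi-square test. The sketch below shows only the core statistic, Pearson's chi-square for a contingency table of predictor category by outcome. It is not the authors' analysis and omits the category merging and multiple-testing adjustment that a full CHAID implementation performs. The function name `chi_square_statistic` and the counts are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts.

    `table` is a list of rows, e.g. one row per age group and one column per
    outcome (pregnant, not pregnant). Assumes every row and column total is
    positive, so that all expected counts are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of predictor and outcome.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Made-up counts (not data from the study): rows are two hypothetical age
# groups, columns are (pregnant, not pregnant).
no_association = chi_square_statistic([[10, 10], [10, 10]])
strong_association = chi_square_statistic([[20, 0], [0, 20]])
```

A larger statistic indicates a stronger association between the candidate predictor and the outcome; a tree builder would compare candidate splits (for example, patient age versus embryo quality) using the corresponding p-values.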
