Predicting pregnancy using large-scale data from a women's health tracking mobile application

Bo Liu¹, Shuyang Shi¹, Yongshang Wu¹, Daniel Thomas², Laura Symul³, Emma Pierson¹, Jure Leskovec⁴

Affiliations

¹ Dept. of Computer Science, Stanford.
² Clue by BioWink GmbH, Berlin.
³ Dept. of General Surgery and Dept. of Statistics, Stanford.
⁴ Dept. of Computer Science, Stanford Chan-Zuckerberg Biohub.

PMID: 31538145
PMCID: PMC6752881
DOI: 10.1145/3308558.3313512

Bo Liu et al. Proc Int World Wide Web Conf. 2019 May;2019:2999-3005. doi: 10.1145/3308558.3313512.

Abstract

Predicting pregnancy has been a fundamental problem in women's health for more than 50 years. Previous datasets have been collected via carefully curated medical studies, but the recent growth of women's health tracking mobile apps offers potential for reaching a much broader population. However, the feasibility of predicting pregnancy from mobile health tracking data is unclear. Here we develop four models (a logistic regression model and three LSTM models) to predict a woman's probability of becoming pregnant using data from a women's health tracking app, Clue by BioWink GmbH. Evaluating our models on a dataset of 79 million logs from 65,276 women with ground truth pregnancy test data, we show that our predicted pregnancy probabilities meaningfully stratify women: women in the top 10% of predicted probabilities have an 89% chance of becoming pregnant over 6 menstrual cycles, as compared to a 27% chance for women in the bottom 10%. We develop a technique for extracting interpretable time trends from our deep learning models, and show these trends are consistent with previous fertility research. Our findings illustrate the potential that women's health tracking data offers for predicting pregnancy on a broader population; we conclude by discussing the steps needed to fulfill this potential.
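
The stratification result above compares observed pregnancy rates between the top and bottom 10% of users ranked by predicted probability. A minimal sketch of that comparison follows; the function name `decile_rates` and the toy data are illustrative assumptions and do not come from the paper.

```python
def decile_rates(probs, outcomes):
    """Return (bottom_rate, top_rate): observed outcome rates among the
    bottom and top 10% of users, ranked by predicted probability.

    probs    -- predicted pregnancy probability per user
    outcomes -- 1 if the user had a positive test during follow-up, else 0
    """
    if len(probs) != len(outcomes) or len(probs) < 10:
        raise ValueError("need at least 10 paired predictions and outcomes")
    # Sort users from lowest to highest predicted probability.
    ranked = sorted(zip(probs, outcomes), key=lambda pair: pair[0])
    k = len(ranked) // 10  # number of users in each 10% group
    bottom = [y for _, y in ranked[:k]]
    top = [y for _, y in ranked[-k:]]
    return sum(bottom) / k, sum(top) / k


# Toy data (hypothetical): 20 users with increasing predicted probability.
probs = [i / 20 for i in range(20)]
outcomes = [0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
bottom_rate, top_rate = decile_rates(probs, outcomes)
```

A large gap between `top_rate` and `bottom_rate` is what "meaningfully stratify" refers to; the abstract reports 89% versus 27% over 6 menstrual cycles.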

Keywords: Pregnancy prediction; mobile health tracking.

Figures

**Figure 1:**
Prediction task. The model makes predictions using logs from the first 24 days of a cycle (green interval), and the cycle is labeled using pregnancy tests taken after day 24 of the cycle and before day 24 of the next cycle (red interval). The vast majority of pregnancy tests in our dataset are taken near when the user’s cycle is supposed to start, consistent with proper use of pregnancy tests, so any positive pregnancy tests likely result from activity during the green interval, which will be included in the feature vector.
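
The windowing in Figure 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the data layout (tuples indexed by cycle day), the name `split_cycle`, and the rule that any positive test in the label window makes the cycle positive are all hypothetical.

```python
FEATURE_DAYS = 24  # from the caption: features come from the first 24 cycle days


def split_cycle(logs, tests, cycle_length):
    """Split one cycle into model features and a label.

    logs  -- list of (day, feature_name)
    tests -- list of (day, is_positive) pregnancy test results
    Days count from 1 at the start of the current cycle and keep counting
    into the next cycle, so day 24 of the next cycle is cycle_length + 24.
    """
    # Green interval: logs from days 1..24 of the current cycle.
    features = [(d, name) for d, name in logs if 1 <= d <= FEATURE_DAYS]
    # Red interval: tests after day 24 and before day 24 of the next cycle.
    window_end = cycle_length + FEATURE_DAYS
    results = [pos for d, pos in tests if FEATURE_DAYS < d < window_end]
    # Assumption: any positive test labels the cycle positive;
    # cycles with no test in the window are left unlabeled (None).
    label = any(results) if results else None
    return features, label


logs = [(10, "unprotected_sex"), (26, "cramps")]
features, label = split_cycle(logs, [(29, False), (31, True)], cycle_length=28)
_, unlabeled = split_cycle(logs, [(20, True)], cycle_length=28)
```

Here the day-26 log falls outside the feature window, and a test on day 20 falls outside the label window, so the second call yields no label.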

**Figure 2:**
LSTM + BMS fertility model: we feed the daily features x (blue) into an LSTM model to obtain the hidden state h (red). *f_d* is then a function of the hidden state h parameterized by a neural network. The probability of becoming pregnant in a cycle is the function shown at bottom.
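
The function referred to in the caption is not reproduced in this text. As a hedged illustration, the sketch below uses the classic Barrett-Marshall form on which BMS-style fertility models build: the cycle-level probability is one minus the probability that no day leads to conception. Treating the exponent `x_d` as a 0/1 indicator of intercourse on day d is an assumption here, as are the function name and the toy values.

```python
import math


def cycle_pregnancy_probability(f, x):
    """Barrett-Marshall style cycle probability (assumed form):

        P(pregnant) = 1 - prod_d (1 - f_d) ** x_d

    f -- per-day conception probabilities f_d, each in [0, 1]
    x -- per-day indicators (1 if intercourse was logged on that day, else 0)
    """
    if len(f) != len(x):
        raise ValueError("f and x must cover the same days")
    # Probability that none of the days with x_d = 1 results in conception.
    no_conception = math.prod((1.0 - f_d) ** x_d for f_d, x_d in zip(f, x))
    return 1.0 - no_conception


# Toy values (hypothetical): intercourse on days with f_d = 0.1 and 0.3.
p = cycle_pregnancy_probability([0.0, 0.1, 0.3, 0.1], [0, 1, 1, 0])
```

With these toy values the result is approximately 0.37, since 1 - 0.9 × 0.7 = 0.37. In the model described by the caption, each `f_d` would come from a neural network applied to the LSTM hidden state rather than from fixed numbers.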

**Figure 3:**
LSTM + user embedding: we use the history features, *x_h*, in the previous H = 180 days for the user and feed them into the “user history LSTM” (shown at left). The final hidden state becomes the user embedding vector e, which is then concatenated with the daily features from the current cycle and fed into the pregnancy prediction LSTM (right).
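
The data flow in Figure 3 can be sketched without a deep learning library. In this illustration a simple per-feature average stands in for the "user history LSTM"; that stand-in, the function names, and the toy feature vectors are assumptions, and only the history length H = 180 and the concatenation step come from the caption.

```python
H = 180  # history length in days, from the caption


def user_embedding(history):
    """Stand-in for the user history LSTM (assumption): averages each
    feature over the window. The described model would instead return
    the LSTM's final hidden state."""
    n_features = len(history[0])
    return [sum(day[j] for day in history) / len(history)
            for j in range(n_features)]


def prediction_inputs(history, cycle_days):
    """Concatenate the user embedding e onto every day's feature vector,
    giving the sequence fed to the pregnancy prediction LSTM."""
    e = user_embedding(history[-H:])  # use only the previous H days
    return [x_t + e for x_t in cycle_days]  # list concatenation per day


# Toy data (hypothetical): 2 history features, 3 daily features.
history = [[1, 0]] * 90 + [[0, 1]] * 90
cycle_days = [[1, 0, 0], [0, 1, 0]]
inputs = prediction_inputs(history, cycle_days)
```

Each element of `inputs` has the daily features followed by the same embedding, so the prediction model sees user-level context at every time step.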

**Figure 4:**
Model-learned time trends are interpretable for both the simple logistic regression model (left plot) and the best-performing LSTM + user embedding model (right plot). The horizontal axis is the cycle day. The vertical axis in the left plot is the logistic regression weight for logging a feature on that cycle day. The vertical axis in the right plot is how much logging a feature on a cycle day affects the LSTM-inferred probability of pregnancy. In both plots, positive y-values indicate associations with positive pregnancy tests, and negative y-values indicate associations with negative pregnancy tests. Both models learn that protected sex (green line) is negatively associated with pregnancy, while unprotected sex (blue line) is positively associated, particularly during the fertile window, and withdrawal sex (orange line) is intermediate.
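
The right-hand plot measures how much logging a feature on a given cycle day changes the predicted probability. One plausible way to compute such a quantity for any black-box model is a perturbation average, sketched below. This is not necessarily the authors' technique; `feature_day_effect`, `toy_predict`, and the dict-based cycle layout are hypothetical.

```python
import math


def feature_day_effect(predict, cycles, feature, day):
    """Average change in predicted probability when `feature` is logged
    on `day` versus not logged, over a set of cycles.

    cycles -- list of dicts mapping (feature_name, day) -> 0 or 1
    """
    diffs = []
    for cycle in cycles:
        with_log = {**cycle, (feature, day): 1}
        without_log = {**cycle, (feature, day): 0}
        diffs.append(predict(with_log) - predict(without_log))
    return sum(diffs) / len(diffs)


def toy_predict(cycle):
    """Toy stand-in model (assumption): only logs on days 10-16 matter."""
    weights = {"unprotected_sex": 1.0, "protected_sex": -0.5}
    score = -1.0
    for (name, day), value in cycle.items():
        if value and 10 <= day <= 16:
            score += weights.get(name, 0.0)
    return 1.0 / (1.0 + math.exp(-score))  # logistic link


cycles = [{}, {("protected_sex", 12): 1}]
up = feature_day_effect(toy_predict, cycles, "unprotected_sex", 13)
down = feature_day_effect(toy_predict, cycles, "protected_sex", 13)
flat = feature_day_effect(toy_predict, cycles, "unprotected_sex", 3)
```

Plotting such effects against cycle day for each feature would give curves of the kind the caption describes: positive for features associated with positive tests and negative for features associated with negative tests.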

Cited by

The association of COVID-19 vaccination and menstrual health: A period-tracking app-based cohort study.
Ramaiyer M, El Sabeh M, Zhu J, Shea A, Segev D, Yenokyan G, Borahay MA. Vaccine X. 2024 May 18;19:100501. doi: 10.1016/j.jvacx.2024.100501. PMID: 38832342.
Comparison of Multivariable Logistic Regression and Other Machine Learning Algorithms for Prognostic Prediction Studies in Pregnancy Care: Systematic Review and Meta-Analysis.
Sufriyana H, Husnayain A, Chen YL, Kuo CY, Singh O, Yeh TY, Wu YW, Su EC. JMIR Med Inform. 2020 Nov 17;8(11):e16503. doi: 10.2196/16503. PMID: 33200995. Review.
Feasibility of continuous distal body temperature for passive, early pregnancy detection.
Grant A, Smarr B. PLOS Digit Health. 2022 May 16;1(5):e0000034. doi: 10.1371/journal.pdig.0000034. PMID: 36812529.
Assessment of menstrual health status and evolution through mobile apps for fertility awareness.
Symul L, Wac K, Hillard P, Salathé M. NPJ Digit Med. 2019 Jul 16;2:64. doi: 10.1038/s41746-019-0139-4. PMID: 31341953.
Advancing Obstetric Care Through Artificial Intelligence-Enhanced Clinical Decision Support Systems: A Systematic Review.
Abdalrahman Mohammad Ali MO, Abdelgadir Elhabeeb SM, Abdalla Elsheikh NE, Abdalla Mohammed FS, Mahmoud Ali SH, Ibrahim Abdelhalim AA, Altom DS. Cureus. 2025 Mar 13;17(3):e80514. doi: 10.7759/cureus.80514. PMID: 40225537. Review.

Grants and funding

U54 EB020405/EB/NIBIB NIH HHS/United States
