Proc Int World Wide Web Conf. 2019 May:2019:2999-3005.
doi: 10.1145/3308558.3313512.

Predicting pregnancy using large-scale data from a women's health tracking mobile application

Bo Liu et al. Proc Int World Wide Web Conf. 2019 May.

Abstract

Predicting pregnancy has been a fundamental problem in women's health for more than 50 years. Previous datasets have been collected via carefully curated medical studies, but the recent growth of women's health tracking mobile apps offers potential for reaching a much broader population. However, the feasibility of predicting pregnancy from mobile health tracking data is unclear. Here we develop four models (a logistic regression model and three LSTM models) to predict a woman's probability of becoming pregnant using data from a women's health tracking app, Clue by BioWink GmbH. Evaluating our models on a dataset of 79 million logs from 65,276 women with ground truth pregnancy test data, we show that our predicted pregnancy probabilities meaningfully stratify women: women in the top 10% of predicted probabilities have an 89% chance of becoming pregnant over 6 menstrual cycles, as compared to a 27% chance for women in the bottom 10%. We develop a technique for extracting interpretable time trends from our deep learning models, and show these trends are consistent with previous fertility research. Our findings illustrate the potential that women's health tracking data offers for predicting pregnancy in a broader population; we conclude by discussing the steps needed to fulfill this potential.
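
The stratification claim can be illustrated with a simple decile comparison. The sketch below is illustrative only: the function name and the toy inputs are hypothetical, and it compares observed outcome rates for one set of predictions rather than reproducing the paper's six-cycle analysis.

```python
def top_bottom_decile_rates(probs, outcomes):
    """Return (top-decile rate, bottom-decile rate) of observed outcomes.

    probs    -- predicted probabilities, one per woman (hypothetical input)
    outcomes -- 1 if a positive pregnancy test was observed, else 0
    """
    if len(probs) != len(outcomes) or len(probs) < 10:
        raise ValueError("need at least 10 paired predictions and outcomes")
    # Order the observed outcomes by predicted probability, lowest first.
    ranked = [y for _, y in sorted(zip(probs, outcomes), key=lambda t: t[0])]
    k = len(ranked) // 10  # number of women in one decile
    bottom_rate = sum(ranked[:k]) / k
    top_rate = sum(ranked[-k:]) / k
    return top_rate, bottom_rate


# Toy data: 20 women, higher predictions coincide with positive outcomes.
toy_probs = [i / 20 for i in range(20)]
toy_outcomes = [0] * 10 + [1] * 10
top, bottom = top_bottom_decile_rates(toy_probs, toy_outcomes)
```

A large gap between the two rates is what "meaningfully stratify" refers to; a model with no signal would give roughly equal rates in both deciles.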

Keywords: Pregnancy prediction; mobile health tracking.

Figures

Figure 1:
Prediction task. The model makes predictions using logs from the first 24 days of a cycle (green interval), and the cycle is labeled using pregnancy tests taken after day 24 of the cycle and before day 24 of the next cycle (red interval). The vast majority of pregnancy tests in our dataset are taken near when the user’s cycle is supposed to start, consistent with proper use of pregnancy tests, so any positive pregnancy tests likely result from activity during the green interval, which will be included in the feature vector.
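
The windowing in this caption can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: days are counted from 1 at the start of the cycle, test days may run past the end of the cycle, a cycle is labeled positive if any test in the window is positive, and cycles with no test in the window are left unlabeled. The function and variable names are hypothetical.

```python
FEATURE_DAYS = 24  # logs up to this cycle day form the feature vector


def split_cycle(logs, tests, cycle_length):
    """Split one cycle into model features and a label.

    logs         -- list of (cycle_day, feature_name) tuples
    tests        -- list of (day, is_positive) tuples; day is counted from
                    the start of this cycle and may exceed cycle_length
    cycle_length -- length of this cycle in days
    """
    # Green interval: logs from the first 24 days of the cycle.
    features = [(d, f) for d, f in logs if 1 <= d <= FEATURE_DAYS]
    # Red interval: after day 24 of this cycle, before day 24 of the next.
    window_end = cycle_length + FEATURE_DAYS
    results = [r for d, r in tests if FEATURE_DAYS < d < window_end]
    # Assumption: any positive test labels the cycle positive.
    label = any(results) if results else None
    return features, label


features, label = split_cycle(
    logs=[(3, "unprotected_sex"), (25, "cramps")],
    tests=[(20, True), (30, False)],
    cycle_length=28,
)
```

In the usage example the day-25 log and the day-20 test both fall outside their windows, so only the day-3 log is used as a feature and the label comes from the day-30 test.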
Figure 2:
LSTM + BMS fertility model: we feed the daily features x (blue) into an LSTM model to obtain the hidden state h (red). f_d is then a function of the hidden state h parameterized by a neural network. The probability of becoming pregnant in a cycle is the function shown at bottom.
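
The function referred to in this caption is not reproduced on this page. As a hedged illustration, the sketch below assumes that BMS refers to the Barrett-Marshall-Schwartz day-specific fertility model and uses its basic form, in which the cycle-level probability is 1 minus the product of (1 - f_d) over days with intercourse. The function name and the example values are hypothetical, and the paper's exact formula may differ (for example, by an additional cycle-viability factor).

```python
def cycle_pregnancy_probability(f, had_sex):
    """Day-specific conception model: P = 1 - prod_d (1 - f_d) ** s_d.

    f       -- per-day conception probabilities f_d, each in [0, 1]
    had_sex -- per-day indicators s_d (1 if intercourse was logged, else 0)
    """
    if len(f) != len(had_sex):
        raise ValueError("f and had_sex must cover the same days")
    p_no_conception = 1.0
    for f_d, s_d in zip(f, had_sex):
        if s_d:
            # Each day with intercourse is an independent chance to conceive.
            p_no_conception *= 1.0 - f_d
    return 1.0 - p_no_conception


# Two days with intercourse: 1 - (0.9 * 0.8) = 0.28 up to rounding.
p = cycle_pregnancy_probability([0.1, 0.2, 0.0], [1, 1, 0])
```

In the captioned model, each f_d would come from the neural network applied to the LSTM hidden state rather than being fixed per cycle day.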
Figure 3:
LSTM + user embedding: we use the history features, x_h, in the previous H = 180 days for the user and feed them into the “user history LSTM” (shown at left). The final hidden state becomes the user embedding vector e, which is then concatenated with the daily features from the current cycle and fed into the pregnancy prediction LSTM (right).
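
The data flow in this caption can be sketched without the LSTMs themselves. The sketch below only shows how inputs might be prepared; it assumes the embedding e is appended to the feature vector of every day in the current cycle, which the caption does not state explicitly. The helper names are hypothetical, and the embedding values are placeholders for what the user history LSTM would produce.

```python
HISTORY_DAYS = 180  # H in the caption


def history_window(daily_history):
    """Keep the most recent H days of history features for the user history LSTM."""
    return daily_history[-HISTORY_DAYS:]


def prediction_inputs(user_embedding, cycle_features):
    """Append the user embedding e to each day's feature vector.

    Assumption: concatenation happens at every time step of the
    pregnancy prediction LSTM.
    """
    return [list(x_t) + list(user_embedding) for x_t in cycle_features]


# Placeholder embedding; in the model this is the final hidden state
# of the user history LSTM.
e = [0.5, -0.5]
inputs = prediction_inputs(e, [[1, 0], [0, 1]])
```

Each row of `inputs` is one day's input to the prediction LSTM: that day's logged features followed by the same user embedding.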
Figure 4:
Model-learned time trends are interpretable for both the simple logistic regression model (left plot) and the best-performing LSTM + user embedding model (right plot). The horizontal axis is the cycle day. The vertical axis in the left plot is the logistic regression weight for logging a feature on that cycle day. The vertical axis in the right plot is how much logging a feature on a cycle day affects the LSTM-inferred probability of pregnancy. In both plots, positive y-values indicate associations with positive pregnancy tests, and negative y-values indicate associations with negative pregnancy tests. Both models learn that protected sex (green line) is negatively associated with pregnancy, while unprotected sex (blue line) is positively associated, particularly during the fertile window, and withdrawal sex (orange line) is intermediate.
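
One way to obtain a curve like the right plot is a perturbation analysis: switch a feature on and off at a given cycle day and record the change in predicted probability. The sketch below shows that generic approach under stated assumptions; it is not necessarily the authors' technique, which is not detailed on this page. The `predict` callable, the list-of-dicts cycle representation, and the toy model are all hypothetical.

```python
def feature_day_effects(predict, cycles, feature, n_days=24):
    """Mean change in predicted probability from logging `feature` on each day.

    predict -- callable mapping a cycle (list of per-day dicts) to a probability
    cycles  -- list of cycles, each with at least n_days per-day dicts
    """
    effects = []
    for day in range(n_days):
        deltas = []
        for cycle in cycles:
            # Copy the cycle twice so the original logs are never modified.
            off = [dict(d) for d in cycle]
            on = [dict(d) for d in cycle]
            off[day][feature] = 0
            on[day][feature] = 1
            deltas.append(predict(on) - predict(off))
        effects.append(sum(deltas) / len(deltas))
    return effects


def toy_predict(cycle):
    """Stand-in model: only unprotected sex on the third day matters."""
    return 0.1 + 0.05 * cycle[2].get("unprotected_sex", 0)


toy_cycles = [[{} for _ in range(5)]]
effects = feature_day_effects(toy_predict, toy_cycles, "unprotected_sex", n_days=5)
```

With the toy model, the effect is nonzero only at the third day, mirroring how a peak in the plotted curve would mark the days on which a logged feature most changes the prediction.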

