Machine learning for passive mental health symptom prediction: Generalization across different longitudinal mobile sensing studies

Daniel A Adler et al. PLoS One. 2022 Apr 27;17(4):e0266516. doi: 10.1371/journal.pone.0266516. eCollection 2022.

Abstract

Mobile sensing data processed using machine learning models can passively and remotely assess mental health symptoms from the context of patients' lives. Prior work has trained models using data from single longitudinal studies, collected from demographically homogeneous populations, over short time periods, using a single data collection platform or mobile application. The generalizability of model performance across studies has not been assessed. This study presents a first analysis to understand if models trained using combined longitudinal study data to predict mental health symptoms generalize across current publicly available data. We combined data from the CrossCheck (individuals living with schizophrenia) and StudentLife (university students) studies. In addition to assessing generalizability, we explored if personalizing models to align mobile sensing data, and oversampling less-represented severe symptoms, improved model performance. Leave-one-subject-out cross-validation (LOSO-CV) results were reported. Two symptoms (sleep quality and stress) had similar question-response structures across studies and were used as outcomes to explore cross-dataset prediction. Models trained with combined data were more likely to be predictive (significant improvement over predicting training data mean) than models trained with single-study data. Expected model performance improved if the distance between training and validation feature distributions decreased using combined versus single-study data. Personalization aligned each LOSO-CV participant with training data, but only improved predicting CrossCheck stress. Oversampling significantly improved severe symptom classification sensitivity and positive predictive value, but decreased model specificity. Taken together, these results show that machine learning models trained on combined longitudinal study data may generalize across heterogeneous datasets. We encourage researchers to disseminate collected de-identified mobile sensing and mental health symptom data, and further standardize data types collected across studies to enable better assessment of model generalizability.
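The evaluation pipeline the abstract describes can be sketched as follows. This is a minimal illustration of leave-one-subject-out cross-validation over combined study data, not the authors' implementation; the model class (gradient boosting), the column names `subject_id` and `ema_score`, and the feature list are all assumptions.

```python
# Minimal LOSO-CV sketch over combined study data (illustrative only; the
# model class and column names are assumptions, not the paper's pipeline).
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

def loso_cv(df: pd.DataFrame, feature_cols: list) -> pd.Series:
    """Return one MAE per held-out subject (leave-one-subject-out CV)."""
    maes = {}
    for subject in df["subject_id"].unique():
        train = df[df["subject_id"] != subject]
        held_out = df[df["subject_id"] == subject]
        model = GradientBoostingRegressor().fit(
            train[feature_cols], train["ema_score"]
        )
        maes[subject] = mean_absolute_error(
            held_out["ema_score"], model.predict(held_out[feature_cols])
        )
    return pd.Series(maes, name="mae")

# "Combined" training data is simply the two studies' tables concatenated:
# combined = pd.concat([crosscheck_df, studentlife_df], ignore_index=True)
# per_subject_mae = loso_cv(combined, feature_cols)
```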


Conflict of interest statement

DA was co-employed by UnitedHealth Group while conducting this analysis, outside of the submitted work. TC is a co-founder and equity holder of HealthRhythms, Inc., is co-employed by UnitedHealth Group, and has received grants from Click Therapeutics related to digital therapeutics, outside of the submitted work. DA and TC hold pending patent applications related to the cited literature. DCM has accepted honoraria and consulting fees from Apple, Inc., Otsuka Pharmaceuticals, Pear Therapeutics, and the One Mind Foundation, royalties from Oxford Press, and has an ownership interest in Adaptive Health, Inc. FW declares no competing interests. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

Figures

Fig 1. Modeling overview.
Fig 2. Summary of the 44 features used for prediction.
Each data type on the left-hand side is summarized over a 3-day period for each epoch (e.g. 12AM - 6AM) using the aggregation technique (mean or count) described on the right-hand side. Aggregations were performed to align features with ecological momentary assessment (EMA) mental health symptom outcomes.
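A hedged sketch of this aggregation follows, assuming a long-format events table with `subject_id`, `timestamp`, and `value` columns (the schema is assumed for illustration, not taken from the paper).

```python
# Sketch of the 3-day, per-epoch feature aggregation described in Fig 2.
import pandas as pd

# Six-hour epochs keyed by their starting hour (e.g. 12AM-6AM).
EPOCHS = {0: "12AM-6AM", 6: "6AM-12PM", 12: "12PM-6PM", 18: "6PM-12AM"}

def aggregate_epoch_features(events: pd.DataFrame, how: str = "mean") -> pd.DataFrame:
    """Summarize one sensed data type per subject, 3-day window, and epoch."""
    out = events.assign(
        epoch=events["timestamp"].dt.hour.floordiv(6).mul(6).map(EPOCHS)
    )
    grouped = out.groupby(
        ["subject_id", pd.Grouper(key="timestamp", freq="3D"), "epoch"]
    )["value"]
    # `mean` for continuous signals (e.g. activity), `count` for event tallies.
    return grouped.agg(how).unstack("epoch")
```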
Fig 3. Example feature distribution differences across datasets.
Assessing feature distributional differences across the CrossCheck (CC), StudentLife sleep EMA (SL: Sleep), and stress EMA (SL: Stress) validation data for 11 example features across data types. Each subfigure shows a boxplot of the feature distribution within each specific dataset. The centerline of the boxplot is the median, the box edges are the interquartile range (IQR), and the fences are values 1.5 × the IQR. The "Missing Days" distribution is a histogram describing counts across participants. A "*" is listed above each of the StudentLife datasets if the distribution differed significantly (Mann-Whitney U test, two-sided, or Chi-square test of independence, α = 0.05) from CrossCheck. The numbers above the "*" are the rank-biserial correlation (RBC) or Cramér's V, which show the magnitude of these differences. EMA: Ecological momentary assessment.
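For a continuous feature, the test-plus-effect-size comparison in Fig 3 can be sketched as below; the rank-biserial conversion r = 1 − 2U/(n₁n₂) is the standard one, and the function and variable names are hypothetical.

```python
# Sketch of a Fig 3-style comparison: two-sided Mann-Whitney U test with a
# rank-biserial correlation (RBC) as the effect size.
import numpy as np
from scipy.stats import mannwhitneyu

def compare_feature(cc: np.ndarray, sl: np.ndarray, alpha: float = 0.05) -> dict:
    """Compare one feature between CrossCheck (cc) and StudentLife (sl)."""
    u, p = mannwhitneyu(cc, sl, alternative="two-sided")
    # Rank-biserial correlation: r = 1 - 2U / (n1 * n2), bounded in [-1, 1].
    rbc = 1.0 - (2.0 * u) / (len(cc) * len(sl))
    return {"U": u, "p": p, "rbc": rbc, "significant": p < alpha}
```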
Fig 4. Outcome distribution differences across datasets.
Sleep (left column) and stress (right column) ecological momentary assessment (EMA) validation distributions for CrossCheck (CC, top row) and StudentLife (SL, bottom row) data. The height of each bar represents the frequency of the EMA response listed on the x-axis under that bar. On the bottom, a "*" indicates significant (Mann-Whitney U test, two-sided, α = 0.05) differences between the CrossCheck and StudentLife EMA distributions, with rank-biserial correlation (RBC) values giving the magnitude of these differences.
Fig 5. Sensitivity analysis reveals that models trained on combined data are more likely to be predictive than models trained on single-study data.
The left y-axis describes ΔMAE = MAE_Single − MAE_Combined against the sorted distribution percentiles (x-axis). The thick green solid line represents the ΔMAE percentiles, and the dashed black intersection lines show the percentile value (x-axis) where ΔMAE = 0. The right y-axis describes the actual MAE for the combined (solid blue line) and single-study (dashed orange line) data at each percentile. The baseline MAE, i.e. the error of a model predicting the average of the training data, is shown by the dotted horizontal red line. Wilcoxon signed-rank test (one-sided) statistics (W), p-values, and rank-biserial correlations (RBCs) are included for models where, across hyperparameters, combined data significantly (α = 0.05) outperformed single-study data. Shaded areas represent 95% confidence intervals around the mean. EMA: Ecological momentary assessment.
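The paired, per-hyperparameter comparison behind Fig 5 might look like the following sketch; the arrays of MAEs (one entry per hyperparameter combination, paired across training conditions) and the function name are assumptions for illustration.

```python
# Sketch of the Fig 5 sensitivity analysis: the difference distribution
# ΔMAE = MAE_Single - MAE_Combined plus a one-sided Wilcoxon signed-rank test.
import numpy as np
from scipy.stats import wilcoxon

def compare_training_data(mae_single: np.ndarray, mae_combined: np.ndarray) -> dict:
    """Each array holds one LOSO-CV MAE per hyperparameter combination."""
    delta = mae_single - mae_combined  # ΔMAE > 0 means combined data won
    # One-sided test: are single-study MAEs stochastically larger?
    w, p = wilcoxon(mae_single, mae_combined, alternative="greater")
    return {"W": w, "p": p, "pct_combined_wins": 100.0 * np.mean(delta > 0)}
```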
Fig 6. Personalization increases training and held-out data alignment, but is not guaranteed to improve prediction performance.
(A) Effects of personalization by changing the number of neighbors (x-axis) used for model training on the feature distribution alignment between training and leave-one-subject-out cross-validation (LOSO-CV) participants (Proxy-A distance, y-axis). (B) Effects of changing the number of neighbors (x-axis) during model training on the model mean absolute error (MAE, y-axis). On all plots, each point is the mean Proxy-A distance (A) or MAE (B) across hyperparameters, and error bars are 95% confidence intervals around the mean. Each plot is split by the training data used (combined versus single-study), and plots are specific to the LOSO-CV result for a study (CrossCheck/StudentLife) and EMA (Sleep/Stress).
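The Proxy-A distance used in Fig 6 is conventionally estimated by training a classifier to tell the two feature distributions apart, with d_A = 2(1 − 2ε) for classifier error ε. A minimal sketch follows; the classifier choice (logistic regression) and cross-validation setup are assumptions, not necessarily the paper's.

```python
# Sketch of the Proxy-A distance between training features and a held-out
# LOSO-CV participant's features: 0 ~ indistinguishable, 2 ~ fully separable.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def proxy_a_distance(train_X: np.ndarray, heldout_X: np.ndarray) -> float:
    """d_A = 2(1 - 2*err), where err is the error of a domain classifier."""
    X = np.vstack([train_X, heldout_X])
    y = np.concatenate([np.zeros(len(train_X)), np.ones(len(heldout_X))])
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    err = 1.0 - acc
    return 2.0 * (1.0 - 2.0 * err)
```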
Fig 7. SMOTE increases sensitivity and positive predictive value, but reduces specificity and increases mean absolute error.
SMOTE (see legends) oversampled under-represented ecological momentary assessment (EMA) values. The height of each bar is the mean value of the metric described on the x-axis across hyperparameters. Error bars are 95% confidence intervals around the mean. Plots are specific to the leave-one-subject-out cross-validation (LOSO-CV) result for a study (CrossCheck/StudentLife) and EMA (Sleep/Stress). Specificity, sensitivity, and positive predictive value (PPV) were calculated by transforming the regression results into a classification problem, labeling the two most severe symptom classes in each EMA "1" and all other symptoms "0". The remaining plots report the regression mean absolute error (MAE). "*" indicates p < 0.05, and "✝" indicates p < 0.10, for a Wilcoxon signed-rank test (one-sided) comparing using versus not using SMOTE across hyperparameter combinations.
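The Fig 7 evaluation combines SMOTE oversampling (from imbalanced-learn) with a binarized "severe symptom" scoring of the regression outputs. A hedged sketch, assuming integer-coded EMA scores and a hypothetical severity threshold:

```python
# Sketch of the Fig 7 setup: oversample rare severe EMA classes during
# training, then score regression outputs as a binary "severe" problem.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.metrics import confusion_matrix

def oversample_training_data(X_train, y_train):
    """Balance discrete EMA classes before fitting the regressor (train only)."""
    return SMOTE().fit_resample(X_train, y_train.astype(int))

def severe_metrics(y_true, y_pred, severe_threshold: int) -> dict:
    """Label the two most severe EMA classes "1" (>= threshold), else "0"."""
    t = (np.asarray(y_true) >= severe_threshold).astype(int)
    p = (np.round(y_pred) >= severe_threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(t, p, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }
```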

References

    1. Insel TR. Digital phenotyping: a global tool for psychiatry. World Psychiatry. 2018;17: 276–277. doi: 10.1002/wps.20550
    2. Wang R, Wang W, Aung MSH, Ben-Zeev D, Brian R, Campbell AT, et al. Predicting Symptom Trajectories of Schizophrenia Using Mobile Sensing. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2017;1: 110:1–110:24. doi: 10.1145/3130976
    3. Wang R, Scherer EA, Tseng VWS, Ben-Zeev D, Aung MSH, Abdullah S, et al. CrossCheck: toward passive sensing and detection of mental health changes in people with schizophrenia. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '16). Heidelberg, Germany: ACM Press; 2016. pp. 886–897. doi: 10.1145/2971648.2971740
    4. Wang R, Chen F, Chen Z, Li T, Harari G, Tignor S, et al. StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones. Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing. Seattle, Washington: Association for Computing Machinery; 2014. pp. 3–14. doi: 10.1145/2632048.2632054
    5. Adler DA, Ben-Zeev D, Tseng VW-S, Kane JM, Brian R, Campbell AT, et al. Predicting Early Warning Signs of Psychotic Relapse From Passive Sensing Data: An Approach Using Encoder-Decoder Neural Networks. JMIR Mhealth Uhealth. 2020;8: e19962. doi: 10.2196/19962
