[Preprint]. 2025 Apr 17:rs.3.rs-6414400.
doi: 10.21203/rs.3.rs-6414400/v1.

Using Natural Language Processing to Track Negative Emotions in the Daily Lives of Adolescents


Hadar Fisher et al. Res Sq.

Abstract

Tracking emotion fluctuations in adolescents' daily lives is essential for understanding mood dynamics and identifying early markers of affective disorders. This study examines the potential of text-based approaches for emotion prediction by comparing nomothetic (group-level) and idiographic (individualized) models in predicting adolescents' daily negative affect (NA) from text features. Additionally, we evaluate different Natural Language Processing (NLP) techniques for capturing within-person emotion fluctuations. We analyzed ecological momentary assessment (EMA) text responses from 97 adolescents (ages 14-18, 77.3% female, 22.7% male; N_EMA = 7,680). Text features were extracted using a dictionary-based approach, topic modeling, and GPT-derived emotion ratings. Random Forest and Elastic Net Regression models predicted NA from these text features, comparing nomothetic and idiographic approaches. All key findings, interactive visualizations, and model comparisons are available via a companion web app: https://emotracknlp.streamlit.app/. Idiographic models combining text features from different NLP approaches exhibited the best performance: they performed comparably to nomothetic models in R² but yielded lower prediction error (Root Mean Squared Error), improving within-person precision. Importantly, there were substantial between-person differences in model performance and predictive linguistic features. When selecting the best-performing model for each participant, significant correlations between predicted and observed emotion scores were found for 90.7-94.8% of participants. Our findings suggest that while nomothetic models offer initial scalability, idiographic models may provide greater predictive precision with sufficient within-person data. A flexible, personalized approach that selects the optimal model for each individual may enhance emotion monitoring, while leveraging text data to provide contextual insights that could inform appropriate interventions.
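As a minimal illustration of the nomothetic vs. idiographic comparison described in the abstract, the sketch below fits one pooled Elastic Net and one per-person Elastic Net on synthetic EMA-style data. All data, dimensions, and hyperparameters are invented for illustration; the study's actual features, tuning, and cross-validation are not reproduced.

```python
# Sketch: nomothetic (pooled) vs. idiographic (per-person) Elastic Net
# on synthetic EMA-style data. Everything here is illustrative only.
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_people, n_obs, n_feat = 10, 60, 5  # e.g. 60 EMA prompts per adolescent

# Simulate person-specific weights so idiographic fitting has something to gain.
X = rng.normal(size=(n_people, n_obs, n_feat))
betas = rng.normal(size=(n_people, n_feat))
y = np.einsum("pof,pf->po", X, betas) + rng.normal(scale=0.5, size=(n_people, n_obs))

split = 40  # first 40 observations train, last 20 test

# Nomothetic: one model pooled over everyone.
nomo = ElasticNet(alpha=0.1).fit(
    X[:, :split].reshape(-1, n_feat), y[:, :split].ravel()
)
rmse_nomo = mean_squared_error(
    y[:, split:].ravel(), nomo.predict(X[:, split:].reshape(-1, n_feat))
) ** 0.5

# Idiographic: one model per person, test RMSE averaged across people.
rmse_idio = np.mean([
    mean_squared_error(
        y[p, split:],
        ElasticNet(alpha=0.1).fit(X[p, :split], y[p, :split]).predict(X[p, split:]),
    ) ** 0.5
    for p in range(n_people)
])

print(f"nomothetic RMSE: {rmse_nomo:.2f}, idiographic RMSE: {rmse_idio:.2f}")
```

Because the simulated people have heterogeneous weights, the pooled model averages them away while the per-person models recover them, mirroring the abstract's finding that idiographic models reduce within-person prediction error.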


Conflict of interest statement

Additional Declarations: There is a potential competing interest. Dr. Webb has received consulting fees from King & Spalding law firm. Over the past 3 years, Dr. Pizzagalli has received consulting fees from Arrowhead Pharmaceuticals, Boehringer Ingelheim, Compass Pathways, Engrail Therapeutics, Karla Therapeutics, Neumora Therapeutics (formerly BlackThorn Therapeutics), Neurocrine Biosciences, Neuroscience Software, Sage Therapeutics, and Takeda; he has received honoraria from the American Psychological Association, Psychonomic Society, Springer (for editorial work), and Alkermes; he has received research funding from the Bird Foundation, Brain and Behavior Research Foundation, Dana Foundation, Millennium Pharmaceuticals, NIMH, and Wellcome Leap; he has received stock options from Compass Pathways, Engrail Therapeutics, Neumora Therapeutics, and Neuroscience Software. No funding from these entities was used to support the current work, and all views expressed are solely those of the authors. The other authors declare no competing financial interests.

Figures

Figure 1
The relationship between predicted and observed (actual) negative affect ratings using person-specific (idiographic) models: (a) Random Forest; (b) Elastic Net. Each colored line represents an individual participant's predicted estimates (y-axis) across different levels of actual negative affect (x-axis).
Figure 2
Examples of person-specific (idiographic) predictions of negative affect for high-performance (top panel) and low-performance (bottom panel) models.
Figure 3
SHAP (Shapley Additive Explanations) beeswarm plots for four individual participants, illustrating the contribution of different text-based features to the model's emotion predictions. Each dot represents an individual data point, with red indicating high feature values and blue indicating low feature values. The spread of dots across the x-axis reflects the variability in a feature's effect on predictions across different measurement time points. The y-axis lists the features in descending order of importance, meaning the top features had the strongest impact on the model's predictions. The x-axis represents SHAP values, indicating the magnitude and direction of each feature's impact on the predicted emotion scores. For example, for participant K23528, a low score on "Acquire" (red color) increases the model's prediction of negative affect. Conversely, a high feature value for "Work" or "Big Words" (red color) contributes to a higher predicted negative affect.
Figure 4
The top panel displays the predictive R² for different models (Random Forest, GPT, and Elastic Net) across four emotion categories: negative affect, sadness, anger, and nervousness. The bottom panel presents the corresponding RMSE values. Each box plot represents the distribution of performance metrics for four different NLP-based feature sets: Combined (purple), GPT (yellow), LDA (orange), and LIWC+VADER (green). Higher R² indicates better predictive accuracy, while lower RMSE reflects better model fit.
Figure 5
The pie charts display the distribution of best-performing models for predicting negative emotions (a: Negative affect, b: Sadness, c: Anger, d: Nervousness) across three comparison categories: nomothetic vs. idiographic models, NLP approaches, and ML models. The colors represent different model types, indicating the proportion of participants for whom each model was the best predictor.
Figure 6
The pipeline for processing and analyzing text data from EMA responses to predict emotional states. The process begins with preprocessing (e.g., text normalization, stopword removal, stemming). Text features are then extracted using lexicon-based analysis (LIWC, VADER), transformer models (GPT-4), and topic modeling (LDA). These features, including GPT emotion ratings, linguistic categories, sentiment scores, and topics, are fed into machine learning models (Random Forest, Elastic Net) for prediction. Model performance is evaluated using R² and RMSE, while SHAP values provide insight into feature importance. This approach integrates multiple NLP techniques to enhance emotion prediction accuracy.
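The front end of the Figure 6 pipeline can be sketched as follows, using an invented mini-lexicon as a stand-in for the dictionary-based (LIWC/VADER) stage and a Random Forest as the predictor. The LDA, GPT-4, and SHAP stages are omitted, and every category name, response, and rating here is hypothetical, not the study's data.

```python
# Sketch of the pipeline's first stages: dictionary-based feature extraction
# (a toy stand-in for LIWC/VADER; categories are invented) feeding a
# Random Forest regressor that predicts a negative-affect (NA) rating.
import re
import numpy as np
from sklearn.ensemble import RandomForestRegressor

LEXICON = {  # hypothetical mini-dictionary, not the licensed LIWC categories
    "neg_emo": {"sad", "angry", "worried", "stressed"},
    "pos_emo": {"happy", "calm", "excited"},
    "social": {"friend", "family", "teacher"},
}

def text_features(text: str) -> np.ndarray:
    """Proportion of tokens falling in each lexicon category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)
    return np.array([sum(t in words for t in tokens) / n
                     for words in LEXICON.values()])

# Invented (text, NA rating) pairs standing in for EMA responses.
responses = [
    ("I feel sad and worried about school", 4.0),
    ("Hanging out with a friend, pretty happy", 1.0),
    ("Stressed and angry after practice", 4.5),
    ("Calm evening with family", 0.5),
]
X = np.stack([text_features(t) for t, _ in responses])
y = np.array([na for _, na in responses])

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
pred = model.predict([text_features("so stressed and sad today")])
```

In the full pipeline these lexicon proportions would be concatenated with LDA topic loadings and GPT emotion ratings before model fitting, and SHAP values would then be computed on the fitted forest.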


