Dynamic suicide topic modelling: Deriving population-specific, psychosocial and time-sensitive suicide risk variables from Electronic Health Record psychotherapy notes

Maxwell Levis¹,², Joshua Levy², Vincent Dufort¹, Carey J Russ¹,², Brian Shiner¹,²,³

Affiliations

¹ White River Junction VA Medical Center, Hartford, Vermont, USA.
² Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA.
³ National Center for PTSD Executive Division, Hartford, Vermont, USA.

PMID: 36797651
PMCID: PMC11172400
DOI: 10.1002/cpp.2842

Maxwell Levis et al. Clin Psychol Psychother. 2023 Jul-Aug;30(4):795-810. doi: 10.1002/cpp.2842. Epub 2023 Feb 26.


Abstract

In the machine learning subfield of natural language processing, a topic model is an unsupervised method used to uncover abstract topics within a corpus of text. Dynamic topic modelling (DTM) extends this approach to capture how these topics change over time. This retrospective study applies DTM to a corpus of electronic health record psychotherapy notes to examine whether DTM helps distinguish closely matched patients who did and did not die by suicide. The cohort consisted of United States Department of Veterans Affairs (VA) patients diagnosed with Posttraumatic Stress Disorder (PTSD) between 2004 and 2013. Each case (a patient who died by suicide during the year following diagnosis) was matched with five controls (patients who remained alive) who shared the same psychotherapists and had similar suicide risk based on VA's suicide prediction algorithm. The cohort was restricted to patients who received psychotherapy for 9+ months after their initial PTSD diagnosis (cases = 77; controls = 362). For cases, psychotherapy notes from diagnosis until death were examined. For controls, psychotherapy notes from diagnosis until the matched case's death date were examined. A Python-based DTM algorithm was used. The derived topics captured population-specific themes, including PTSD, psychotherapy, medication, communication and relationships. Control topics changed significantly more over time than case topics. Topic differences highlighted engagement, expressivity and therapeutic alliance. This study strengthens the groundwork for deriving population-specific, psychosocial and time-sensitive suicide risk variables.
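The case-control design described in the abstract can be illustrated in code. This is a minimal sketch, not the authors' procedure: it assumes a hypothetical `Patient` record, treats the VA risk score as a plain number, and uses greedy nearest-neighbour matching without replacement within each psychotherapist, which is only one of several ways to implement "shared psychotherapists and similar suicide risk". It also shows how a control inherits its matched case's death date as the end of its note window.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    pid: str
    therapist: str
    risk_score: float                  # stand-in for a model-derived risk score
    diagnosis_date: date
    death_date: Optional[date] = None  # set only for cases


def match_controls(cases, pool, k=5):
    """Greedy 1:k nearest-neighbour matching without replacement.

    Each case receives up to k unused controls who share its psychotherapist,
    ordered by closeness of risk score. Cases are processed in pid order so
    the result is deterministic.
    """
    used = set()
    matches = {}
    for case in sorted(cases, key=lambda c: c.pid):
        candidates = [p for p in pool
                      if p.therapist == case.therapist and p.pid not in used]
        candidates.sort(key=lambda p: abs(p.risk_score - case.risk_score))
        chosen = candidates[:k]
        used.update(p.pid for p in chosen)
        matches[case.pid] = chosen
    return matches


def observation_window(patient, matched_case):
    """Notes run from the patient's own diagnosis date to the case's death
    date; a control inherits the death date of its matched case."""
    return patient.diagnosis_date, matched_case.death_date


# Toy usage with invented records.
cases = [
    Patient("c1", "t1", 0.80, date(2010, 1, 5), date(2010, 11, 20)),
    Patient("c2", "t1", 0.78, date(2010, 2, 3), date(2011, 1, 15)),
]
pool = [
    Patient("p1", "t1", 0.79, date(2010, 2, 1)),
    Patient("p2", "t1", 0.50, date(2010, 3, 1)),
    Patient("p3", "t2", 0.80, date(2010, 1, 9)),  # different therapist
    Patient("p4", "t1", 0.85, date(2010, 1, 20)),
    Patient("p5", "t1", 0.70, date(2010, 4, 2)),
]
matches = match_controls(cases, pool, k=2)
```

With `k=2`, case `c1` takes the two closest same-therapist controls (`p1`, `p4`) and `c2` must draw from those left over. Because greedy matching depends on processing order, an optimal-matching routine may be preferable in practice.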

Keywords: dynamic topic models; electronic medical records; natural language processing; suicide prediction.

Published 2023. This article is a U.S. Government work and is in the public domain in the USA.


Conflict of interest statement

The authors have no conflict of interest.

Figures

**FIGURE 1**
Quantitative method of topic mapping between derived case and control topics. We first identified the top 20 words (see Table 2 for words) in each topic from the first month of treatment and then used Word2Vec to map these words' similarities across topics. Topics are numbered as follows: topic 0 = treatment; topic 1 = medication; topic 2 = engagement; topic 3 = expressivity; topic 4 = symptomology. Thicker blue lines indicate greater topic similarity.
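Figure 1 maps topics by comparing the Word2Vec similarity of their top words. The sketch below shows one way to score that mapping, assuming that topic similarity is the mean cosine similarity over all cross-topic word pairs (the caption does not state how word similarities are aggregated). The 2-D vectors are invented stand-ins for trained Word2Vec embeddings, and the topic labels and word lists are hypothetical.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def topic_similarity(words_a, words_b, vectors):
    """Mean cosine similarity over all cross-topic word pairs.

    Words absent from the embedding vocabulary are skipped.
    """
    a = [vectors[w] for w in words_a if w in vectors]
    b = [vectors[w] for w in words_b if w in vectors]
    sims = [cosine(u, v) for u, v in product(a, b)]
    return sum(sims) / len(sims) if sims else 0.0


def similarity_matrix(case_topics, control_topics, vectors):
    """Similarity for every (case topic, control topic) pair."""
    return {(i, j): topic_similarity(wa, wb, vectors)
            for i, wa in case_topics.items()
            for j, wb in control_topics.items()}


# Invented 2-D vectors standing in for trained Word2Vec embeddings.
vectors = {"pill": (1.0, 0.1), "dose": (0.9, 0.2),
           "talk": (0.1, 1.0), "feel": (0.2, 0.9)}
case_topics = {"medication": ["pill", "dose"], "expressivity": ["talk", "feel"]}
control_topics = {"A": ["feel", "talk"], "B": ["dose", "pill"]}

sim = similarity_matrix(case_topics, control_topics, vectors)
# Pair each case topic with its most similar (unlabelled) control topic.
best_match = {i: max(control_topics, key=lambda j: sim[(i, j)])
              for i in case_topics}
```

Each value in `sim` could serve as an edge weight (line thickness) in a plot like Figure 1; here the case "medication" topic pairs with control topic `B`.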

**FIGURE 2**
Changes in word prominence over time within the topics identified for cases (top row) and controls (bottom row). Each topic's selected words (the same words presented in Table 2) were derived using dynamic topic modelling. X axis = time in months (0 is index; 11 is suicide for cases/end of observation for controls); Y axis = prominence (lower values are more prominent).
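Figure 2 plots prominence so that lower values are more prominent, which is consistent with a within-topic rank. The sketch below assumes exactly that: for each month, words are ranked by their topic-word probability (1 = most probable), and each tracked word's rank is followed across months. In practice the per-month probabilities would come from a fitted dynamic topic model (for example, gensim's `LdaSeqModel`); the numbers and the helper names here are invented for illustration.

```python
def prominence_ranks(word_probs):
    """Rank words in one topic-month: 1 = highest probability.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    ordered = sorted(word_probs, key=lambda w: (-word_probs[w], w))
    return {w: r for r, w in enumerate(ordered, start=1)}


def trajectories(monthly_probs, words):
    """Rank of each tracked word in each month, from index (month 0) onward."""
    per_month = [prominence_ranks(p) for p in monthly_probs]
    return {w: [ranks[w] for ranks in per_month] for w in words}


# Invented topic-word probabilities for one topic over three months.
monthly_probs = [
    {"ptsd": 0.30, "sleep": 0.20, "family": 0.10},
    {"ptsd": 0.25, "sleep": 0.15, "family": 0.22},
    {"ptsd": 0.18, "sleep": 0.12, "family": 0.26},
]
traj = trajectories(monthly_probs, ["ptsd", "sleep", "family"])
```

In this toy topic, "family" rises from rank 3 to rank 1 while "ptsd" slips from 1 to 2; each list in `traj` is one line in a panel like those in Figure 2.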

**FIGURE 3**
Spearman correlations over time for the topics identified for cases (top row) and controls (bottom row), comparing the index month with the end of the treatment year (X axis: index month; Y axis: month of suicide for cases/end of observation for controls). Each word is plotted at its associated coordinate. The 95% confidence interval (CI) for each coefficient was calculated using Bonett and Wright's (2000) recommended method for non-parametric data. One Spearman coefficient was deemed statistically significantly greater than another when their CIs did not overlap (see Table 3 for CIs).
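The comparison in Figure 3 can be sketched as follows, under stated assumptions: Spearman's rho is computed as the Pearson correlation of average ranks between each word's index-month and final-month prominence, and the 95% CI uses Fisher's z-transform with the Bonett and Wright (2000) variance approximation for Spearman's rho, (1 + rho²/2)/(n − 3). Non-overlapping CIs are treated as a significant difference, as in the caption. The word ranks below are invented: one topic barely changes and one is reshuffled, purely to show how a lower rho signals more change over time.

```python
import math
from statistics import NormalDist


def average_ranks(xs):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def bonett_wright_ci(rho, n, level=0.95):
    """CI via Fisher's z with variance (1 + rho**2 / 2) / (n - 3).

    Requires |rho| < 1 and n > 3.
    """
    z = math.atanh(rho)
    se = math.sqrt((1 + rho ** 2 / 2) / (n - 3))
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


def significantly_greater(ci_a, ci_b):
    """True if interval a lies entirely above interval b (no overlap)."""
    return ci_a[0] > ci_b[1]


index = list(range(1, 21))  # invented prominence ranks of 20 words at index
stable = [x for pair in zip(index[1::2], index[0::2]) for x in pair]  # neighbours swap
changing = index[5:] + index[:5]  # ranks rotate by five places

rho_stable = spearman(index, stable)
rho_changing = spearman(index, changing)
ci_stable = bonett_wright_ci(rho_stable, len(index))
ci_changing = bonett_wright_ci(rho_changing, len(index))
```

A higher rho means prominence at index better predicts prominence at the end of observation, i.e., the topic changed less. `math.atanh` fails at |rho| = 1, so a perfectly stable topic would need special handling.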

