Clin Psychol Psychother. 2023 Jul-Aug;30(4):795-810.
doi: 10.1002/cpp.2842. Epub 2023 Feb 26.

Dynamic suicide topic modelling: Deriving population-specific, psychosocial and time-sensitive suicide risk variables from Electronic Health Record psychotherapy notes

Maxwell Levis et al. Clin Psychol Psychother. 2023 Jul-Aug.

Abstract

In the machine learning subfield of natural language processing, a topic model is a type of unsupervised method used to uncover abstract topics within a corpus of text. Dynamic topic modelling (DTM) captures change in these topics over time. This retrospective study applies DTM to a corpus of electronic health record psychotherapy notes and examines whether DTM helps distinguish closely matched patients who did and did not die by suicide. The cohort consists of United States Department of Veterans Affairs (VA) patients diagnosed with Posttraumatic Stress Disorder (PTSD) between 2004 and 2013. Each case (a patient who died by suicide during the year following diagnosis) was matched with five controls (patients who remained alive) who shared psychotherapists and had similar suicide risk based on VA's suicide prediction algorithm. The cohort was restricted to patients who received psychotherapy for 9+ months after initial PTSD diagnosis (cases = 77; controls = 362). For cases, psychotherapy notes from diagnosis until death were examined. For controls, psychotherapy notes from diagnosis until the matched case's death date were examined. A Python-based DTM algorithm was used. Derived topics identified population-specific themes, including PTSD, psychotherapy, medication, communication and relationships. Control topics changed significantly more over time than case topics. Topic differences highlighted engagement, expressivity and therapeutic alliance. This study strengthens the groundwork for deriving population-specific, psychosocial and time-sensitive suicide risk variables.

Keywords: dynamic topic models; electronic medical records; natural language processing; suicide prediction.
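
The observation windows described in the abstract (diagnosis until death for cases, diagnosis until the matched case's death date for controls) could be prepared for a dynamic topic model roughly as follows. This is a minimal sketch, not the authors' pipeline: the function name `slice_notes`, the 30-day slice width and the example dates are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def slice_notes(notes, index_date, end_date, days_per_slice=30):
    """Group (note_date, text) pairs into consecutive time slices.

    Notes dated outside [index_date, end_date] are dropped. For a control,
    end_date would be the matched case's death date (per the abstract).
    The 30-day slice width is an assumption, not taken from the study.
    """
    slices = defaultdict(list)
    for note_date, text in notes:
        if index_date <= note_date <= end_date:
            slice_id = (note_date - index_date).days // days_per_slice
            slices[slice_id].append(text)
    return dict(slices)


# Hypothetical notes for one control patient.
notes = [
    (date(2010, 1, 5), "intake session"),
    (date(2010, 2, 20), "follow-up session"),
    (date(2010, 12, 1), "note after window"),
]
windowed = slice_notes(
    notes, index_date=date(2010, 1, 1), end_date=date(2010, 11, 1)
)
```

The resulting per-slice document lists are the kind of time-ordered input a DTM implementation typically expects; the last note is excluded because it falls after the window closes.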

Conflict of interest statement

The authors have no conflict of interest.

Figures

FIGURE 1
Quantitative method of topic mapping between derived case and control topics. We first identified the top 20 words (see Table 2 for words) in each topic from the first month of treatment and then used Word2Vec to map these words’ similarities across topics. Respective topics are numbered as follows: topic 0 = treatment; topic 1 = medication; topic 2 = engagement; topic 3 = expressivity; topic 4 = symptomology. Thicker blue lines indicate greater topic similarity.
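
One way to read the mapping step in this caption is as a similarity score between two topics' top-word lists, computed from word embeddings. The sketch below is illustrative only: the caption does not say how word-level similarities were aggregated, so the mean pairwise cosine used here is an assumption, and the two-dimensional vectors are toy stand-ins for trained Word2Vec embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_similarity(words_a, words_b, vectors):
    """Mean pairwise cosine similarity between two topics' word lists.

    Words missing from `vectors` are skipped. Averaging over all pairs is
    an assumed aggregation, not one stated in the caption.
    """
    pairs = [
        (a, b) for a in words_a for b in words_b
        if a in vectors and b in vectors
    ]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


# Toy embeddings (hypothetical values, not trained Word2Vec output).
vectors = {
    "therapy": (1.0, 0.1),
    "session": (0.9, 0.2),
    "dose": (0.1, 1.0),
    "refill": (0.2, 0.9),
}
case_topic = ["therapy", "session"]
control_treatment = ["session", "therapy"]
control_medication = ["dose", "refill"]

sim_treatment = topic_similarity(case_topic, control_treatment, vectors)
sim_medication = topic_similarity(case_topic, control_medication, vectors)
```

In a plot like Figure 1, the larger score (here `sim_treatment`) would correspond to the thicker connecting line.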
FIGURE 2
Word changes over time within cases’ (top row) and controls’ (bottom row) identified topics. Each topic’s selected words (the same words presented in Table 2) were derived using dynamic topic modelling. X axis = time (0 is index, 11 is suicide for cases/end of observation for controls); Y axis = prominence (lower values are more prominent).
FIGURE 3
Spearman correlations over time for cases’ (top row) and controls’ (bottom row) identified topics at index and end of treatment year (X axis: index month; Y axis: month of suicide for cases/end of observation for controls). Each word is presented at its associated coordinate; the 95% confidence interval (CI) was calculated using the method recommended by Bonett and Wright (2000) for non-parametric data. We determined that one Spearman coefficient was statistically significantly greater than another based on lack of CI overlap (see Table 3 for CIs).
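
The comparison described in this caption can be sketched as follows. The interval uses the Fisher z-transform with the Bonett and Wright (2000) variance estimate, (1 + rho^2 / 2) / (n - 3); the helper names and the example ranks are hypothetical, and this is an illustration rather than the authors' code.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def spearman_ci(rho, n, level=0.95):
    """CI for Spearman's rho using the Bonett-Wright variance estimate."""
    if n <= 3 or abs(rho) >= 1:
        raise ValueError("requires n > 3 and |rho| < 1")
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt((1 + rho ** 2 / 2) / (n - 3))
    center = math.atanh(rho)
    return math.tanh(center - z_crit * se), math.tanh(center + z_crit * se)


def intervals_overlap(ci_a, ci_b):
    """True if two (low, high) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Hypothetical word prominence ranks at index month and at end of observation.
index_month = [1, 2, 3, 4, 5, 6, 7, 8]
final_month = [2, 1, 4, 3, 6, 5, 8, 7]
rho = spearman(index_month, final_month)
low, high = spearman_ci(rho, n=len(index_month))
```

Two coefficients would then be called significantly different when `intervals_overlap` returns `False` for their intervals, which is a conservative criterion compared with testing the difference directly.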

References

    1. Alloghani M, Al-Jumeily D, Mustafina J, Hussain A, & Aljaaf AJ (2020). A systematic review on supervised and unsupervised machine learning algorithms for data science. In Berry MW, Mohamed A, & Yap BW (Eds.), Supervised and unsupervised learning for data science (pp. 3–21). Springer International Publishing. doi: 10.1007/978-3-030-22475-2_1
    2. AlSumait L, Barbará D, Gentle J, & Domeniconi C (2009). Topic significance ranking of LDA generative models. In Buntine W, Grobelnik M, Mladenic D, & Shawe-Taylor J (Eds.), Machine learning and knowledge discovery in databases (Vol. 5781, pp. 67–82). Springer. doi: 10.1007/978-3-642-04180-8_22
    3. Andrade C (2020). Mean difference, standardized mean difference (SMD), and their use in meta-analysis: As simple as it gets. The Journal of Clinical Psychiatry, 81(5). doi: 10.4088/JCP.20f13681
    4. Atzil-Slonim D, Juravski D, Bar-Kalifa E, Gilboa-Schechtman E, Tuval-Mashiach R, Shapira N, & Goldberg Y (2021). Using topic models to identify clients’ functioning levels and alliance ruptures in psychotherapy. Psychotherapy, 58(2), 324–339. doi: 10.1037/pst0000362
    5. Austin PC (2011). Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies. Pharmaceutical Statistics, 10(2), 150–161. doi: 10.1002/pst.433