Behav Res Methods. 2022 Apr;54(2):690-711.
doi: 10.3758/s13428-021-01623-4. Epub 2021 Aug 3.

Automated evaluation of psychotherapy skills using speech and language technologies

Nikolaos Flemotomos et al. Behav Res Methods. 2022 Apr.

Abstract

With the growing prevalence of psychological interventions, it is vital to have measures that rate the effectiveness of psychological care to assist in training, supervision, and quality assurance of services. Traditionally, quality is assessed by human raters who evaluate recorded sessions along specific dimensions, often codified through constructs relevant to the approach and domain. This method is, however, costly and time-consuming, which limits its feasibility and use in real-world settings. To facilitate this process, we have developed an automated competency rating tool that processes the raw recorded audio of a session, analyzing who spoke when, what they said, and how the health professional used language to provide therapy. Focusing on a use case of a specific type of psychotherapy called "motivational interviewing", our system gives comprehensive feedback to the therapist, including information about the dynamics of the session (e.g., therapist's vs. client's talking time), low-level psychological language descriptors (e.g., type of questions asked), and high-level behavioral constructs (e.g., the extent to which the therapist understands the client's perspective). We describe our platform and its performance using a dataset of more than 5000 recordings drawn from its deployment in a real-world clinical setting, where it is used to assist in the training of new therapists. Widespread use of automated psychotherapy rating tools may augment experts' capabilities by providing an avenue for more effective training and skill improvement, eventually leading to more positive clinical outcomes.
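
One of the session-dynamics measures mentioned above, the therapist's vs. client's talking time, can be illustrated with a minimal sketch. It assumes the "who spoke when" output is available as (speaker, start, end) segments in seconds; the function name `talking_time_share` and the segment format are hypothetical and are not taken from the authors' system.

```python
from collections import defaultdict


def talking_time_share(segments):
    """Return each speaker's fraction of the total speaking time.

    `segments` is an iterable of (speaker, start_sec, end_sec) tuples,
    a hypothetical format for diarization output.
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {(speaker, start, end)}")
        totals[speaker] += end - start

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # no speech detected, so no shares to report
    return {speaker: t / grand_total for speaker, t in totals.items()}


# Toy session: the therapist speaks for 6 s in total, the client for 14 s.
segments = [
    ("therapist", 0.0, 4.0),
    ("client", 4.0, 10.0),
    ("therapist", 10.0, 12.0),
    ("client", 12.0, 20.0),
]
shares = talking_time_share(segments)
```

Shares are computed relative to total speaking time rather than session length, so silence does not count toward either speaker; a real system would need to decide how to treat overlapping speech.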

Keywords: MISC; Machine learning; Motivational interviewing; Psychotherapy; Quality assessment; Speech processing.

Figures

Figure 1.
(a) Overview of the system used to assess the quality of a psychotherapy session and provide feedback to the therapist. Once the audio is recorded, it is automatically transcribed to find who spoke when and what they said. If the transcription meets certain quality criteria, this textual information is used to predict utterance-level and session-level behavior codes which are summarized into an interactive feedback report. Otherwise, an error message is displayed to the user. (b) Rich transcription module. The dyadic interaction is transcribed through a pipeline that extracts the linguistic information encoded in the speech signal and assigns each speaker turn to either the therapist or the client.
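
The control flow described in panel (a) can be sketched roughly as follows. This is an illustration only: the `Turn` type, the quality check, and the toy coders are hypothetical placeholders, since the caption does not say what the actual quality criteria or prediction models are. Coding only therapist turns is likewise an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # "therapist" or "client", as assigned by the transcription step
    text: str


def meets_quality_criteria(turns, min_turns=2):
    # Hypothetical stand-in for the system's quality criteria:
    # require a minimum number of turns and both roles to be present.
    speakers = {t.speaker for t in turns}
    return len(turns) >= min_turns and {"therapist", "client"} <= speakers


def build_report(turns, code_utterance, code_session):
    """Gate behavior coding on transcript quality, as in panel (a)."""
    if not meets_quality_criteria(turns):
        return {"status": "error",
                "message": "Transcription did not meet quality criteria."}
    # Assumption: only therapist turns receive utterance-level codes.
    therapist_turns = [t for t in turns if t.speaker == "therapist"]
    return {
        "status": "ok",
        "utterance_codes": [code_utterance(t.text) for t in therapist_turns],
        "session_codes": code_session(turns),
    }


# Toy coders standing in for the (unspecified) prediction models.
def toy_utterance_coder(text):
    return "question" if text.endswith("?") else "other"


def toy_session_coder(turns):
    return {"n_turns": len(turns)}


good = [Turn("therapist", "How are you feeling today?"), Turn("client", "Better.")]
bad = [Turn("therapist", "Hello?")]  # only one role present

ok_report = build_report(good, toy_utterance_coder, toy_session_coder)
error_report = build_report(bad, toy_utterance_coder, toy_session_coder)
```

Passing the coders in as arguments keeps the gating logic separate from whichever models produce the utterance-level and session-level codes.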
Figure 2.
Count of each target MISC label per session (Table 5) when coded by humans (reference) and when processed by the pipeline. All the sessions in the two test sets of the University Counseling Center (UCC) dataset (UCCtest1 and UCCtest2) are shown and the correlation values are calculated based on all of them. The sessions flagged as problematic by the quality safeguards are denoted by square markers. RE is a composite label containing both simple and complex reflections (RES and REC).
Figure 3.
Frequency of the utterance-level MISC codes (Table 5) for all the University Counseling Center (UCC) recordings processed and for the subset included in the UCC test sets. Only the sessions successfully processed (that met our quality criteria) are taken into consideration here. The total number of therapist-assigned utterances is about 1.2M for all the sessions (4,269 sessions) and 28K for only the sessions included in the UCC test sets (UCCtest1 and UCCtest2; 96 sessions).
Figure 4.
Distribution of the session-level MISC codes (Table 1) for all the University Counseling Center (UCC) recordings processed and for the subset included in the UCC test sets. Only the sessions successfully processed (that met our quality criteria) are taken into consideration here.
