Review

Br J Psychiatry. 2024 Dec;225(6):532-537.

doi: 10.1192/bjp.2024.134.

Detection of suicidality from medical text using privacy-preserving large language models

Isabella Catharina Wiest¹, Falk Gerrik Verhees², Dyke Ferber³, Jiefu Zhu⁴, Michael Bauer², Ute Lewitzka², Andrea Pfennig², Pavol Mikolas², Jakob Nikolas Kather⁵

Affiliations

¹ Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany; and Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.
² Department of Psychiatry and Psychotherapy, Carl Gustav Carus University Hospital, Technical University Dresden, Dresden, Germany.
³ Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany; National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany; and Department of Medical Oncology, Heidelberg University Hospital, Heidelberg, Germany.
⁴ Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany.
⁵ Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany; National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany; Department of Medical Oncology, Heidelberg University Hospital, Heidelberg, Germany; and Department of Medicine I, University Hospital Dresden, Dresden, Germany.

PMID: 39497458
PMCID: PMC11669470
DOI: 10.1192/bjp.2024.134

Isabella Catharina Wiest et al. Br J Psychiatry. 2024 Dec.


Abstract

Background: Attempts to use artificial intelligence (AI) in psychiatric disorders show moderate success, highlighting the potential of incorporating information from clinical assessments to improve the models. This study focuses on using large language models (LLMs) to detect suicide risk from medical text in psychiatric care.

Aims: To extract information about suicidality status from the admission notes in electronic health records (EHRs) using privacy-sensitive, locally hosted LLMs, specifically evaluating the efficacy of Llama-2 models.

Method: We compared the performance of several variants of the open-source LLM Llama-2 in extracting suicidality status from 100 psychiatric reports against a ground truth defined by human experts, assessing accuracy, sensitivity, specificity and F1 score across different prompting strategies.

Results: A German fine-tuned Llama-2 model showed the highest accuracy (87.5%), sensitivity (83.0%) and specificity (91.8%) in identifying suicidality, with significant improvements in sensitivity and specificity across various prompt designs.

Conclusions: The study demonstrates the capability of LLMs, particularly Llama-2, in accurately extracting information on suicidality from psychiatric records while preserving data privacy. This suggests they could be applied in surveillance systems for psychiatric emergencies and could improve the clinical management of suicidality by strengthening systematic quality control and research.
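The four metrics named in the Method all follow from a 2×2 confusion matrix. The sketch below shows one way to compute them in Python. The class and function names are illustrative, and the counts in the usage note are hypothetical: they were chosen only so that the resulting rates land close to the reported percentages, and they are not the study's confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary task where 'positive' means suicidality present."""
    tp: int  # expert: present, model: present
    fp: int  # expert: absent,  model: present
    tn: int  # expert: absent,  model: absent
    fn: int  # expert: present, model: absent


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a class is empty.
    return numerator / denominator if denominator else float("nan")


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 from confusion counts."""
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


# Hypothetical counts for illustration only (not taken from the paper).
example = ConfusionCounts(tp=39, fp=4, tn=45, fn=8)
metrics = classification_metrics(example)
```

Calling `classification_metrics(example)` returns a dictionary with one entry per metric. Sensitivity and specificity are reported separately because accuracy alone can hide poor performance on the rarer class.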

Keywords: Large language models; electronic health records; natural language processing; psychiatric disorder detection; suicidality.

Conflict of interest statement

J.N.K. declares consulting services for Owkin, France, DoMore Diagnostics, Norway, Panakeia, UK, Scailyte, Switzerland, Cancilico, Germany, Mindpeak, Germany, MultiplexDx, Slovakia, and Histofy, UK; furthermore, he holds shares in StratifAI GmbH, Germany, has received a research grant from GSK, and has received honoraria from AstraZeneca, Bayer, Eisai, Janssen, MSD, BMS, Roche, Pfizer and Fresenius. I.C.W. received honoraria from AstraZeneca. U.L. participated in advisory boards and received honoraria from Janssen Cilag GmbH.

Figures

**Fig. 1**
Experimental setup. (a) The information extraction pipeline. The psychiatry reports (n = 100) were transferred to a CSV table. Our pipeline then iterates over all reports with the predefined prompt and outputs a JavaScript Object Notation (JSON) file with all large language model (LLM) outputs (PRED). The relevant classes (suicidality present: yes or no) were then extracted from the LLM output, which was more verbose in some cases. These outputs were then transferred to a pandas dataframe and automatically compared to the expert-based ground truth (GT). (b) The initial prompting strategy. One prompt and one report were given to the model at the same time. Every prompt contained a system prompt with general instructions and a specific question about the report (Instruction). (c) The chain-of-thought approach: the psychiatry report with our prompt was fed into the LLM, which generated a first output. With a second prompt and a predefined answering grammar, the model was fed its own output and again forced to generate a certain JSON-based output structure. This final output then underwent performance analysis. Icon source: Midjourney.
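The caption notes that the yes/no class had to be extracted from model output that was sometimes verbose. The following is a minimal sketch of how such an extraction step might look in plain Python. The JSON key `suicidality`, the accepted answer tokens (including German "ja"/"nein") and the fallback rule are assumptions made for illustration; the source does not specify the output schema or the parsing rules the authors used.

```python
import json
import re
from typing import Optional

# Hypothetical key name; the actual JSON schema is not given in the source.
LABEL_KEY = "suicidality"

_YES = {"yes", "ja", "true"}
_NO = {"no", "nein", "false"}


def _to_bool(value: object) -> Optional[bool]:
    """Normalise a JSON value or text token to True/False, else None."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        token = value.strip().lower()
        if token in _YES:
            return True
        if token in _NO:
            return False
    return None


def extract_label(raw_output: str) -> Optional[bool]:
    """Map a possibly verbose model answer to True, False, or None if unclear."""
    # 1) Prefer the first flat JSON object embedded anywhere in the text.
    match = re.search(r"\{.*?\}", raw_output, flags=re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group(0))
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            label = _to_bool(parsed.get(LABEL_KEY))
            if label is not None:
                return label
    # 2) Naive fallback: the first standalone yes/no token in the text.
    word = re.search(r"\b(yes|no|ja|nein)\b", raw_output, flags=re.IGNORECASE)
    return _to_bool(word.group(1)) if word else None


def agreement(predictions: list[str], ground_truth: list[bool]) -> list[bool]:
    """Per-report flag: does the extracted label equal the expert label?"""
    return [extract_label(p) == gt for p, gt in zip(predictions, ground_truth)]
```

Outputs that cannot be parsed come back as `None` rather than being silently counted as "no", so they can be reviewed by hand. The token fallback is deliberately simple and would misread answers such as "there is no doubt that yes", which is one reason a constrained answering grammar, as in panel (c), is attractive.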

**Fig. 2**
Performance of the German-language fine-tuned Llama-2 model. (a) Sensitivity and specificity for five different prompting strategies. With P0, the model was simply asked whether suicidality was present in the report; P1, P2 and P3 provided one, two or three examples to the model. P4 applied a chain-of-thought approach, where the model was asked twice, with the first model output as input for the second run. (b) Confusion matrix representing the performance of the large language model (LLM) in indicating the presence of suicidality based on the examined admission notes (n = 100), with a sensitivity of 83% and a specificity of 92% for P3, a prompt that included three examples. (c) Bar chart showing the balanced accuracies for all models and prompt engineering attempts. Error bars show the 95% confidence interval of the bootstrapped samples.
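Panel (c) reports balanced accuracy with bootstrapped 95% confidence intervals. As a rough illustration of that idea, the sketch below computes balanced accuracy (the mean of sensitivity and specificity) and a percentile bootstrap interval using only the Python standard library. The number of resamples, the percentile method, the seed and the handling of one-class resamples are assumptions; the source does not describe the authors' exact bootstrap procedure.

```python
import random
from typing import Sequence


def balanced_accuracy(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Mean of sensitivity and specificity for binary labels."""
    pos = [p for t, p in zip(y_true, y_pred) if t]
    neg = [p for t, p in zip(y_true, y_pred) if not t]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = sum(pos) / len(pos)
    specificity = sum(not p for p in neg) / len(neg)
    return (sensitivity + specificity) / 2


def bootstrap_ci(
    y_true: Sequence[bool],
    y_pred: Sequence[bool],
    n_resamples: int = 1000,  # assumed value, not stated in the source
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile bootstrap interval for balanced accuracy."""
    if all(y_true) or not any(y_true):
        raise ValueError("ground truth must contain both classes")
    rng = random.Random(seed)
    n = len(y_true)
    scores: list[float] = []
    while len(scores) < n_resamples:
        # Resample reports with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        t = [y_true[i] for i in idx]
        p = [y_pred[i] for i in idx]
        if all(t) or not any(t):
            continue  # skip rare resamples that lack one class
        scores.append(balanced_accuracy(t, p))
    scores.sort()
    lo_idx = round((alpha / 2) * (n_resamples - 1))
    hi_idx = round((1 - alpha / 2) * (n_resamples - 1))
    return scores[lo_idx], scores[hi_idx]


# Hypothetical labels for illustration only (not the study data).
y_true = [True] * 47 + [False] * 49
y_pred = [True] * 39 + [False] * 8 + [False] * 45 + [True] * 4
point = balanced_accuracy(y_true, y_pred)
low, high = bootstrap_ci(y_true, y_pred)
```

With roughly 100 reports the interval is wide, which is worth keeping in mind when comparing prompting strategies whose point estimates differ by only a few percentage points.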




Grants and funding

101096312/HORIZON EUROPE Excellent Science
