Enhancing suicidal behavior detection in EHRs: A multi-label NLP framework with transformer models and semantic retrieval-based annotation

Kimia Zandbiglari¹, Shobhan Kumar¹, Muhammad Bilal¹, Amie Goodin¹, Masoud Rouhizadeh²

Affiliations

¹ Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, FL, USA.
² Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, FL, USA; Division of Biomedical Informatics & Data Science, Johns Hopkins University School of Medicine, Baltimore, MD, USA. Electronic address: mrouhizadeh@ufl.edu.

PMID: 39631489
DOI: 10.1016/j.jbi.2024.104755

Kimia Zandbiglari et al. J Biomed Inform. 2025 Jan;161:104755. Epub 2024 Dec 2.

Abstract

Background: Suicide is a leading cause of death worldwide, making early identification of suicidal behaviors crucial for clinicians. Current Natural Language Processing (NLP) approaches for identifying suicidal behaviors in Electronic Health Records (EHRs) rely on keyword searches, rule-based methods, and binary classification, which may not fully capture the complexity and spectrum of suicidal behaviors. This study aims to create a multi-class labeled dataset with annotation guidelines and to develop a novel NLP approach for fine-grained, multi-label classification of suicidal behaviors, improving both the efficiency of the annotation process and the accuracy of NLP methods.

Methods: We develop a multi-class labeling system based on guidelines from the FDA, CDC, and WHO that distinguishes among six categories of suicidal behaviors and allows multiple labels per data sample. To create the annotated dataset efficiently, we use an MPNet-based semantic retrieval framework to extract relevant sentences from a large EHR dataset, reducing the annotation space while capturing diverse expressions. Experts then annotate the extracted sentences using the multi-class system. Finally, we formulate the task as a multi-label classification problem and fine-tune transformer-based models on the curated dataset to accurately classify suicidal behaviors in EHRs.
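The retrieval step can be pictured as ranking EHR sentences by cosine similarity to a set of seed queries and keeping only the closest ones for annotation. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the sentence and query embeddings were already produced by an MPNet sentence encoder (for example, a sentence-transformers MPNet checkpoint; the abstract does not say which checkpoint, seed queries, or cutoff were used). The helper name `retrieve_candidates`, the `threshold` parameter, and the toy 3-dimensional vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_candidates(
    sentence_embs: Dict[str, Sequence[float]],
    query_embs: List[Sequence[float]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: keep sentences whose best cosine similarity to any
    seed-query embedding is at least `threshold`, most similar first."""
    if not query_embs:
        raise ValueError("at least one query embedding is required")
    scored = []
    for sent_id, emb in sentence_embs.items():
        best = max(cosine(emb, q) for q in query_embs)
        if best >= threshold:
            scored.append((sent_id, best))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real sentence-encoder embeddings.
    queries = [[1.0, 0.0, 0.0]]
    sentences = {"s1": [0.9, 0.1, 0.0], "s2": [0.0, 1.0, 0.0], "s3": [0.7, 0.7, 0.0]}
    ranked = retrieve_candidates(sentences, queries, threshold=0.5)
    print([sid for sid, _ in ranked])  # → ['s1', 's3']
```

Scoring each sentence by its best match across several seed queries lets differently phrased queries pull in different surface expressions of the same behavior. The threshold then determines how many sentences experts must read.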

Results: Lexical analysis revealed key themes in suicide risk assessment, including an individual's history, mental health, substance use, and family background. Fine-tuned transformer-based models effectively identified suicidal behaviors from EHRs: Bio_ClinicalBERT, BioBERT, and XLNet each achieved an F1 score of 0.81, outperforming BERT and RoBERTa. The proposed approach, using task-specific NLP models and a multi-label classification system, captures the complexity of suicidal behaviors more effectively than traditional binary classification, particularly for "Suicide Attempt" and "Family History" instances. However, direct comparisons with existing studies are difficult due to varying metrics and label definitions.
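In a multi-label setup, the fine-tuned model emits one score per category. Each score is passed through its own sigmoid and thresholded independently, so a single sentence can carry several labels at once; during fine-tuning this is typically paired with a binary cross-entropy loss. The abstract does not state the decision threshold or whether the reported F1 is micro-, macro-, or weighted-averaged. The sketch below therefore assumes a 0.5 threshold and micro-averaging, and it uses three toy label columns rather than the study's six categories. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_labels(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Multi-label decision: each category gets its own independent sigmoid,
    so a sentence can receive zero, one, or several labels."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def micro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Micro-averaged F1: pool true positives, false positives, and false
    negatives over every (sentence, label) pair before computing F1."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    print(predict_labels([2.0, -1.0, 0.3]))  # → [1, 0, 1]
    gold = [[1, 0, 1], [0, 1, 0]]  # toy gold labels, 2 sentences x 3 labels
    pred = [[1, 0, 0], [0, 1, 1]]  # toy predictions
    print(round(micro_f1(gold, pred), 3))  # → 0.667
```

Micro-averaging weights every (sentence, label) decision equally, so frequent categories dominate the score. Macro-averaging would instead weight each category equally, which is one reason reported F1 values are hard to compare across studies that define labels differently.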

Conclusion: This study presents a robust NLP framework for detecting suicidal behaviors in EHRs, leveraging task-specific fine-tuning of transformer-based models and a semi-automated pipeline. Despite limitations, the approach demonstrates the potential of advanced NLP techniques in enhancing the identification of suicidal behaviors. Future work should focus on model expansion and integration to further improve patient care and clinical decision-making.

Keywords: Deep learning; Electronic Health Records (EHRs); Mental health informatics; Multi-Label classification; Natural Language Processing (NLP); Suicidal behaviors; Transformer-based language models.

Conflict of interest statement

Declaration of competing interest: All authors declare that they have no conflicts of interest.
