Enhancing suicidal behavior detection in EHRs: A multi-label NLP framework with transformer models and semantic retrieval-based annotation
- PMID: 39631489
- DOI: 10.1016/j.jbi.2024.104755
Abstract
Background: Suicide is a leading cause of death worldwide, making early identification of suicidal behaviors crucial for clinicians. Current Natural Language Processing (NLP) approaches for identifying suicidal behaviors in Electronic Health Records (EHRs) rely on keyword searches, rule-based methods, and binary classification, which may not fully capture the complexity and spectrum of suicidal behaviors. This study aims to create a multi-class labeled dataset with annotation guidelines and develop a novel NLP approach for fine-grained, multi-label classification of suicidal behaviors, improving the efficiency of the annotation process and accuracy of the NLP methods.
Methods: We develop a multi-class labeling system based on guidelines from the FDA, CDC, and WHO, distinguishing between six categories of suicidal behaviors and allowing for multiple labels per data sample. To efficiently create an annotated dataset, we use an MPNet-based semantic retrieval framework to extract relevant sentences from a large EHR dataset, reducing the annotation space while capturing diverse expressions. Experts annotate the extracted sentences using the multi-class system. We then formulate the task as a multi-label classification problem and fine-tune transformer-based models on the curated dataset to accurately classify suicidal behaviors in EHRs.
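The retrieval and labeling steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the short vectors stand in for MPNet sentence embeddings, the similarity threshold of 0.7 is a hypothetical value, and only "Suicide Attempt" and "Family History" are label names taken from the abstract; the remaining four entries are placeholders for categories the abstract does not name.

```python
import math

# Two label names come from the abstract; the other four are placeholders
# (assumption) for the remaining categories, which the abstract does not list.
LABELS = ["Suicide Attempt", "Family History",
          "label_3", "label_4", "label_5", "label_6"]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query_vec, sentence_vecs, threshold=0.7):
    """Return indices of sentences whose similarity to the query is at
    least `threshold` (hypothetical value), most similar first."""
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(sentence_vecs)]
    return [i for score, i in sorted(scored, reverse=True) if score >= threshold]


def to_multi_hot(assigned):
    """Encode a set of label names as a 0/1 vector, one slot per label,
    so that a sentence can carry several labels at once."""
    return [1 if label in assigned else 0 for label in LABELS]


# Toy vectors standing in for sentence embeddings (assumption).
query = [1.0, 0.0]
sentences = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
candidates = retrieve(query, sentences)
target = to_multi_hot({"Suicide Attempt", "Family History"})
```

In this sketch only the retrieved candidates would be passed to annotators, which is how a retrieval step can shrink the annotation space, and the multi-hot vector is the kind of target a multi-label classifier is trained against.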
Results: Lexical analysis revealed key themes in assessing suicide risk, considering an individual's history, mental health, substance use, and family background. Fine-tuned transformer-based models effectively identified suicidal behaviors from EHRs, with Bio_ClinicalBERT, BioBERT, and XLNet each achieving an F1 score of 0.81, outperforming BERT and RoBERTa. The multi-label classification system captures the complexity of suicidal behaviors effectively, particularly for "Suicide Attempt" and "Family History" instances. Combined with task-specific NLP models, it captures this complexity more effectively than traditional binary classification. However, direct comparisons with existing studies are difficult due to varying metrics and label definitions.
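The abstract does not say how the reported F1 scores are averaged across labels. As a hedged illustration of why that matters when comparing studies, the sketch below computes both micro- and macro-averaged F1 from multi-hot label matrices. The function names and the toy data are assumptions made for illustration and are not taken from the paper.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for lists of equal-length 0/1 rows."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        # Per-label true positives, false positives, and false negatives.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    # Macro: average the per-label F1 values, so every label weighs the same.
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels
    # Micro: pool the counts over all labels, so frequent labels dominate.
    micro = f1_from_counts(sum(c[0] for c in counts),
                           sum(c[1] for c in counts),
                           sum(c[2] for c in counts))
    return micro, macro


# Toy example with two labels and three sentences (assumption).
y_true = [[1, 0], [1, 1], [0, 1]]
y_pred = [[1, 0], [0, 1], [0, 1]]
micro_f1, macro_f1 = multilabel_f1(y_true, y_pred)
```

The two averages differ even on this small example, which is one reason scores from studies with different metrics and label definitions are hard to compare directly.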
Conclusion: This study presents a robust NLP framework for detecting suicidal behaviors in EHRs, leveraging task-specific fine-tuning of transformer-based models and a semi-automated pipeline. Despite limitations, the approach demonstrates the potential of advanced NLP techniques in enhancing the identification of suicidal behaviors. Future work should focus on model expansion and integration to further improve patient care and clinical decision-making.
Keywords: Deep learning; Electronic Health Records (EHRs); Mental health informatics; Multi-Label classification; Natural Language Processing (NLP); Suicidal behaviors; Transformer-based language models.
Copyright © 2024 Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of competing interest: All authors declare that they have no conflicts of interest.