Natural language processing and machine learning of electronic health records for prediction of first-time suicide attempts
- PMID: 33758800
- PMCID: PMC7966858
- DOI: 10.1093/jamiaopen/ooab011
Abstract
Objective: Limited research exists on predicting first-time suicide attempts, which account for two-thirds of suicide decedents. We aimed to predict first-time suicide attempts using a large data-driven approach that applies natural language processing (NLP) and machine learning (ML) to unstructured (narrative) clinical notes and structured electronic health record (EHR) data.
Methods: This case-control study included patients aged 10-75 years who were seen in emergency departments and inpatient units between 2007 and 2016. Cases were first-time suicide attempts identified from coded diagnoses; controls were patients without suicide attempts, randomly selected regardless of demographics at a ratio of nine controls per case. Four data-driven ML models were evaluated using 2-year historical EHR data prior to the suicide attempt or control index visit, with prediction windows from 7 to 730 days. Patients without any historical notes were excluded. Model accuracy and robustness were evaluated on a blind dataset (30% of the cohort).
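The sampling and holdout design described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function name `build_cohort`, the patient identifiers, the random seed, and the choice to stratify the 30% blind split by case status are all assumptions made for the example.

```python
import random

def build_cohort(case_ids, candidate_control_ids, ratio=9, holdout_frac=0.30, seed=0):
    """Sample `ratio` controls per case, then hold out a blind evaluation set.

    Hypothetical helper: the stratified split is an assumption, since the
    abstract only states that 30% of the cohort was used as a blind dataset.
    """
    rng = random.Random(seed)
    # Random control selection, regardless of demographics.
    controls = rng.sample(sorted(candidate_control_ids), ratio * len(case_ids))
    labeled = [(pid, 1) for pid in case_ids] + [(pid, 0) for pid in controls]

    train, blind = [], []
    for label in (0, 1):
        group = [row for row in labeled if row[1] == label]
        rng.shuffle(group)
        n_blind = round(holdout_frac * len(group))
        blind.extend(group[:n_blind])   # never seen during model fitting
        train.extend(group[n_blind:])
    return train, blind

# Toy identifiers, invented for illustration only.
cases = [f"case-{i}" for i in range(100)]
pool = [f"ctrl-{i}" for i in range(5000)]
train, blind = build_cohort(cases, pool)
print(len(train), len(blind))
```

Note that the reported cohort (5099 cases, 40 139 controls) has fewer than nine controls per case, which is consistent with patients lacking historical notes being excluded after sampling.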
Results: The study cohort included 45 238 patients (5099 cases, 40 139 controls) comprising 54 651 variables from 5.7 million structured records and 798 665 notes. Using both unstructured and structured data resulted in significantly greater accuracy than structured data alone (area-under-the-curve [AUC]: 0.932 vs. 0.901, P < .001). The best-predicting model utilized 1726 variables with AUC = 0.932 (95% CI, 0.922-0.941). The model was robust across multiple prediction windows and across subgroups defined by demographics, point of most recent historical clinical contact, and depression diagnosis history.
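For readers unfamiliar with the AUC metric used in the comparison above, the sketch below computes it as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, counting ties as one half. The function `roc_auc` and the toy labels and scores are invented for illustration; they are not the study's data or the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is quadratic in
    cohort size, so it is meant for illustration rather than large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores for two cases and four controls.
labels = [1, 1, 0, 0, 0, 0]
structured_only = [0.8, 0.3, 0.4, 0.2, 0.1, 0.35]
combined = [0.9, 0.6, 0.4, 0.2, 0.1, 0.35]

print(roc_auc(labels, structured_only))  # 0.75
print(roc_auc(labels, combined))         # 1.0
```

In this toy setup the second score set ranks every case above every control, which is what a higher AUC expresses.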
Conclusions: Our large data-driven approach using both structured and unstructured EHR data demonstrated accurate and robust first-time suicide attempt prediction, and has the potential to be deployed across various populations and clinical settings.
Keywords: electronic health records; machine learning; natural language processing; suicide attempt.
© The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Similar articles
- Structured data vs. unstructured data in machine learning prediction models for suicidal behaviors: A systematic review and meta-analysis. Front Digit Health. 2022 Aug 2;4:945006. doi: 10.3389/fdgth.2022.945006. eCollection 2022. PMID: 35983407. Free PMC article.
- Leveraging Natural Language Processing to Improve Electronic Health Record Suicide Risk Prediction for Veterans Health Administration Users. J Clin Psychiatry. 2023 Jun 19;84(4):22m14568. doi: 10.4088/JCP.22m14568. PMID: 37341477. Free PMC article.
- Leveraging unstructured electronic medical record notes to derive population-specific suicide risk models. Psychiatry Res. 2022 Sep;315:114703. doi: 10.1016/j.psychres.2022.114703. Epub 2022 Jul 1. PMID: 35841702.
- Natural language processing of clinical mental health notes may add predictive value to existing suicide risk models. Psychol Med. 2021 Jun;51(8):1382-1391. doi: 10.1017/S0033291720000173. Epub 2020 Feb 17. PMID: 32063248. Free PMC article.
- Reviewing a Decade of Research Into Suicide and Related Behaviour Using the South London and Maudsley NHS Foundation Trust Clinical Record Interactive Search (CRIS) System. Front Psychiatry. 2020 Nov 27;11:553463. doi: 10.3389/fpsyt.2020.553463. eCollection 2020. PMID: 33329090. Free PMC article. Review.
Cited by
- Application of machine learning and natural language processing for predicting stroke-associated pneumonia. Front Public Health. 2022 Sep 29;10:1009164. doi: 10.3389/fpubh.2022.1009164. eCollection 2022. PMID: 36249261. Free PMC article.
- Complex modeling with detailed temporal predictors does not improve health records-based suicide risk prediction. NPJ Digit Med. 2023 Mar 23;6(1):47. doi: 10.1038/s41746-023-00772-4. PMID: 36959268. Free PMC article.
- Detection of Suicidal Behavior and Self-harm Among Children Presenting to Emergency Departments: A Tree-based Classification Approach. AMIA Jt Summits Transl Sci Proc. 2023 Jun 16;2023:108-117. eCollection 2023. PMID: 37350874. Free PMC article.
- Predictive structured-unstructured interactions in EHR models: A case study of suicide prediction. NPJ Digit Med. 2022 Jan 27;5(1):15. doi: 10.1038/s41746-022-00558-0. PMID: 35087182. Free PMC article.
- Structured data vs. unstructured data in machine learning prediction models for suicidal behaviors: A systematic review and meta-analysis. Front Digit Health. 2022 Aug 2;4:945006. doi: 10.3389/fdgth.2022.945006. eCollection 2022. PMID: 35983407. Free PMC article.