JAMIA Open. 2021 Mar 17;4(1):ooab011.
doi: 10.1093/jamiaopen/ooab011. eCollection 2021 Jan.

Natural language processing and machine learning of electronic health records for prediction of first-time suicide attempts

Fuchiang R Tsui et al. JAMIA Open.

Abstract

Objective: Limited research exists on predicting first-time suicide attempts, which account for two-thirds of suicide decedents. We aimed to predict first-time suicide attempts using a large data-driven approach that applies natural language processing (NLP) and machine learning (ML) to unstructured (narrative) clinical notes and structured electronic health record (EHR) data.

Methods: This case-control study included patients aged 10-75 years who were seen in emergency departments and inpatient units between 2007 and 2016. Cases were first-time suicide attempts identified from coded diagnoses; controls were patients without suicide attempts, randomly selected regardless of demographics at a ratio of nine controls per case. Four data-driven ML models were evaluated using 2-year historical EHR data prior to the suicide attempt or control index visits, with prediction windows from 7 to 730 days. Patients without any historical notes were excluded. Model accuracy and robustness were evaluated on a blind dataset (30% of the cohort).
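The hold-out design described in the Methods can be illustrated with a minimal sketch. It assumes a label-stratified random split, which the abstract does not state; the function name `stratified_split`, the seed, and the toy cohort sizes are hypothetical and are not taken from the study.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), holding out test_fraction of each class.

    Stratifying by label is an assumption made for this sketch; the abstract
    only says that 30% of the cohort was kept as a blind dataset.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idxs in by_class.values():
        idxs = idxs[:]              # copy so the caller's data is untouched
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

# Toy cohort: 100 cases (1) and 900 controls (0), i.e. nine controls per case.
labels = [1] * 100 + [0] * 900
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))        # 700 300
print(sum(labels[i] for i in test_idx))     # 30 cases in the blind set
```

Stratifying keeps the case-to-control ratio the same in both partitions, so the blind set is not dominated by controls purely by chance.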

Results: The study cohort included 45 238 patients (5099 cases, 40 139 controls) comprising 54 651 variables from 5.7 million structured records and 798 665 notes. Using both unstructured and structured data resulted in significantly greater accuracy compared to structured data alone (area-under-the-curve [AUC]: 0.932 vs. 0.901, P < .001). The best-predicting model utilized 1726 variables with AUC = 0.932 (95% CI, 0.922-0.941). The model was robust across multiple prediction windows and subgroups by demographics, points of historical most recent clinical contact, and depression diagnosis history.
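For readers unfamiliar with the metric, the AUC reported above equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition; the scores and labels are made-up toy values, not study data, and the function name `auc` is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # case ranked above control
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))

# Toy example: two cases and three controls.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_labels = [1, 0, 1, 0, 0]
print(round(auc(toy_scores, toy_labels), 3))   # 0.833
```

The pairwise loop is quadratic in cohort size, so it is only suitable for illustration; a rank-based formulation gives the same value far more cheaply on large cohorts.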

Conclusions: Our large data-driven approach using both structured and unstructured EHR data demonstrated accurate and robust first-time suicide attempt prediction, and has the potential to be deployed across various populations and clinical settings.

Keywords: electronic health records; machine learning; natural language processing; suicide attempt.

Figures

Figure 1.
Diagram of the cohort identification process. From all inpatient or emergency department visits between 2007 and 2016, our initial cohort comprised 8588 suicide attempt patients identified from diagnoses and 77 292 randomly selected patients without any suicide attempt diagnoses. After applying the exclusion criteria, the final cohort had 5099 case patients and 40 139 control patients. The cohort was further divided into training and test datasets for model building and testing, respectively. Abbreviation: UPMC, University of Pittsburgh Medical Center.
Figure 2.
Temporal diagram showing historical electronic health record (EHR) data up to 2 years prior to an index visit at an emergency department or an inpatient facility. A case index visit represents a first-time suicide attempt visit, and a control index visit represents a randomly selected visit from controls with longitudinal EHR data. A first-time suicide attempt visit (Vt0) is defined as the first known suicide attempt visit between 2005 and 2016; Vt0: the index visit; Vt-1: the last point of clinical contact or last clinical encounter prior to the index visit (Vt0). A prediction window is defined as the time interval between the index visit time (Vt0) and the historical most recent clinical-visit time (Vt-1) prior to the index visit.
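The prediction-window definition in this caption can be sketched as a small date calculation. The function name, the `max_history_days` parameter, and the example dates are hypothetical and chosen only for illustration; the 730-day lookback mirrors the 2-year history stated in the caption.

```python
from datetime import date

def prediction_window_days(index_visit, prior_visits, max_history_days=730):
    """Days between the index visit (Vt0) and the most recent prior visit (Vt-1).

    Only visits strictly before the index visit and within the lookback
    period are considered. Returns None when no such visit exists.
    """
    eligible = [
        v for v in prior_visits
        if 0 < (index_visit - v).days <= max_history_days
    ]
    if not eligible:
        return None
    last_contact = max(eligible)            # Vt-1
    return (index_visit - last_contact).days

# Hypothetical patient: index visit on 2015-06-01, two earlier encounters.
window = prediction_window_days(
    date(2015, 6, 1),
    [date(2014, 1, 10), date(2015, 5, 2)],
)
print(window)   # 30
```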
Figure 3.
Process flow of a medical natural language processing (NLP) pipeline, which transforms a narrative sentence in a clinical note into structured outcomes. In the example, the sentence contains three symptoms (fever, cough, and vomiting), and the vomiting concept is negated. Negated concepts are common in clinical notes.
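To make the idea of negated concepts concrete, here is a toy, rule-based sketch of concept extraction with negation handling. It is not the authors' pipeline: the vocabulary, the trigger words, the scope rule, and the example sentence are all assumptions made for illustration, loosely in the spirit of trigger-based negation rules.

```python
import re

# Hypothetical mini-vocabularies for illustration only.
SYMPTOMS = {"fever", "cough", "vomiting"}
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}
SCOPE_BREAKERS = {"but", "however"}   # words that end a negation's scope

def extract_symptoms(sentence):
    """Return a list of (symptom, is_negated) pairs in order of appearance."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    negated = False
    results = []
    for tok in tokens:
        if tok in SCOPE_BREAKERS:
            negated = False           # a new clause resets negation
        elif tok in NEGATION_TRIGGERS:
            negated = True            # negation applies to what follows
        elif tok in SYMPTOMS:
            results.append((tok, negated))
    return results

# Hypothetical sentence; the caption does not quote the original one.
print(extract_symptoms("Patient has fever and cough but denies vomiting."))
# [('fever', False), ('cough', False), ('vomiting', True)]
```

A production clinical NLP system would map text to a controlled vocabulary and use far richer context rules; the point here is only that a negated mention must be stored as absent rather than present.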
Figure 4.
Receiver Operating Characteristic (ROC) curves of four ML models. Plots A and B show ROCs in 30- and 730-day prediction windows, respectively. Abbreviations: EXGB, Ensemble of eXtreme Gradient Boosting; LASSO, Least Absolute Shrinkage and Selection Operator.
Figure 5.
Plots of predictive model accuracy, measured by the area under the receiver operating characteristic curve (AUC), for the four predictive models. Plot (A) shows model performance in the 30-day prediction window. Plot (B) shows model performance in the 730-day prediction window. Abbreviations: EXGB, Ensemble of eXtreme Gradient Boosting; LASSO, Least Absolute Shrinkage and Selection Operator.
Figure 6.
Robustness analysis of the Ensemble of eXtreme Gradient Boosting (EXGB) model across 18 subgroups based on demographics (age, race, gender, insurance), depression diagnosis, and point of historical most recent clinical contact. Plot (A) shows the EXGB performance in the 30-day prediction window. Plot (B) shows the EXGB performance in the 730-day prediction window. Age was measured in years. Abbreviations: T, present; F, absent; LastContact, point of historical most recent clinical contact.
