J Am Med Inform Assoc. 2018 Jan 1;25(1):61-71.
doi: 10.1093/jamia/ocx059.

Mining 100 million notes to find homelessness and adverse childhood experiences: 2 case studies of rare and severe social determinants of health in electronic health records

Cosmin A Bejan et al. J Am Med Inform Assoc.

Abstract

Objective: Understanding how to identify the social determinants of health from electronic health records (EHRs) could provide important insights into health and disease outcomes. We developed a methodology to capture 2 rare and severe social determinants of health, homelessness and adverse childhood experiences (ACEs), from a large EHR repository.

Materials and methods: We first constructed lexicons to capture homelessness and ACE phenotypic profiles. We employed word2vec and lexical associations to mine homelessness-related words. Next, using relevance feedback, we refined the 2 profiles with iterative searches over 100 million notes from the Vanderbilt EHR. Seven assessors manually reviewed the top-ranked results of 2544 patient visits relevant for homelessness and 1000 patients relevant for ACE.
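
The embedding-based step of lexicon building can be illustrated with a minimal sketch. It assumes word vectors are already available and ranks candidate expansion terms by cosine similarity to a seed word. The vectors below are hand-made toy values and the function names (`cosine`, `expand_query`) are hypothetical; nothing here is taken from the authors' implementation or from the Vanderbilt data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_query(seed, embeddings, top_k=3):
    """Return the top_k words most similar to the seed word, with scores."""
    seed_vec = embeddings[seed]
    scored = [
        (word, cosine(seed_vec, vec))
        for word, vec in embeddings.items()
        if word != seed
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors invented for illustration only (assumption, not real embeddings).
TOY_EMBEDDINGS = {
    "homeless": [0.9, 0.8, 0.1],
    "shelter": [0.8, 0.9, 0.2],
    "unhoused": [0.9, 0.7, 0.0],
    "aspirin": [0.1, 0.0, 0.9],
    "fracture": [0.0, 0.2, 0.8],
}

candidates = expand_query("homeless", TOY_EMBEDDINGS, top_k=2)
for word, score in candidates:
    print(f"{word}\t{score:.3f}")
```

In a workflow like the one described, candidate terms ranked this way would still need manual review before being added to a lexicon, which is the role relevance feedback plays in the abstract.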

Results: word2vec yielded better performance (area under the precision-recall curve [AUPRC] of 0.94) than lexical associations (AUPRC = 0.83) for extracting homelessness-related words. A comparison of searches for the 2 phenotypes showed higher performance for homelessness (AUPRC = 0.95) than for ACE (AUPRC = 0.79). A temporal analysis of the homeless population showed that the majority experienced chronic homelessness. Most ACE patients suffered sexual (70%) and/or physical (50.6%) abuse, with the top-ranked abuser keywords being "father" (21.8%) and "mother" (15.4%). The most prevalent associated conditions for homeless patients were lack of housing (62.8%) and tobacco use disorder (61.5%), while for ACE patients they were mental disorders (36.6%-47.6%).
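
For readers unfamiliar with AUPRC, the sketch below shows one common way to summarize a precision-recall curve from a ranked list of manual relevance judgments: average precision, the mean of the precision values at each rank where a relevant item appears. This is an assumption about how such a score could be computed; the abstract does not state which estimator the authors used. The labels are invented, and recall is measured only against the relevant items in the judged list.

```python
def average_precision(ranked_labels):
    """Average precision for a ranked list of 0/1 relevance labels.

    ranked_labels[0] is the judgment for the top-ranked result.
    Returns 0.0 when the list contains no relevant items.
    """
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        return 0.0

    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            # Precision at this rank, counted only where recall increases.
            precision_sum += hits / rank
    return precision_sum / total_relevant


# Invented judgments: 1 = assessor marked the result relevant, 0 = not relevant.
example_labels = [1, 1, 0, 1, 0]
print(f"{average_precision(example_labels):.4f}")
```

A ranking that places all relevant results first scores 1.0, and the score drops as non-relevant results appear higher in the list.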

Conclusion: We provide an efficient solution for mining homelessness and ACE information from EHRs, which can facilitate large clinical and genetic studies of these social determinants of health.

Keywords: EHR; adverse childhood experiences; homelessness; social determinants of health; text mining.

Figures

Figure 1. System architecture for identifying SDH in the Vanderbilt EHR
Figure 2. Evaluation of methods for homelessness query expansion
Figure 3. Precision-recall curves for homelessness and ACE evaluation
Figure 4. Race and sex distributions over time for the identified homeless and ACE patients
Figure 5. Descriptive statistics of the patients found as relevant for ACE and homelessness. ICD-9 code descriptions: 309.81, Posttraumatic stress disorder; 311, Depressive disorder, not elsewhere classified; 300, Anxiety, dissociative and somatoform disorders; 780.79, Other malaise and fatigue; 296.9, Other and unspecified episodic mood disorder; V60.0, Lack of housing; 305.1, Tobacco use disorder; 780.79, Other malaise and fatigue; 786.59, Other chest pain; 786.5, Chest pain.
Figure 6. Trends in homelessness status across patient visits
