Mining 100 million notes to find homelessness and adverse childhood experiences: 2 case studies of rare and severe social determinants of health in electronic health records
- PMID: 29016793
- PMCID: PMC6080810
- DOI: 10.1093/jamia/ocx059
Abstract
Objective: Understanding how to identify the social determinants of health from electronic health records (EHRs) could provide important insights into health and disease outcomes. We developed a methodology to capture 2 rare and severe social determinants of health, homelessness and adverse childhood experiences (ACEs), from a large EHR repository.
Materials and methods: We first constructed lexicons to capture homelessness and ACE phenotypic profiles, using word2vec and lexical associations to mine homelessness-related words. Next, using relevance feedback, we refined the 2 profiles through iterative searches over 100 million notes from the Vanderbilt EHR. Seven assessors manually reviewed the top-ranked results: 2544 patient visits relevant to homelessness and 1000 patients relevant to ACEs.
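The relevance-feedback step can be illustrated with a Rocchio-style query update. The abstract does not specify the feedback formula, so this is a minimal sketch assuming the classic Rocchio update over bag-of-words vectors. The toy notes, the helper names (`vectorize`, `cosine`, `rank`, `rocchio`), and the weights alpha=1.0, beta=0.75, gamma=0.15 are hypothetical illustration values, not the authors' data or settings.

```python
import math
from collections import Counter

# Hypothetical toy notes standing in for EHR note text (not real data).
NOTES = {
    "n1": "patient is homeless and sleeping in a shelter",
    "n2": "lives with wife in a house no concerns",
    "n3": "currently staying at homeless shelter lost housing",
    "n4": "follow up for diabetes medication refill",
}


def vectorize(text):
    """Bag-of-words term counts (simplified: no stemming, no stop-word removal)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, notes):
    """Note ids sorted by similarity to the query, best first (ties broken by id)."""
    scored = [(cosine(query, vectorize(text)), nid) for nid, text in notes.items()]
    return [nid for _, nid in sorted(scored, key=lambda s: (-s[0], s[1]))]


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio update (an assumption, not the paper's stated formula):
    q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant)."""
    updated = Counter({t: alpha * w for t, w in query.items()})
    for docs, weight in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for t, w in doc.items():
                updated[t] += weight * w / len(docs)
    # Drop non-positive weights, a common convention for Rocchio queries.
    return Counter({t: w for t, w in updated.items() if w > 0})


# Round 1: a single-term seed query.
query = vectorize("homeless")
first = rank(query, NOTES)

# Hypothetical assessor feedback: n1 judged relevant, n4 judged not relevant.
query2 = rocchio(query, [vectorize(NOTES["n1"])], [vectorize(NOTES["n4"])])
second = rank(query2, NOTES)
```

In this toy run the seed query ranks n3 first; after feedback on n1, the expanded query (which now carries terms such as "shelter") ranks n1 first. Because no stop words are removed, generic words like "in" and "a" also enter the query and give n2 a small nonzero score, one reason practical systems filter them.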
Results: For extracting homelessness-related words, word2vec performed better (area under the precision-recall curve [AUPRC] = 0.94) than lexical associations (AUPRC = 0.83). Comparing searches for the 2 phenotypes, performance was higher for homelessness (AUPRC = 0.95) than for ACEs (AUPRC = 0.79). A temporal analysis of the homeless population showed that the majority experienced chronic homelessness. Most ACE patients suffered sexual (70%) and/or physical (50.6%) abuse, and the top-ranked abuser keywords were "father" (21.8%) and "mother" (15.4%). The most prevalent associated conditions were lack of housing (62.8%) and tobacco use disorder (61.5%) for homeless patients, and mental disorders (36.6%-47.6%) for ACE patients.
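The AUPRC values summarize how consistently relevant items are ranked above irrelevant ones. The abstract does not say how AUPRC was computed; one common estimator over an assessor-judged ranked list is average precision, sketched below under the assumption that every relevant item appears among the judged results. The `average_precision` helper and the judgments are hypothetical and do not reproduce the reported values.

```python
def average_precision(labels):
    """Average precision over a ranked list of binary relevance labels (1 = relevant).

    Assumes all relevant items appear in the list, so the number of
    relevant items equals sum(labels).
    """
    hits = 0
    total = 0.0
    for k, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / k  # precision at the rank of this relevant item
    return total / hits if hits else 0.0


# Hypothetical assessor judgments for the top 6 ranked results.
judgments = [1, 1, 0, 1, 0, 0]
ap = average_precision(judgments)  # (1/1 + 2/2 + 3/4) / 3
```

Average precision rewards rankings that place relevant results early: moving the third relevant item from rank 4 to rank 3 would raise the score to 1.0.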
Conclusion: We provide an efficient solution for mining homelessness and ACE information from EHRs, which can facilitate large clinical and genetic studies of these social determinants of health.
Keywords: EHR; adverse childhood experiences; homelessness; social determinants of health; text mining.
© The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
