J Am Med Inform Assoc. 2023 Jul 19;30(8):1448-1455.
doi: 10.1093/jamia/ocad071.

Extracting social determinants of health from clinical note text with classification and sequence-to-sequence approaches

Brian Romanowski et al. J Am Med Inform Assoc. 2023.

Abstract

Objective: Social determinants of health (SDOH) are nonmedical factors that can influence health outcomes. This paper seeks to extract SDOH from clinical texts in the context of the National NLP Clinical Challenges (n2c2) 2022 Track 2 Task.

Materials and methods: Annotated and unannotated data from the Medical Information Mart for Intensive Care III (MIMIC-III) corpus, the Social History Annotation Corpus, and an in-house corpus were used to develop 2 deep learning models, one using a classification approach and the other a sequence-to-sequence (seq2seq) approach.

Results: The seq2seq approach achieved the highest overall F1 score in each of the challenge's 3 subtasks: 0.901 on the extraction subtask, 0.774 on the generalizability subtask, and 0.889 on the learning transfer subtask.

Discussion: Both approaches rely on SDOH event representations that were designed to be compatible with transformer-based pretrained models, with the seq2seq representation supporting an arbitrary number of overlapping and sentence-spanning events. Models with adequate performance could be produced quickly, and the remaining mismatch between representation and task requirements was then addressed in postprocessing. The classification approach used rules to generate entity relationships from its sequence of token labels, while the seq2seq approach used constrained decoding and a constraint solver to recover entity text spans from its sequence of potentially ambiguous tokens.
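
The two postprocessing steps named in the Discussion can be illustrated with a minimal Python sketch. The abstract does not give the label scheme, the rules, or the solver, so everything here is an assumption: tokens carry hypothetical BIO labels such as `B-Trigger:Tobacco` or `B-StatusTime`; the classification-side rule attaches each argument to the closest preceding trigger (or to the first trigger if none precedes it); and on the seq2seq side, generated argument phrases are matched back to source tokens by exhaustive search over non-overlapping occurrences that minimizes distance to the trigger, a simple stand-in for a real constraint solver. Constrained decoding itself is not shown, and the function names `labels_to_spans`, `spans_to_events`, and `recover_spans` are hypothetical.

```python
# Hypothetical postprocessing sketch; label names, the attachment rule, and the
# exhaustive search are assumptions, not the paper's implementation.
from itertools import product


def labels_to_spans(labels):
    """Group BIO tags into (name, start, end) token spans, end exclusive."""
    spans, current = [], None  # current = [name, start, end] of the open span
    for i, tag in enumerate(labels):
        if tag == "O":
            if current is not None:
                spans.append(tuple(current))
                current = None
            continue
        prefix, name = tag.split("-", 1)
        if prefix == "I" and current is not None and current[0] == name:
            current[2] = i + 1  # continue the open span
        else:
            # "B-" (or an "I-" that cannot continue the open span) starts a new span
            if current is not None:
                spans.append(tuple(current))
            current = [name, i, i + 1]
    if current is not None:
        spans.append(tuple(current))
    return spans


def spans_to_events(spans):
    """Rule-based grouping: each argument joins the closest preceding trigger."""
    events = [
        {"type": name.split(":", 1)[1], "trigger": (start, end), "args": []}
        for name, start, end in spans
        if name.startswith("Trigger:")
    ]  # spans run left to right, so events are ordered by trigger position
    for name, start, end in spans:
        if name.startswith("Trigger:") or not events:
            continue
        preceding = [k for k, ev in enumerate(events) if ev["trigger"][0] < start]
        # Hypothetical fallback: no trigger before the argument -> first trigger.
        k = preceding[-1] if preceding else 0
        events[k]["args"].append((name, start, end))
    return events


def find_occurrences(tokens, phrase):
    """All (start, end) token spans where `phrase` occurs in `tokens`."""
    n = len(phrase)
    return [(i, i + n) for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def recover_spans(tokens, trigger_start, phrases):
    """Pick one source occurrence per generated argument phrase.

    Constraints: each phrase must occur in the source and the chosen spans
    must not overlap. Among feasible choices, minimize total distance to the
    trigger. Exhaustive search stands in for a constraint solver.
    Returns None when no feasible assignment exists.
    """
    candidates = [find_occurrences(tokens, p) for p in phrases]
    best, best_cost = None, None
    for choice in product(*candidates):
        ordered = sorted(choice)
        if any(a[1] > b[0] for a, b in zip(ordered, ordered[1:])):
            continue  # overlapping spans violate the constraint
        cost = sum(abs(start - trigger_start) for start, _ in choice)
        if best_cost is None or cost < best_cost:
            best, best_cost = list(choice), cost
    return best


tokens = ["Tob", ":", "quit", "2010", ";", "EtOH", ":", "quit", "2015"]
labels = ["B-Trigger:Tobacco", "O", "B-StatusTime", "B-History", "O",
          "B-Trigger:Alcohol", "O", "B-StatusTime", "B-History"]
for event in spans_to_events(labels_to_spans(labels)):
    print(event)
print(recover_spans(tokens, 5, [["quit"], ["2015"]]))  # [(7, 8), (8, 9)]
```

In this toy note the word `quit` appears twice, so a generated phrase "quit" is ambiguous; minimizing distance to the `EtOH` trigger at token 5 selects the second occurrence. Exhaustive search grows exponentially with the number of arguments per event, which is one reason a dedicated solver would be preferable beyond toy sizes.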

Conclusion: We proposed 2 different approaches to extract SDOH from clinical texts with high accuracy. However, accuracy suffers on text from new healthcare institutions not present in the training data, and thus generalization remains an important topic for future study.

Keywords: clinical notes; deep learning; information extraction; natural language processing; social determinants of health.

Conflict of interest statement

None declared.

Figures

Figure 1.
Example sentence with SDOH event annotations, followed by the human-readable versions of the seq2seq and classification representations. SDOH: social determinants of health.
Figure 2.
The top example (A) is the original training example. It has 2 SDOH events, where the event triggers are “Tob” and “EtOH”. Example (B) is a derived example that covers the same events as (A) but with source text created according to the tight-text-bound. Example (C) is a derived example covering only the first event in (A). Example (D) is the tight-text-bound version of example (C). The final derived example (E) covers only the second event in (A), and the loose-text-bound version is equivalent to the tight-text-bound version. SDOH: social determinants of health.
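
Figure 2 describes deriving additional training examples from one annotated example, both by covering subsets of its events and by cropping the source text to a tight or loose text bound. The caption does not define either bound, so the Python sketch below assumes that the tight bound runs from the first to the last annotated character of the covered events, and that the loose bound extends outward to the nearest annotation of an excluded event (or the edge of the text), with whitespace, commas, and semicolons trimmed from the edges. The subset policy (all events, plus each event alone), the `Example` class, and the helper names are likewise hypothetical.

```python
# Hypothetical sketch of deriving extra training examples (cf. Figure 2).
# The "tight" and "loose" bound definitions below are assumptions.
from dataclasses import dataclass
from typing import Tuple

Span = Tuple[str, int, int]  # (label, start_char, end_char), end exclusive
Event = Tuple[Span, ...]     # by convention here, the first span is the trigger

_TRIM = set(" \t\n,;")  # assumed separators stripped from loose-bound edges


@dataclass(frozen=True)
class Example:
    text: str
    events: Tuple[Event, ...]


def _window(text, covered, excluded, loose):
    """Character bounds of the source text for a derived example.

    Assumes events do not interleave (no excluded span inside the window).
    """
    # Tight bound: first to last annotated character of the covered events.
    start = min(s for ev in covered for _, s, _ in ev)
    end = max(e for ev in covered for _, _, e in ev)
    if loose:
        # Loose bound: grow outward until an excluded event's span or the text edge.
        others = [(s, e) for ev in excluded for _, s, e in ev]
        start = max([e for _, e in others if e <= start], default=0)
        end = min([s for s, _ in others if s >= end], default=len(text))
        while start < end and text[start] in _TRIM:
            start += 1
        while end > start and text[end - 1] in _TRIM:
            end -= 1
    return start, end


def _rebase(events, offset):
    """Shift every span so offsets are relative to the derived text."""
    return tuple(tuple((label, s - offset, e - offset) for label, s, e in ev) for ev in events)


def derive_examples(example):
    """Loose and tight versions for all events and for each single event.

    Identical results (for example, loose == tight) are kept only once.
    """
    events = example.events
    subsets = [tuple(range(len(events)))]
    if len(events) > 1:
        subsets += [(i,) for i in range(len(events))]
    derived = []
    for subset in subsets:
        covered = [events[i] for i in subset]
        excluded = [ev for i, ev in enumerate(events) if i not in subset]
        for loose in (True, False):
            start, end = _window(example.text, covered, excluded, loose)
            candidate = Example(example.text[start:end], _rebase(covered, start))
            if candidate not in derived:
                derived.append(candidate)
    return derived


note = Example(
    "Social: Tob quit 2010, EtOH occasionally",
    (
        (("Trigger", 8, 11), ("StatusTime", 12, 16), ("History", 17, 21)),
        (("Trigger", 23, 27), ("Frequency", 28, 40)),
    ),
)
for ex in derive_examples(note):
    print(repr(ex.text))
```

On this toy note the sketch yields five examples that follow the layout of Figure 2: the original text, its tight version, the first event with loose and tight bounds, and the second event, whose loose and tight versions coincide and are therefore deduplicated. Span offsets are rebased so each derived example stays self-consistent.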
