Automatic extraction of social determinants of health from medical notes of chronic lower back pain patients

doi:10.1093/jamia/ocad054

J Am Med Inform Assoc. 2023 Jul 19;30(8):1438-1447.


Dmytro S Lituiev¹, Benjamin Lacar¹,², Sang Pak³, Peter L Abramowitsch¹, Emilia H De Marchis⁴, Thomas A Peterson¹,⁵

Affiliations

¹ Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, USA.
² Berkeley Institute for Data Science, University of California, Berkeley, California, USA.
³ Department of Physical Therapy and Rehabilitation Science, University of California San Francisco, San Francisco, California, USA.
⁴ Department of Family & Community Medicine, University of California San Francisco, San Francisco, California, USA.
⁵ Department of Orthopaedic Surgery, University of California San Francisco, San Francisco, California, USA.

PMID: 37080559
PMCID: PMC10354762
DOI: 10.1093/jamia/ocad054

Dmytro S Lituiev et al. J Am Med Inform Assoc. 2023.


Abstract

Objective: We applied natural language processing and inference methods to extract social determinants of health (SDoH) information from clinical notes of patients with chronic low back pain (cLBP) to enhance future analyses of the associations between SDoH disparities and cLBP outcomes.

Materials and methods: Clinical notes for patients with cLBP were annotated for 7 SDoH domains, as well as depression, anxiety, and pain scores, resulting in 626 notes with at least one annotated entity for 364 patients. We used a 2-tier taxonomy with these 10 first-level classes (domains) and 52 second-level classes. We developed and validated named entity recognition (NER) systems based on both rule-based and machine learning approaches and validated an entailment model.

Results: Annotators achieved a high interrater agreement (Cohen's kappa of 95.3% at document level). A rule-based system (cTAKES), RoBERTa NER, and a hybrid model (combining rules and logistic regression) achieved performance of F1 = 47.1%, 84.4%, and 80.3%, respectively, for first-level classes.
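The agreement and performance figures above rest on standard formulas, which can be sketched as follows. This is a minimal illustration on toy data, not the authors' evaluation code: the function names, the toy labels, and the choice to weight each class by its support (true positives plus false negatives) are assumptions, and the abstract does not specify how predicted spans were matched to annotated spans.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f1(counts_by_class):
    """Average per-class F1, weighting each class by its support (tp + fn)."""
    total_support = sum(tp + fn for tp, _, fn in counts_by_class.values())
    if total_support == 0:
        return 0.0
    weighted_sum = sum(
        (tp + fn) * f1_score(tp, fp, fn)
        for tp, fp, fn in counts_by_class.values()
    )
    return weighted_sum / total_support


# Toy document-level labels (1 = note contains a mention, 0 = it does not).
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 1, 0, 1]
kappa = cohen_kappa(annotator_1, annotator_2)

# Hypothetical (tp, fp, fn) counts for two placeholder classes.
toy_counts = {"domain_a": (8, 2, 2), "domain_b": (0, 1, 10)}
overall_f1 = weighted_f1(toy_counts)
```

Weighting by support means frequent classes dominate the aggregate, so a model can score well overall while missing rare second-level classes entirely.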

Discussion: While the hybrid model had a lower F1 score, it matched or outperformed the RoBERTa NER model in terms of recall and had lower computational requirements. Applying an untuned RoBERTa entailment model, we detected many challenging wordings missed by NER systems. Still, the entailment model may be sensitive to hypothesis wording.
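One way to probe the sensitivity to hypothesis wording mentioned above is to score the same premise against several paraphrased hypotheses and check whether the predicted relation changes. The sketch below assumes a scoring function that returns probabilities for the three usual natural language inference relations (entailment, neutral, contradiction). The names `wording_sensitivity` and `toy_score_fn` are hypothetical, and `toy_score_fn` is a keyword-overlap stand-in used only so the example runs; it is not a language model and not the model used in the study.

```python
def predict_relation(probs):
    """Return the relation with the highest probability."""
    return max(probs, key=probs.get)


def wording_sensitivity(premise, hypotheses, score_fn):
    """Score one premise against paraphrased hypotheses.

    Returns a {hypothesis: relation} mapping and a flag that is True
    when the paraphrases do not all receive the same relation.
    """
    predictions = {
        h: predict_relation(score_fn(premise, h)) for h in hypotheses
    }
    disagree = len(set(predictions.values())) > 1
    return predictions, disagree


def toy_score_fn(premise, hypothesis):
    """Stand-in scorer based on shared words; NOT a real NLI model."""
    shared = set(premise.lower().split()) & set(hypothesis.lower().split())
    if len(shared) >= 2:
        return {"entailment": 0.8, "neutral": 0.15, "contradiction": 0.05}
    return {"entailment": 0.2, "neutral": 0.7, "contradiction": 0.1}


premise = "patient lives alone and is unemployed"
hypotheses = ["the patient is unemployed", "the person has no job"]
predictions, disagree = wording_sensitivity(premise, hypotheses, toy_score_fn)
```

In practice `score_fn` would wrap an entailment model; keeping it as a parameter lets the same check run against any scorer that returns the three relation probabilities.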

Conclusion: This study developed a corpus of annotated clinical notes covering a broad spectrum of SDoH classes. This corpus provides a basis for training machine learning models and serves as a benchmark for predictive models for NER for SDoH and knowledge extraction from clinical texts.

Keywords: depression; lower back pain; machine learning; natural language inference; natural language processing; social determinants of health.

Published by Oxford University Press on behalf of the American Medical Informatics Association 2023.

Conflict of interest statement

DSL is a shareholder of Crosscope Inc and SynthezAI Corp and is currently employed by Johnson & Johnson. BL is supported by Innovate for Health Data Science Fellowship from Johnson & Johnson. PLA received funding from REAC RAP UCSF through UCSF. EDM received support from Hellman Fellows Fund Payment, and REAC RAP UCSF through UCSF. SP received support from Back Pain Consortium (BACPAC) grant through UCSF.

Figures

**Figure 1.**
Study design. (A) Workflow of the study. (B) Annotation ontology. Clinical notes were annotated such that text relevant to the 7 studied social risk factors (solid border) or 3 clinical factors (dashed border) was marked. Two levels of labels were used, such that the second level was a subcategory of the first. Level 2 labels for each Level 1 annotation are shown in descending order of frequency. Level 2 annotations that comprised <1% of the group’s annotations are not shown. Text that can be classified to the first level but not the second due to ambiguity or low frequency is designated as “NA”. Examples of selected text are shown within the hypothetical clinical note.

**Figure 2.**
Exploratory data analysis. (A) Histogram of number of entities in different note types. (B) Number of entities per note type and first-level annotated domain. The pictorial legend contains the total number of notes and annotations per note type.

**Figure 3.**
Comparison of model performance. (A) Comparison of F₁ performance in 4 best performing models per model class. Second-level metrics are aggregated using weighted average over first-level domains. (B) Comparison of F₁, precision, and recall in all studied models. Metrics are aggregated using weighted average.

**Figure 4.**
Examples of predictions from 4 best models per model class. Left: NER models. Right: RoBERTa entailment model. Probabilities of 3 possible relations are shown as shaded horizontal bars and numerically together with a final relation prediction.

Cited by

Social determinants of health extraction from clinical notes across institutions using large language models.
Keloth VK, Selek S, Chen Q, Gilman C, Fu S, Dang Y, Chen X, Hu X, Zhou Y, He H, Fan JW, Wang K, Brandt C, Tao C, Liu H, Xu H. NPJ Digit Med. 2025 May 17;8(1):287. doi: 10.1038/s41746-025-01645-8. PMID: 40379919.
Clinical Significance of Marital Status and Changes in Status Extracted from Unstructured Clinical Notes Using Ensembles of Off-the-Shelf Extraction Models.
Scherbakov DA, Heider PM, Obeid JS, Alekseyenko AV, Lenert LA. Res Sq [Preprint]. 2025 May 5:rs.3.rs-6578415. doi: 10.21203/rs.3.rs-6578415/v1. PMID: 40386391. Preprint.
Extracting social determinants of health events with transformer-based multitask, multilabel named entity recognition.
Richie R, Ruiz VM, Han S, Shi L, Tsui FR. J Am Med Inform Assoc. 2023 Jul 19;30(8):1379-1388. doi: 10.1093/jamia/ocad046. PMID: 37002953.
Evaluating associations between social risks and health care utilization in patients with chronic low back pain.
Pak SS, Jiang Y, Lituiev DS, De Marchis EH, Peterson TA. Pain Rep. 2024 Oct 8;9(6):e1191. doi: 10.1097/PR9.0000000000001191. eCollection 2024 Dec. PMID: 39391767.
Topic modeling on clinical social work notes for exploring social determinants of health factors.
Sun S, Zack T, Williams CYK, Sushil M, Butte AJ. JAMIA Open. 2024 Jan 14;7(1):ooad112. doi: 10.1093/jamiaopen/ooad112. eCollection 2024 Apr. PMID: 38223407.
