Chest. 2024 Jun;165(6):1481-1490. doi: 10.1016/j.chest.2023.12.031. Epub 2024 Jan 8.

Measuring Implicit Bias in ICU Notes Using Word-Embedding Neural Network Models

Julien Cobert et al. Chest. 2024 Jun.

Abstract

Background: Language in nonmedical data sets is known to transmit human-like biases when used in natural language processing (NLP) algorithms, which can reinforce disparities. It is unclear whether NLP algorithms trained on medical notes transmit similar biases.

Research question: Can we identify implicit bias in clinical notes, and are biases stable across time and geography?

Study design and methods: To determine whether different racial and ethnic descriptors are contextually similar to stigmatizing language in ICU notes, and whether these relationships are stable across time and geography, we identified notes on critically ill adults admitted to the University of California, San Francisco (UCSF), from 2012 through 2022 and to Beth Israel Deaconess Medical Center (BIDMC) from 2001 through 2012. Because word meaning is derived largely from context, we trained unsupervised word-embedding algorithms to quantitatively measure the contextual similarity (cosine similarity) between a racial or ethnic descriptor (eg, African-American) and a stigmatizing target word (eg, noncooperative) or group of words (violence, passivity, noncompliance, nonadherence).
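
For readers unfamiliar with this approach, a minimal sketch of the workflow follows, assuming the gensim implementation of word2vec; the toy corpus, hyperparameters, and vocabulary terms are illustrative placeholders, not the study's data or settings.

from gensim.models import Word2Vec

# Toy corpus: each note is a list of preprocessed tokens (placeholder data).
notes_tokens = [
    ["patient", "african-american", "noncooperative", "with", "exam"],
    ["patient", "white", "cooperative", "with", "exam"],
    # ... many more notes in practice
]

# Train an unsupervised skip-gram word2vec model (hyperparameters assumed).
model = Word2Vec(
    sentences=notes_tokens,
    vector_size=100,  # dimensionality k of the word embeddings
    window=5,         # context window that defines "similar context"
    min_count=1,      # keep rare tokens for this toy example
    sg=1,             # skip-gram architecture
    epochs=20,
)

# Cosine similarity between a descriptor token and a stigmatizing target token.
sim = model.wv.similarity("african-american", "noncooperative")
print(f"cosine similarity: {sim:+.3f}")  # ranges from -1 (dissimilar) to +1 (similar)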

Results: In UCSF notes, Black descriptors were less likely to be contextually similar to violent words compared with White descriptors. In contrast, in BIDMC notes, Black descriptors were more likely to be contextually similar to violent words compared with White descriptors. The UCSF data set also showed that Black descriptors were more contextually similar to passivity and noncompliance words compared with Latinx descriptors.

Interpretation: Implicit bias is identifiable in ICU notes. Racial and ethnic group descriptors carry different contextual relationships to stigmatizing words, depending on when and where notes were written. Because NLP models seem able to transmit implicit bias from training data, use of NLP algorithms in clinical prediction could reinforce disparities. Active debiasing strategies may be necessary to achieve algorithmic fairness when using language models in clinical research.

Keywords: critical care; inequity; linguistics; machine learning; natural language processing.


Conflict of interest statement

Financial/Nonfinancial Disclosures None declared.

Figures

Graphical abstract
Figure 1
Diagrams showing the overall framework of the w2v model and the outcomes being assessed. A, Diagram showing what is inputted into w2v. All notes are preprocessed and eventually broken down into individual words or phrases (tokens). All tokens then are turned into a string of numbers that allows for input into the neural network (w2v). w2v consists of a single layer of neurons or nodes (nk, within the neural network’s so-called hidden layer) that learns the similarity of words from the underlying training note data set. Each node learns and iterates different weights for each word or token based on its context. Similarities of different words represent the probability that one word is replaceable with another given the original word’s context. Arrows in the w2v icon represent the positive (black) or negative (red) weight learned by each node in the hidden layer. B, Diagram showing how neural networks work in w2v. Neurons in the hidden layer calculate unique weights (black and red arrows) for each word or text string inputted into the model. Weights can be large or small and positive or negative. The group of neurons carries a pattern of weights (represented by black and red arrows) unique to each inputted word. These unique patterns are used to relate the input words to one another in geometric space. The outputted word vectors are called word embeddings; they are unique to the data set used as input, and words that appear in similar contexts are represented as being close together spatially. Word embeddings (outputs of the w2v neural network) provide geometric relationships between two words within a document. C, Diagram showing how w2v outputs contextual similarity. The neural network calculates the probability that each word can be interchanged with every other word in the training data, represented by the degree of similarity between the k-dimensional vectors outputted from the w2v neural network. The contextual similarity or dissimilarity then can be calculated by projecting the k-dimensional vectors onto a 2-D plane. Cosine similarity can be used as a continuous measure of how similar or dissimilar a base word (eg, Caucasian) is from a target word (eg, noncompliant) based on their contexts. A cosine similarity of +1 represents two words that are perfectly similar (Caucasian will match perfectly to Caucasian or a misspelling and will match closely to White person), and a cosine similarity of –1 represents two words that are perfectly dissimilar (Caucasian and Asian likely are nearly perfectly dissimilar because they are mutually exclusive). cos = cosine; w2v = word2vec.
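
A minimal sketch of the cosine-similarity calculation described in panel C follows, using NumPy on two k-dimensional vectors; the vectors here are random stand-ins rather than embeddings learned from the note data sets.

import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two word vectors: +1 similar, -1 dissimilar."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
base_vec = rng.normal(size=100)    # stand-in embedding for a base word (eg, "caucasian")
target_vec = rng.normal(size=100)  # stand-in embedding for a target word (eg, "noncompliant")

print(f"cos(base, target) = {cosine_similarity(base_vec, target_vec):+.3f}")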
Figure 2
A, B, Precision-weighted averages were used to combine words pertaining to violence, passivity, nonadherence, and noncompliance across the UCSF data set (A) and MIMIC-III data set (B) to show individual and comparative similarity between race and ethnicity base words and target thematic word groups. Each panel demonstrates the association between individual race or ethnicity descriptor groups and a thematic group of words, as well as the differences across the racial or ethnic groups themselves. Individual horizontal bars represent the bootstrap 95% CIs for cosine similarities between each individual race or ethnicity and a thematic group of words. For horizontal CIs, statistical significance (defined as P < .05) was met if they did not cross the cosine similarity value of 0. Vertical brackets with asterisks indicate whether a statistically significant difference exists in similarity across different races or ethnicities relative to a specific thematic group of words. Asterisks represent statistical significance at P < .05 for vertical brackets. MIMIC-III = Medical Information Mart for Intensive Care III; UCSF = University of California, San Francisco.
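
A rough sketch of the summary statistics described here follows, assuming an inverse-variance (precision) weighting across target words and simulated similarity values; it illustrates the general approach, not the authors' exact procedure.

import numpy as np

rng = np.random.default_rng(42)

def bootstrap_ci(samples: np.ndarray, n_boot: int = 2000, alpha: float = 0.05):
    """Percentile bootstrap 95% CI for the mean of an array of cosine similarities."""
    means = np.array([
        rng.choice(samples, size=samples.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

# Hypothetical cosine similarities between one descriptor and each word in a theme
# (eg, violence), collected across bootstrap replicates or repeated model runs.
per_word_sims = {
    "combative": rng.normal(0.12, 0.03, size=50),
    "agitated":  rng.normal(0.08, 0.05, size=50),
    "violent":   rng.normal(0.15, 0.02, size=50),
}

# Precision-weighted (inverse-variance) average across target words in the theme.
means = np.array([s.mean() for s in per_word_sims.values()])
weights = np.array([1.0 / s.var(ddof=1) for s in per_word_sims.values()])
theme_estimate = np.sum(weights * means) / np.sum(weights)

lo, hi = bootstrap_ci(np.concatenate(list(per_word_sims.values())))
print(f"theme similarity = {theme_estimate:+.3f}, 95% CI [{lo:+.3f}, {hi:+.3f}]")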

