J Am Med Inform Assoc. 2025 Feb 1;32(2):308-317.
doi: 10.1093/jamia/ocae290.

Identifying stigmatizing and positive/preferred language in obstetric clinical notes using natural language processing

Jihye Kim Scroggins et al. J Am Med Inform Assoc.

Abstract

Objective: To identify stigmatizing language in obstetric clinical notes using natural language processing (NLP).

Materials and methods: We analyzed electronic health records from birth admissions in the Northeast United States in 2017. We annotated 1771 clinical notes to generate the initial gold standard dataset. Annotators labeled exemplars of 5 stigmatizing language categories and 1 positive/preferred language category. We used a semantic similarity-based search approach to expand the initial dataset with additional exemplars, producing an enhanced dataset. We employed traditional classifiers (Support Vector Machine, Decision Trees, and Random Forest) and two transformer-based models, ClinicalBERT (Bidirectional Encoder Representations from Transformers) and BERT base. Models were trained and validated on the initial and enhanced datasets and were tested on the enhanced testing dataset.
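The semantic similarity-based expansion step can be illustrated with a minimal sketch. The abstract does not say which embedding model, similarity measure, or threshold was used, so everything here is an assumption: `embed` is a toy bag-of-words stand-in for a real sentence embedding, cosine similarity is assumed as the measure, and the threshold, label names, and example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector.

    Assumption: a real pipeline would use a learned embedding model here.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_exemplars(seeds, candidates, threshold=0.4):
    """Return (candidate, label, score) for candidates close to a labeled seed.

    Each candidate inherits the label of its most similar seed, and is kept
    only if that similarity reaches the (hypothetical) threshold.
    """
    seed_vecs = [(embed(text), label) for text, label in seeds]
    found = []
    for cand in candidates:
        vec = embed(cand)
        score, label = max(
            ((cosine(vec, sv), lab) for sv, lab in seed_vecs),
            key=lambda pair: pair[0],
        )
        if score >= threshold:
            found.append((cand, label, score))
    return found


# Hypothetical annotated seeds and unlabeled candidate sentences.
seeds = [
    ("patient claims she takes her medication", "questioning_credibility"),
    ("patient is well informed and engaged in her care", "positive_preferred"),
]
candidates = [
    "patient claims she attended all visits",
    "fetal heart rate reassuring",
    "patient is engaged in her care plan",
]

for text, label, score in expand_exemplars(seeds, candidates):
    print(f"{label}: {text} ({score:.2f})")
```

In practice, candidates retrieved this way would still need human review before being added as exemplars; the sketch only shows the retrieval idea.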

Results: In the initial dataset, we annotated 963 exemplars as stigmatizing or positive/preferred. The most frequently identified category was marginalized language/identities (n = 397, 41%), and the least frequent was questioning patient credibility (n = 51, 5%). The semantic similarity-based search added 502 exemplars, increasing the number of exemplars in low-frequency categories. All NLP models showed improved performance on the enhanced dataset, with Decision Trees demonstrating the greatest improvement (21%). ClinicalBERT outperformed the other models, with the highest average F1-score of 0.78.
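For readers unfamiliar with the metric, the sketch below shows how an average F1-score across categories can be computed. The abstract does not state whether the average is unweighted (macro) or weighted by category size; an unweighted average is assumed here, and the labels and predictions are toy values, not data from the study.

```python
def f1_per_class(y_true, y_pred):
    """Compute the F1-score for each label seen in y_true or y_pred."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels, purely illustrative.
y_true = ["stigmatizing", "stigmatizing", "positive_preferred", "positive_preferred"]
y_pred = ["stigmatizing", "positive_preferred", "positive_preferred", "positive_preferred"]

print(f1_per_class(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 3))
```

An unweighted average gives rare categories the same influence as common ones, which matters when category frequencies are as uneven as those reported above.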

Discussion: ClinicalBERT appears to capture the nuanced, context-dependent stigmatizing language found in obstetric clinical notes most effectively, demonstrating its potential clinical application for real-time monitoring and alerts to prevent the use of stigmatizing language and reduce healthcare bias. Future research should explore stigmatizing language in diverse geographic locations and clinical settings to further contribute to high-quality and equitable perinatal care.

Conclusion: ClinicalBERT effectively captures the nuanced stigmatizing language in obstetric clinical notes. Our semantic similarity-based search approach for rapidly extracting additional exemplars improved model performance while reducing the need for labor-intensive annotation.

Keywords: bias; electronic health records; health communication; natural language processing; nursing informatics.

Conflict of interest statement

None declared.

Figures

Figure 1. Overview of approach. Abbreviations: BERT, Bidirectional Encoder Representations from Transformers; ML, Machine Learning.
