J Biomed Inform. 2023 May;141:104360. doi: 10.1016/j.jbi.2023.104360. Epub 2023 Apr 14.

Predicting relations between SOAP note sections: The value of incorporating a clinical information model

Vimig Socrates et al.

Abstract

Physician progress notes are frequently organized into Subjective, Objective, Assessment, and Plan (SOAP) sections. The Assessment section synthesizes information recorded in the Subjective and Objective sections, and the Plan section documents tests and treatments to narrow the differential diagnosis and manage symptoms. Classifying the relationship between the Assessment and Plan sections has been suggested to provide valuable insight into clinical reasoning. In this work, we use a novel human-in-the-loop pipeline to classify the relationships between the Assessment and Plan sections of SOAP notes as a part of the n2c2 2022 Track 3 Challenge. In particular, we use a clinical information model constructed from both the entailment logic expected from the aforementioned Challenge and the problem-oriented medical record. This information model is used to label named entities as primary and secondary problems/symptoms, events and complications in all four SOAP sections. We iteratively train separate Named Entity Recognition models and use them to annotate entities in all notes/sections. We fine-tune a downstream RoBERTa-large model to classify the Assessment-Plan relationship. We evaluate multiple language model architectures, preprocessing parameters, and methods of knowledge integration, achieving a maximum macro-F1 score of 82.31%. Our initial model achieves top-2 performance during the challenge (macro-F1: 81.52%, competitors' macro-F1 range: 74.54%-82.12%). We improved our model by incorporating post-challenge annotations (S&O sections), outperforming the top model from the Challenge. We also used Shapley additive explanations to investigate the extent of language model clinical logic, under the lens of our clinical information model. We find that the model often uses shallow heuristics and nonspecific attention when making predictions, suggesting language model knowledge integration requires further research.
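The pipeline described above — annotating entities from the clinical information model, then pairing each Assessment with a Plan subsection for relation classification — can be sketched as follows. This is an illustrative sketch, not the authors' code: the inline marker format, the `PRIMARY_PROBLEM` tag name, the `</s></s>` sequence-pair separator, and the label set are assumptions made for demonstration.

```python
# Assumed relation label set for the n2c2 2022 Track 3 task (illustrative).
LABELS = ["DIRECT", "INDIRECT", "NEITHER", "NOT_RELEVANT"]

def mark_entities(text, entities):
    """Wrap (start, end, tag) character spans in [TAG]...[/TAG] markers.
    Spans are applied right to left so earlier offsets remain valid."""
    for start, end, tag in sorted(entities, reverse=True):
        text = text[:start] + f"[{tag}]" + text[start:end] + f"[/{tag}]" + text[end:]
    return text

def build_input(assessment, plan, a_entities=(), p_entities=()):
    """Format one Assessment-Plan pair as a single sequence for a
    sequence-pair classifier such as RoBERTa, with NER-derived entity
    markers inlined so the model can attend to them."""
    a = mark_entities(assessment, list(a_entities))
    p = mark_entities(plan, list(p_entities))
    return f"{a} </s></s> {p}"

pair = build_input(
    "Hypotension, likely septic.",
    "Start IV fluids and norepinephrine.",
    a_entities=[(0, 11, "PRIMARY_PROBLEM")],
)
```

The resulting string would then be tokenized and fed to a fine-tuned classifier head over the four relation labels; the marker scheme is one common way to surface NER output to a downstream language model.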

Keywords: Electronic health record; Entailment; Intensive care unit; Language modeling; Natural language processing; SOAP notes.


Conflict of interest statement

Declaration of Competing Interest The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Vimig Socrates reports financial support was provided by National Institutes of Health. Aidan Gilson reports financial support was provided by National Institute of Diabetes and Digestive and Kidney Diseases. Aidan Gilson reports financial support was provided by Yale School of Medicine.

Figures

Figure 1: Clinical information model derived from the n2c2 annotation guidelines. In blue, we highlight the four sections of the SOAP note; all other nodes in this graph act as named entities to be annotated.

Figure 2: Screenshots of the Prodigy Assessment and Plan subsection annotation interfaces showing a DIRECT entailment relation.

Figure 3: Task pipeline, including the NER pipeline and the classification model.

Figure 4: A Subjective & Objective section shown in Prodigy with limited context, which makes problem, sign, and symptom identification difficult.

Figure 5: In this example, the model correctly identified a DIRECT relation based on a directly referenced sign/symptom. However, when we look at the attention of the tokens over the Assessment text, we see a very nonspecific relationship between all mentioned problems, signs, and symptoms and Hypotension in the Plan subsection. While the model identifies certain relevant clinical features (including low BP in the S&O sections), much of its attention is fairly nonspecific, implying a lack of clinical reasoning in decision making.

Figure 6: The model incorrectly labels a note as a NEITHER relation without the Subjective & Objective context, but then adjusts its prediction to the correct one with additional context.
