ATS Sch. 2022 Sep 23;3(4):548-560.
doi: 10.34197/ats-scholar.2022-0010OC. eCollection 2022 Dec.

Initial Development of an Automated Platform for Assessing Trainee Performance on Case Presentations

Andrew J King et al. ATS Sch.

Abstract

Background: Oral case presentation is a crucial skill for physicians and a key component of team-based care. However, consistent and objective assessment of, and feedback on, presentations during training are infrequent.

Objective: To determine the potential value of applying natural language processing, computer software that extracts meaning from text, to transcripts of oral case presentations as a strategy to assess their quality automatically and objectively.

Methods: We transcribed a collection of simulated oral case presentations from eight critical care fellows and one critical care attending. Participants were instructed to review the medical charts of 11 real intensive care unit patient cases and to audio record themselves presenting each case as if on morning rounds. We then used natural language processing to convert the transcripts from human-readable text into machine-readable numbers representing details of presentation style and content. The distance between the numeric representations of two transcripts correlates negatively with their similarity. We ranked fellows on the basis of how similar their presentations were to the attending's presentations.
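As a concrete illustration of this pipeline, the sketch below embeds each transcript as a vector and ranks fellows by cosine similarity to the attending's vector. This is a minimal sketch, assuming a generic off-the-shelf sentence-embedding model; the model choice, helper names, and placeholder transcripts are illustrative assumptions, not the authors' actual representation, which separately encodes style and content.

```python
# A minimal sketch, assuming a generic sentence-embedding model stands in for
# the paper's style-and-content representation. Model choice, helper names,
# and the placeholder transcripts below are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed off-the-shelf model

def transcript_vector(transcript: str) -> np.ndarray:
    """Convert human-readable text into a machine-readable vector."""
    return model.encode(transcript)

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: smaller distance between vectors means more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder transcripts (illustrative only).
attending_transcript = "Sixty-four year old man admitted with septic shock ..."
fellow_transcripts = {
    "Fellow A": "This is a 64 year old man with septic shock ...",
    "Fellow B": "Patient admitted overnight, hypotensive on arrival ...",
}

attending_vec = transcript_vector(attending_transcript)
scores = {f: similarity(transcript_vector(t), attending_vec)
          for f, t in fellow_transcripts.items()}
for fellow in sorted(scores, key=scores.get, reverse=True):
    print(f"{fellow}: {scores[fellow]:.3f}")
```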

Results: The 99 presentations comprised 260 minutes of audio (mean length, 2.6 ± 1.24 min per case). On average, each presentation contained 23.88 ± 2.65 sentences, and each sentence had 14.10 ± 0.67 words, 3.62 ± 0.15 medical concepts, and 0.75 ± 0.09 medical adjectives. When ranking fellows on the basis of how similar their presentations were to the attending's, we found a gap between the five fellows with the most similar presentations and the three fellows with the least similar presentations (mean group similarity scores of 0.62 ± 0.01 and 0.53 ± 0.01, respectively). Rankings were sensitive to whether presentation style or content information was weighted more heavily when calculating transcript similarity.
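The function below is a minimal sketch of how a weighted Style-Content Similarity Score could combine the two components; the 50/50 baseline matches the even split described for Figure 4, but the combination rule itself is an assumption, not the authors' published formula.

```python
# A sketch of a weighted Style-Content Similarity Score. The linear
# combination is an assumption for illustration, not the authors' formula.
def style_content_similarity(style_sim: float, content_sim: float,
                             style_weight: float = 0.5) -> float:
    """Weighted average of style and content similarity (each in [0, 1])."""
    return style_weight * style_sim + (1.0 - style_weight) * content_sim

# Example: a presentation whose style closely matches the attending's but
# whose content overlaps less.
score = style_content_similarity(style_sim=0.70, content_sim=0.55,
                                 style_weight=0.5)
print(round(score, 3))  # 0.625
```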

Conclusion: Natural language processing enabled the ranking of case presentations on the basis of how similar they were to a reference presentation. Although additional work is needed to convert these rankings, and underlying similarity scores, into actionable feedback for trainees, these methods may support new tools for improving medical education.

Keywords: clinical rounds; deep learning; intensive care unit; medical education; natural language processing.


Figures

Figure 1.
A conceptual model for assessment of oral case presentations. The model includes three modes: 1) traditional assessment (the current state); 2) automated comparative assessment; and 3) automated field assessment. Automated comparative assessment requires a reference presentation provided by a comparator subject, such as a senior physician. Both the trainee's and the comparator subject's presentations are transcribed, transformed into a numeric representation using natural language processing, and assessed using a similarity score. Automated field assessment is performed during actual clinical discussions. It requires rounds to be audio recorded and automatically transcribed using automatic speech recognition technology. It also requires a large set of past patient cases with transcripts of known quality. These data are used to train a pair of machine learning models: one for generating the numeric representation of a reference presentation when provided with a patient's electronic health record data and a second for generating feedback when provided with the numeric representations of two presentations of the same patient case.
Figure 2.
Radar plots showing the values of features from the numeric representation of presentation style. Each radar plot shows the calculated values of one feature for all presentations of a patient case. Each row of plots corresponds to one feature. Each column of plots corresponds to one case (01–11). Within a plot, each corner corresponds to one physician (A–I). The center of each plot is a value of 0; values increase as you move outwards toward the circular gridlines. Gridline values for each row of plots are shown on the right side. An utterance is analogous to a spoken sentence.
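The sketch below computes two of the style features these plots summarize (utterance count and mean words per utterance). The punctuation-based sentence splitter is an assumption for illustration only; counting medical concepts and medical adjectives would require a clinical NLP toolkit and is omitted here.

```python
# A sketch of per-presentation style features. The simple punctuation-based
# utterance split is an illustrative assumption, not the authors' method.
import re
import statistics

def style_features(transcript: str) -> dict:
    """Compute utterance count and mean words per utterance."""
    utterances = [u.strip() for u in re.split(r"[.?!]+", transcript) if u.strip()]
    words_per_utt = [len(u.split()) for u in utterances]
    return {
        "utterances": len(utterances),
        "mean_words_per_utterance": statistics.mean(words_per_utt),
    }

print(style_features("Patient is a 64 year old man. He was admitted overnight."))
# {'utterances': 2, 'mean_words_per_utterance': 5.5}
```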
Figure 3.
Yarn diagram of automated ranking of eight trainees who presented 11 patient cases. The diagram depicts trainee rank (A–H) for each case (01–11). Trainee rank is based on how similar a trainee's presentation was to the comparator subject's presentation. Similarity was calculated using the Style–Content Similarity Score. The strings connecting a physician's position across columns illustrate changes in rank from one case to the next. Average scores and associated rankings are shown on the right side.
Figure 4.
Sensitivity of rank to different weightings of presentation style and content importance. The weighting was varied in 5% increments from 100% style/0% content to 0% style/100% content. The y-axis shows the average number of changes in rank per case as the weighting varies. The baseline ranking splits importance evenly (50% each).
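The sweep below sketches this sensitivity analysis: vary the style weight in 5% increments and count rank changes against the 50/50 baseline. The per-trainee (style similarity, content similarity) pairs are invented placeholders, not study data.

```python
# A sketch of the Figure 4 sensitivity analysis: sweep the style weight in 5%
# increments and count positions that differ from the 50/50 baseline ranking.
# The similarity pairs below are invented placeholders, not the study's data.
def rank_order(sims: dict, w: float) -> list:
    """Order trainees by weighted style-content similarity, best first."""
    return sorted(sims, key=lambda t: w * sims[t][0] + (1 - w) * sims[t][1],
                  reverse=True)

sims = {"A": (0.70, 0.55), "B": (0.60, 0.66), "C": (0.58, 0.50)}  # placeholders
baseline = rank_order(sims, 0.5)
for w in [i / 20 for i in range(21)]:  # 0%, 5%, ..., 100% style weight
    order = rank_order(sims, w)
    changes = sum(1 for a, b in zip(order, baseline) if a != b)
    print(f"style weight {w:.2f}: {changes} rank changes")
```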
