BMJ Open. 2024 Mar 14;14(3):e076484.
doi: 10.1136/bmjopen-2023-076484.

Application of generative language models to orthopaedic practice

Jessica Caterson et al. BMJ Open.

Abstract

Objective: To explore whether the large language models (LLMs) Generative Pre-trained Transformer 3 (GPT-3) and ChatGPT can write clinical letters and predict management plans for common orthopaedic scenarios.

Design: Fifteen scenarios were generated, and ChatGPT and GPT-3 were prompted to write clinical letters and, separately, to generate management plans for identical scenarios with the plans removed.

Main outcome measures: Letters were assessed for readability using the Readable Tool. Accuracy of letters and management plans was assessed by three independent orthopaedic surgery clinicians.

Results: Both models generated complete letters for all scenarios after single prompting. Readability was compared using Flesch-Kincaid Grade Level (ChatGPT: 8.77 (SD 0.918); GPT-3: 8.47 (SD 0.982)), Flesch Reading Ease (ChatGPT: 58.2 (SD 4.00); GPT-3: 59.3 (SD 6.98)), Simple Measure of Gobbledygook (SMOG) Index (ChatGPT: 11.6 (SD 0.755); GPT-3: 11.4 (SD 1.01)), and reach (ChatGPT: 81.2%; GPT-3: 80.3%). ChatGPT produced more accurate letters (8.7/10 (SD 0.60) vs 7.3/10 (SD 1.41), p=0.024) and management plans (7.9/10 (SD 0.63) vs 6.8/10 (SD 1.06), p<0.001) than GPT-3. However, both LLMs sometimes omitted key information or added guidance that was, at worst, inaccurate.
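The three readability indices named above have standard published formulas. The sketch below implements them from raw counts, purely as an illustration: the abstract does not describe how the Readable Tool computes its scores, so the function names and the choice to pass pre-computed counts (rather than count syllables from text) are assumptions, and the counts in the usage lines are hypothetical.

```python
import math


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: higher means harder to read (US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher means easier to read (roughly 0 to 100)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def smog_index(polysyllables: int, sentences: int) -> float:
    """SMOG Index from the count of words with three or more syllables.

    The 30 / sentences factor normalises the count to a 30-sentence sample.
    """
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


# Hypothetical counts for one letter (not taken from the study).
words, sentences, syllables, polysyllables = 100, 5, 150, 10
print(round(flesch_kincaid_grade(words, sentences, syllables), 2))
print(round(flesch_reading_ease(words, sentences, syllables), 2))
print(round(smog_index(polysyllables, sentences), 2))
```

Taking counts as inputs keeps the formulas separate from syllable counting, which is the heuristic step where tools most often differ.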

Conclusions: This study shows that LLMs are effective for generation of clinical letters. With little prompting, the letters are readable and mostly accurate. However, they are not consistent and include inappropriate omissions or insertions. Furthermore, management plans produced by LLMs are generic but often accurate. In the future, a healthcare-specific language model trained on accurate and secure data could provide an excellent tool for increasing the efficiency of clinicians through summarisation of large volumes of data into a single clinical letter.

Keywords: HEALTH SERVICES ADMINISTRATION & MANAGEMENT; Health informatics; ORTHOPAEDIC & TRAUMA SURGERY; Organisational development.

Conflict of interest statement

Competing interests: None declared.

Figures

Figure 1
Accuracy scores for ChatGPT- and GPT-3-generated (A) letters and (B) management plans, independently scored by three senior orthopaedic clinicians. Grey lines show paired prompts. Compared using paired t-test; *, p<0.05, ****, p<0.001. GPT, Generative Pre-trained Transformer.
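The paired comparison mentioned in the caption can be sketched as follows. This is a minimal illustration of a paired t-statistic, not the authors' analysis code: the function name is an assumption, the example scores are hypothetical, and the p-value lookup (which needs the t-distribution, for example from a statistics package) is left out.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples of equal length.

    Each position must hold the two scores for the same prompt. The sample
    standard deviation of the differences must be non-zero.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    # Standard error of the mean difference.
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical accuracy scores out of 10 for four paired prompts.
model_a = [8, 9, 7, 9]
model_b = [7, 7, 6, 8]
t_value, dof = paired_t_statistic(model_a, model_b)
print(t_value, dof)
```

Pairing by prompt matters because each scenario is scored under both models, so the test works on per-scenario differences rather than on two independent groups.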
