Generating colloquial radiology reports with large language models

Affiliations

¹ Department of Radiological Sciences, University of California, Irvine, Irvine, CA 92868, United States.
² Amazon Web Services, East Palo Alto, CA 94303, United States.
³ Amazon Web Services, Seattle, WA 98121, United States.

PMID: 39178375
PMCID: PMC11491646
DOI: 10.1093/jamia/ocae223

Cynthia Crystal Tang et al. J Am Med Inform Assoc. 2024 Nov 1;31(11):2660-2667. doi: 10.1093/jamia/ocae223.

Abstract

Objectives: Patients are increasingly being given direct access to their medical records. However, radiology reports are written for clinicians and typically contain medical jargon, which can be confusing. One solution is for radiologists to provide a "colloquial" version that is accessible to the layperson. Because manually generating these colloquial translations would represent a significant burden for radiologists, a way to automatically produce accurate, accessible patient-facing reports is desired. We propose a novel method to produce colloquial translations of radiology reports by providing specialized prompts to a large language model (LLM).

Materials and methods: Our method automatically extracts and defines medical terms and includes their definitions in the LLM prompt. Using our method and a naive strategy, translations were generated at 4 different reading levels for 100 de-identified neuroradiology reports from an academic medical center. Translations were evaluated by a panel of radiologists for accuracy, likability, harm potential, and readability.
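The idea of placing term definitions in the prompt can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how terms are extracted or where the definitions come from, so the glossary contents, the function names `extract_terms` and `build_prompt`, and the prompt wording are all assumptions made for the example.

```python
import re

# Hypothetical glossary. The source of definitions used in the paper is not
# stated in the abstract; these entries are placeholders for illustration.
GLOSSARY = {
    "infarct": "an area of tissue that has died because its blood supply was cut off",
    "edema": "swelling caused by extra fluid in the tissue",
    "hemorrhage": "bleeding",
}


def extract_terms(report: str, glossary: dict[str, str]) -> list[str]:
    """Return the glossary terms that occur in the report (whole-word, case-insensitive)."""
    found = []
    for term in glossary:
        if re.search(rf"\b{re.escape(term)}\b", report, flags=re.IGNORECASE):
            found.append(term)
    return found


def build_prompt(report: str, grade_level: str, glossary: dict[str, str] = GLOSSARY) -> str:
    """Assemble a prompt that carries the report plus definitions of its medical terms."""
    terms = extract_terms(report, glossary)
    definitions = "\n".join(f"- {term}: {glossary[term]}" for term in terms)
    return (
        f"Rewrite the radiology report below in plain language "
        f"for a reader at the {grade_level} reading level.\n"
        f"Use these definitions of medical terms:\n{definitions}\n\n"
        f"Report:\n{report}\n"
    )


if __name__ == "__main__":
    example_report = "No acute infarct. Mild vasogenic edema in the left frontal lobe."
    print(build_prompt(example_report, "8th-grade"))
```

The resulting string would then be sent to whichever LLM is in use. A naive baseline in this sketch would be the same prompt without the definitions block.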

Results: Our approach translated the Findings and Impression sections at the 8th-grade level with accuracies of 88% and 93%, respectively. Across all grade levels, our approach was 20% more accurate than the baseline method. Overall, translations were more readable than the original reports, as evaluated using standard readability indices.
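To make "readability index" concrete, here is a small sketch of one widely used index, the Flesch-Kincaid grade level. The abstract does not name the indices the authors used, so choosing this one is an assumption. The syllable counter is a rough vowel-group heuristic, and the two example sentences are invented for illustration rather than taken from the study.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as the number of vowel groups (crude heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39 * words/sentence + 11.8 * syllables/word - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    # Invented examples: a jargon-heavy sentence and a plain-language rewording.
    original = "There is no evidence of acute intracranial hemorrhage or territorial infarction."
    colloquial = "There is no sign of new bleeding in the brain. There is no sign of a stroke."
    print(f"original:   {flesch_kincaid_grade(original):.1f}")
    print(f"colloquial: {flesch_kincaid_grade(colloquial):.1f}")
```

A lower score corresponds to simpler language: shorter sentences and fewer syllables per word both pull the grade down.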

Conclusion: We find that our translations at the eighth-grade level strike an optimal balance between accuracy and readability. Notably, this corresponds to nationally recognized recommendations for patient-facing health communication. We believe that using this approach to draft patient-accessible reports will benefit patients without significantly increasing the burden on radiologists.

Keywords: large language model; machine learning; natural language processing; prompt engineering; radiology.

Conflict of interest statement

Authors S.N., N.M., and R.T. are employed by Amazon Web Services.

Figures

**Figure 1.**
Our translation pipeline: medical knowledge (MK)-based prompting.

**Figure 2.**
Illustration of the steps involved for 1 example report.

**Figure 3.**
(A and B): Accuracy of translations generated by each prompting method at 4 different education levels. (C and D): Physician likability of translations generated by each prompting method at 4 different education levels. Higher is better. (E and F): Readability index of translations generated by each prompting method at 4 different education levels. Lower readability index corresponds to simpler language. (G and H): Harm potential of translations generated by each prompting method at 4 different education levels. Lower is better. In all panels, asterisk indicates statistical significance for MK + Indications model vs baseline (P < .05).

**Figure 4.**
Example of attribution when the translation is accurate. Attribution is shown by color coding: each sentence in the translation shares a color with the text in the original report to which it is attributed. Some text in the original report has no attribution, meaning that it is not present in the translation.

**Figure 5.**
Example of attribution when there is hallucination in the translation. The attribution output for each of the sentences in the translation says “NOT FOUND.”
