Probing clarity: AI-generated simplified breast imaging reports for enhanced patient comprehension powered by ChatGPT-4o

Roberto Maroncelli et al.

Eur Radiol Exp. 2024 Oct 30;8(1):124. doi: 10.1186/s41747-024-00526-1
Abstract

Background: To assess the reliability and comprehensibility of breast radiology reports simplified by artificial intelligence using the large language model (LLM) ChatGPT-4o.

Methods: A radiologist with 20 years of experience selected 21 anonymized breast radiology reports (7 mammography, 7 breast ultrasound, and 7 breast magnetic resonance imaging (MRI)), categorized according to the Breast Imaging Reporting and Data System (BI-RADS). These reports were simplified by prompting ChatGPT-4o with "Explain this medical report to a patient using simple language". Five breast radiologists assessed the quality of the simplified reports for factual accuracy, completeness, and potential harm on a 5-point Likert scale from 1 (strongly agree) to 5 (strongly disagree). Another breast radiologist evaluated the text comprehension of five non-healthcare readers using a 5-point Likert scale from 1 (excellent) to 5 (poor). Descriptive statistics, Cronbach's α, and the Kruskal-Wallis test were used.
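For illustration only, the prompting step could be approximated programmatically as in the Python sketch below, which assumes the OpenAI Python client (openai ≥ 1.0) and a dummy placeholder report; the study itself used the ChatGPT-4o interface rather than the API, so this is an analogue, not the authors' workflow.

# Minimal sketch of the simplification prompt, assuming the OpenAI Python
# client (openai >= 1.0); the report text is a dummy placeholder, not study data.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def simplify_report(report_text: str) -> str:
    """Ask GPT-4o to rewrite a breast radiology report in plain language."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Explain this medical report to a patient using simple "
                       f"language:\n\n{report_text}",
        }],
    )
    return response.choices[0].message.content

# Example usage with a dummy anonymized report
print(simplify_report("Mammography: scattered fibroglandular densities. BI-RADS 2."))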

Results: Mammography, ultrasound, and MRI showed high factual accuracy (median 2) and completeness (median 2) across radiologists, with low potential-harm scores (median 5); no significant group differences (p ≥ 0.780) and high internal consistency (α > 0.80) were observed. Non-healthcare readers showed high comprehension (median 2 for mammography and MRI, 1 for ultrasound); no significant group differences across modalities (p = 0.368) and high internal consistency (α > 0.85) were observed. BI-RADS 0, 1, and 2 reports were explained accurately, while BI-RADS 3-6 reports proved more challenging.
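As a rough illustration of the reported indices, the sketch below computes Cronbach's α and a Kruskal-Wallis test from Likert-type rating matrices using NumPy and SciPy; the matrices, variable names, and scores are illustrative dummies, not study data.

# Minimal sketch, assuming ratings are stored as (raters x reports) arrays of
# 1-5 Likert scores; all numbers below are dummies, not study data.
import numpy as np
from scipy.stats import kruskal

def cronbach_alpha(ratings: np.ndarray) -> float:
    """Cronbach's alpha for a (raters x reports) matrix, treating raters as items."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[0]                          # number of raters
    item_vars = ratings.var(axis=1, ddof=1)       # each rater's variance across reports
    total_var = ratings.sum(axis=0).var(ddof=1)   # variance of per-report rating totals
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
# Dummy factual-accuracy ratings: 5 radiologists x 7 reports per modality
mammo = rng.integers(1, 4, size=(5, 7))
us    = rng.integers(1, 4, size=(5, 7))
mri   = rng.integers(1, 4, size=(5, 7))

print("Cronbach's alpha (mammography):", round(cronbach_alpha(mammo), 2))
# Kruskal-Wallis test for group differences between modalities (pooled ratings)
h_stat, p_value = kruskal(mammo.ravel(), us.ravel(), mri.ravel())
print("Kruskal-Wallis p =", round(p_value, 3))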

Conclusion: The model demonstrated reliability and clarity, offering promise for patients with diverse backgrounds. LLMs like ChatGPT-4o could simplify breast radiology reports, aid in communication, and enhance patient care.

Relevance statement: Simplified breast radiology reports generated by ChatGPT-4o show potential in enhancing communication with patients, improving comprehension across varying educational backgrounds, and contributing to patient-centered care in radiology practice.

Key points: AI simplifies complex breast imaging reports, enhancing patient understanding. Simplified reports from AI maintain accuracy, improving patient comprehension significantly. Implementing AI reports enhances patient engagement and communication in breast imaging.

Keywords: Artificial intelligence; Breast radiology; Large language models; Natural language processing; Patient-centered care.


Conflict of interest statement

FP is a member of the Scientific Editorial Board for European Radiology Experimental (section: Breast) and did not participate in the selection or review processes for this article. The remaining authors declare no relationships with any companies whose products or services may be related to the subject matter of the article.

Figures

Fig. 1
Flowchart: following the random selection of 21 anonymized breast radiology reports, ChatGPT-4o was prompted to produce simplified reports. Five breast radiologists then assessed their quality through a questionnaire, and another breast radiologist evaluated the comprehension of five non-healthcare readers (NHRs)
Fig. 2
Example of the simplification process of a mammography report with ChatGPT-4o
Fig. 3
The questionnaires designed for the radiologists' evaluation (on the right) and for assessing the understanding of NHRs (on the left)
Fig. 4
The frequency of the radiologists’ ratings for mammography, ultrasound (US), and MRI reports (on the top), as well as the combined frequency for factual accuracy, completeness, and potential harm (on the bottom), using a 5-point Likert scale (1 = strongly agree; 2 = agree; 3 = neutral; 4 = disagree; and 5 = strongly disagree)
Fig. 5
The frequency of NHRs’ understanding ratings for mammography, ultrasound, and MRI reports (on the top), as well as the combined frequency (on the bottom), using a 5-point Likert scale (1 = excellent; 2 = good; 3 = fair; 4 = adequate; and 5 = poor)

