Eur Arch Otorhinolaryngol. 2024 Nov;281(11):6069-6081.
doi: 10.1007/s00405-024-08643-8. Epub 2024 Apr 23.

To trust or not to trust: evaluating the reliability and safety of AI responses to laryngeal cancer queries

Magdalena Ostrowska et al. Eur Arch Otorhinolaryngol. 2024 Nov.

Abstract

Purpose: As online health information-seeking surges, concerns are mounting over the quality and safety of accessible content, which may harm patients through misinformation. On the one hand, the emergence of Artificial Intelligence (AI) in healthcare could help prevent such harm; on the other hand, questions arise regarding the quality and safety of the medical information it provides. As laryngeal cancer is a prevalent head and neck malignancy, this study aims to evaluate the utility and safety of three large language models (LLMs) as sources of patient information about laryngeal cancer.

Methods: A cross-sectional study was conducted using three LLMs (ChatGPT 3.5, ChatGPT 4.0, and Bard). A questionnaire comprising 36 inquiries about laryngeal cancer was categorised into diagnosis (11 questions), treatment (9 questions), novelties and upcoming treatments (4 questions), controversies (8 questions), and sources of information (4 questions). The responses were graded by three groups of reviewers: ENT specialists, junior physicians, and non-medical reviewers. Each physician evaluated each question twice for each model, whereas non-medical reviewers evaluated it only once. All reviewers were blinded to the model type, and the question order was shuffled. Outcome evaluations were based on a safety score (1-3) and a Global Quality Score (GQS, 1-5). Results were compared between LLMs. The study included iterative assessments and statistical validations.

Results: Analysis revealed that ChatGPT 3.5 scored highest in both safety (mean: 2.70) and GQS (mean: 3.95). ChatGPT 4.0 and Bard had lower safety scores of 2.56 and 2.42, respectively, with corresponding quality scores of 3.65 and 3.38. Inter-rater reliability was consistent, with less than 3% discrepancy. About 4.2% of responses fell into the lowest safety category (1), particularly in the novelty category. Non-medical reviewers' quality assessments correlated moderately (r = 0.67) with response length.

Conclusions: LLMs can be valuable resources for patients seeking information on laryngeal cancer. ChatGPT 3.5 provided the most reliable and safe responses among the models evaluated.

Keywords: Artificial intelligence; Bard; ChatGPT; Laryngeal cancer; Oncology; Patient education.
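
The summary statistics reported in the Results (mean safety score and mean GQS per model, and a correlation between quality ratings and response length) could be computed along the following lines. This is a minimal sketch rather than the authors' analysis: the abstract does not say which software or which correlation coefficient was used, so Pearson's r is an assumption, and the names `ratings`, `mean_scores`, and `pearson_r` as well as every value in the sample rows are hypothetical placeholders, not study data.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Hypothetical ratings, NOT data from the study.
# Each row: (model label, safety score 1-3, GQS 1-5, answer word count)
ratings = [
    ("model_a", 3, 4, 210),
    ("model_a", 2, 3, 150),
    ("model_b", 3, 5, 320),
    ("model_b", 1, 2, 90),
]


def mean_scores(rows):
    """Return {model: (mean safety score, mean GQS)}."""
    grouped = defaultdict(list)
    for model, safety, gqs, _words in rows:
        grouped[model].append((safety, gqs))
    return {
        model: (
            mean(s for s, _ in scores),
            mean(q for _, q in scores),
        )
        for model, scores in grouped.items()
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    print(mean_scores(ratings))
    word_counts = [row[3] for row in ratings]
    quality = [row[2] for row in ratings]
    print(pearson_r(word_counts, quality))
```

With real ratings, one row per reviewer, question, and model would replace the placeholder rows; the per-model means and the length-quality correlation then follow directly.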

Conflict of interest statement

The authors have no conflict of interest.

Figures

Fig. 1. Quality score assessment: legend
Fig. 2. A flowchart with methodological steps
Fig. 3. Answer word count: the distribution among the models
Fig. 4. Likert plot visualizing quality and safety responses by category. Dark blue: maximum score; light blue: 4 out of 5 points in quality score; green: medium value; light red: 2 out of 5 points in quality score; dark red: minimum score. Q stands for quality and S for safety.

