World Allergy Organ J. 2025 Jun 14;18(7):101071.
doi: 10.1016/j.waojou.2025.101071. eCollection 2025 Jul.

How accurate are ChatGPT-4 responses in chronic urticaria? A critical analysis with information quality metrics

Ivan Cherrez-Ojeda et al. World Allergy Organ J.

Abstract

Background: The increasing use of artificial intelligence (AI) in healthcare, especially in delivering medical information, prompts concerns over the reliability and accuracy of AI-generated responses. This study evaluates the quality, reliability, and readability of ChatGPT-4 responses for chronic urticaria (CU) care, considering the potential implications of inaccurate medical information.

Objective: The goal of the study was to assess the quality, reliability, and readability of ChatGPT-4 responses to inquiries on CU management in accordance with international guidelines, utilizing validated metrics to evaluate the effectiveness of ChatGPT-4 as a resource for medical information acquisition.

Methods: Twenty-four questions were derived from the EAACI/GA2LEN/EuroGuiDerm/APAAACI recommendations and used as prompts for ChatGPT-4, with each question submitted in a separate chat. The questions were grouped into 3 categories: (A) Classification and Diagnosis, (B) Assessment and Monitoring, and (C) Treatment and Management Recommendations. Allergy specialists independently evaluated the responses using the DISCERN instrument for quality, the Journal of the American Medical Association (JAMA) benchmark criteria for reliability, and Flesch scores for readability. The scores were then summarized with medians, and agreement was assessed with the Intraclass Correlation Coefficient.
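As an illustration of the readability step, the sketch below computes a Flesch score for a piece of text. It assumes the Flesch Reading Ease variant (206.835 - 1.015 × words per sentence - 84.6 × syllables per word); the abstract does not say which Flesch formula or software the authors used. The function names and the vowel-group syllable counter are hypothetical simplifications, so the scores are approximate.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups (a heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" as non-syllabic, except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    print(round(flesch_reading_ease("The cat sat on the mat."), 3))
```

Short sentences made of one-syllable words score very high, while long sentences with polysyllabic medical terms push the score down, which is the direction of the readability problem the study reports.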

Results: Categories A and C exhibited insufficient reliability according to the JAMA criteria, with median scores of 1 and 0, respectively. Category B exhibited a low reliability score (median 2, interquartile range [IQR] 2). The information quality of responses to category C questions was satisfactory (median 51.5, IQR 12.5). All 3 categories exhibited confusing readability levels according to the Flesch assessment.
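The median and IQR summaries reported above can be reproduced along the following lines. This is a minimal sketch: the ratings are hypothetical placeholders rather than data from the study, and the quartile method (Python's "inclusive" interpolation) is an assumption, since the abstract does not state how quartiles were computed.

```python
from statistics import median, quantiles


def median_and_iqr(scores: list[float]) -> tuple[float, float]:
    """Return (median, interquartile range) for a list of ratings."""
    # n=4 yields the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q3 - q1


if __name__ == "__main__":
    # Hypothetical ratings on a 0-4 scale, for illustration only.
    example_scores = [0, 1, 1, 2, 2, 3, 4]
    med, iqr = median_and_iqr(example_scores)
    print(f"median={med}, IQR={iqr}")
```

Different quartile conventions can give slightly different IQR values on small samples, so the method should be fixed before comparing categories.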

Limitations: The study's limitations encompass the emphasis on CU, possible bias in question selection, the use of particular instruments such as DISCERN, JAMA, and Flesch, as well as reliance on expert opinion for assessment.

Conclusion: ChatGPT-4 demonstrates potential for producing medical content; nonetheless, its reliability is limited, underscoring the need for caution and verification when using AI-generated medical information, especially in the management of CU.

Keywords: Artificial intelligence; Chronic urticaria; Generative artificial intelligence.

Conflict of interest statement

None declared.

Figures

Fig. 1. Quality, Reliability, and Readability of ChatGPT-4 responses about Chronic Urticaria.
