Health Inf Sci Syst. 2025 Aug 17;13(1):52.
doi: 10.1007/s13755-025-00368-0. eCollection 2025 Dec.

Analyses of different prescriptions for health using artificial intelligence: a critical approach based on the international guidelines of health institutions


Vítor Marcelo Soares Campos et al. Health Inf Sci Syst.

Abstract

Purpose: Large language models (LLMs) are increasingly used for health advice, but their alignment with evidence-based guidelines and their sensitivity to question phrasing remain unclear.

Methods: In May 2025, we evaluated ChatGPT 4.0, ChatGPT 4.5, and DeepSeek V3 using four clinical vignettes: major depression with polysubstance use, irritable bowel syndrome flare, new-onset hypertension requiring exercise counseling, and chronic low back pain. Each scenario was tested with clinician- and patient-style prompts, generating 24 responses. Outputs were benchmarked against 89 guideline-derived recommendations from three authoritative sources per domain. Two blinded reviewers scored concordance (1 = actionable detail, 0.5 = generic mention, 0 = absent), with adjudication by a third reviewer. Inter-rater reliability was measured using Cronbach's α.
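The concordance rubric and reliability statistic described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' actual analysis code: the function names, example scores, and the treatment of the two reviewers as "items" in Cronbach's α are all assumptions for demonstration.

```python
# Hedged sketch of the scoring scheme described in Methods.
# Each guideline recommendation is scored 1 (actionable detail),
# 0.5 (generic mention), or 0 (absent); concordance is the share of
# the maximum possible score. All data below are illustrative.

def concordance(scores):
    """Percent of the maximum possible score across recommendations."""
    return 100.0 * sum(scores) / len(scores)

def cronbach_alpha(ratings):
    """Cronbach's alpha for a raters-by-items matrix (list of lists).

    Treats each rater as an 'item' measuring the same set of
    recommendations, a common way to express inter-rater reliability.
    """
    k = len(ratings)        # number of raters
    n = len(ratings[0])     # number of recommendations scored

    def var(xs):            # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(r[i] for r in ratings) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(r) for r in ratings) / var(totals))

# Example: two blinded reviewers scoring six recommendations
r1 = [1, 0.5, 0, 1, 1, 0.5]
r2 = [1, 0.5, 0.5, 1, 1, 0.5]
print(round(concordance(r1), 1))
print(round(cronbach_alpha([r1, r2]), 2))
```

In this toy example, near-identical reviewer scores yield a high α, mirroring the strong agreement (α = 0.97) reported in the Results.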

Results: ChatGPT 4.5 achieved the highest guideline concordance (61.9%), followed by DeepSeek V3 (60.7%) and ChatGPT 4.0 (53.7%). Performance varied by domain, exceeding 67% in mental health but dropping below 45% in nutrition. Prompt phrasing influenced capture rates, with clinician-style prompts improving scores in exercise and pain domains, while patient-style prompts outperformed in nutrition. Reviewer agreement was high (α = 0.97 for chatbot scoring; 0.80 for matrix coding).

Conclusion: LLMs can rapidly generate draft care plans that reflect clinical guidelines, though they favor generic over individualized advice. By introducing a unique, domain-agnostic scoring rubric that aligns AI-generated 30-day care plans with gold-standard guidelines, and by applying it in parallel to mental health, nutrition, exercise, and physical therapy scenarios, our study delivers the first prompt-sensitive audit showing where current LLMs exceed, match, or fall short of multidisciplinary best practices.

Supplementary information: The online version contains supplementary material available at 10.1007/s13755-025-00368-0.

Keywords: Artificial intelligence; Chatbots in healthcare; Digital health; Machine learning; Personalized medicine.


Conflict of interest statement

The authors declare no conflicts of interest.

