. 2025 Mar 19;8(1):170.
doi: 10.1038/s41746-025-01535-z.

Cross sectional pilot study on clinical review generation using large language models

Zining Luo et al. NPJ Digit Med. 2025.

Abstract

As the volume of medical literature grows rapidly, efficient tools are needed to synthesize evidence for clinical practice and research, and interest in leveraging large language models (LLMs) to generate clinical reviews has surged. However, there are significant concerns about the reliability of integrating LLMs into the clinical review process. This study presents a systematic comparison between LLM-generated and human-authored clinical reviews. It finds that while AI can produce reviews quickly, they tend to contain fewer references, less comprehensive insights, and lower logical consistency, and their citations show lower authenticity and accuracy. In addition, a higher proportion of AI-cited references come from lower-tier journals. The study also uncovers a concerning inefficiency in current detection systems for identifying AI-generated content, suggesting the need for more advanced checking systems and a stronger ethical framework to ensure academic transparency. Addressing these challenges is vital for the responsible integration of LLMs into clinical research.

PubMed Disclaimer

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. The analysis results of the three indicators.
The boxplot illustrates the data distribution: the box spans the interquartile range (IQR) from the first quartile (Q1) to the third quartile (Q3), with the line inside indicating the median and a square symbol marking the mean. The whiskers extend up to 1.5 times the IQR, and any points beyond this range are marked as outliers. On objective metrics, AI demonstrates lower paragraph count, number of references, comprehensiveness, authenticity, and accuracy than humans. On subjective metrics, AI performs worse than humans across all levels. However, there is no significant difference between the two in the cumulative or average citation counts of the references, although the references exhibit different distribution patterns.
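As an aside for readers unfamiliar with the convention, the 1.5 × IQR whisker rule described in this caption can be sketched in Python. This is a generic illustration of the plotting convention, not code from the study; the helper name is hypothetical.

```python
import statistics


def boxplot_stats(data):
    """Summarize data the way the figure's boxplots do.

    The box runs from Q1 to Q3 (the IQR); whiskers extend at most
    1.5 * IQR beyond the box, and any point outside that range is
    flagged as an outlier.
    """
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # whisker limits
    return {
        "q1": q1,
        "median": statistics.median(data),
        "q3": q3,
        "mean": statistics.mean(data),  # the square symbol in the plot
        "outliers": [x for x in data if x < lower or x > upper],
    }
```

For example, on the values 1–10 plus a single extreme value of 100, the 100 falls beyond the upper whisker limit and would be drawn as an outlier point.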
Fig. 2
Fig. 2. The result of plagiarism checks.
AI-generated reviews exhibit a low plagiarism detection rate.
Fig. 3
Fig. 3. The result of AIGC detection tests.
The boxplot illustrates the data distribution: the box spans the IQR from Q1 to Q3, with the line inside indicating the median and a square symbol marking the mean. The whiskers extend up to 1.5 times the IQR, and any points beyond this range are marked as outliers. To the left of each boxplot, a scatterplot displays the distribution of the individual data points. Across all submitted articles, AI-generated content exhibited both high variability in detection rates and a high overall detection rate.
Fig. 4
Fig. 4. The comparison results of various AIGC detection platforms before and after using Merlin to reduce AIGC detection rates.
The results indicate that after using this tool, the AIGC detection rates decreased across all platforms, with reductions ranging from 21% to 82%. For most articles, the AIGC detection rate dropped below 50%, the threshold above which all platforms classify content as AI-generated.
Fig. 5
Fig. 5. Supplementary research directions for the specified application-oriented research in the future.
The workflow diagram above illustrates how to conduct supplementary investigations in designated research areas, particularly those that rely on a limited number of authoritative references.
Fig. 6
Fig. 6. Flowchart of the overall study design.
After determining the themes of the articles in the journal, clinical reviews are generated and then evaluated along four dimensions: basic article quality, distribution of references, quality of references, and academic publishing risk.
