Comparative Study

World J Urol. 2024 Oct 28;42(1):598. doi: 10.1007/s00345-024-05311-8.

Artificial intelligence versus human touch: can artificial intelligence accurately generate a literature review on laser technologies?

Frédéric Panthier et al. World J Urol.

Abstract

Purpose: To compare the accuracy of open-source Artificial Intelligence (AI) Large Language Models (LLMs) against human authors in generating a systematic review (SR) on the new pulsed-Thulium:YAG (p-Tm:YAG) laser.

Methods: Five manuscripts were compared. The Human-SR on p-Tm:YAG (considered the "ground truth") was written by independent certified endourologists with expertise in lasers and accepted in a peer-reviewed, PubMed-indexed journal (but not yet available online, and therefore not accessible to the LLMs). The query "write a systematic review on pulsed-Thulium:YAG laser for lithotripsy" was submitted to four LLMs (ChatGPT3.5/Vercel/Claude/Mistral-7b). The LLM-SRs were standardized and the Human-SR was reformatted to match the same general output appearance, to ensure blinding. Nine participants with varying levels of endourological expertise (Clinical Nurse Specialists, Urology Trainees and Consultants; three of each) objectively assessed the accuracy of the five SRs using a bespoke 10-checkpoint proforma. A subjective assessment was recorded using a composite score comprising quality (0-10), clarity (0-10) and overall manuscript rank (1-5).
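For orientation only, the following is a minimal Python sketch of the comparison protocol described above, assuming a hypothetical query_model() wrapper around each provider's own API; the prompt and model names are taken from the abstract, while every other name and step is illustrative rather than the authors' actual workflow.

    import random

    PROMPT = "write a systematic review on pulsed-Thulium:YAG laser for lithotripsy"
    MODELS = ["ChatGPT3.5", "Vercel", "Claude", "Mistral-7b"]

    def query_model(model_name: str, prompt: str) -> str:
        # Hypothetical wrapper around each provider's API; not part of the study code.
        raise NotImplementedError("call the relevant LLM API here")

    def build_blinded_set(human_sr: str) -> list:
        # Collect the four LLM outputs plus the Human-SR, normalise their layout,
        # and shuffle them so assessors cannot tell which manuscript is which.
        manuscripts = [{"source": m, "text": query_model(m, PROMPT)} for m in MODELS]
        manuscripts.append({"source": "Human", "text": human_sr})
        for m in manuscripts:
            m["text"] = m["text"].strip()  # placeholder for the uniform reformatting step
        random.shuffle(manuscripts)
        return manuscripts
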

Results: The Human-SR was objectively and subjectively more accurate than the LLM-SRs (96 ± 7% and 86.8 ± 8.2%, respectively; p < 0.001). The LLM-SRs did not differ significantly from one another, although ChatGPT3.5 showed the highest subjective and objective accuracy scores (62.4 ± 15% and 29 ± 28%, respectively; p > 0.05). Quality and clarity assessments were significantly affected by SR type but not by expertise level (p < 0.001 and p > 0.05, respectively).
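As a worked illustration of how per-manuscript accuracy scores like those above could be summarized and compared, the sketch below uses invented per-assessor scores and a one-way ANOVA; the abstract does not name the statistical test actually used, so both the data and the test choice are assumptions.

    from statistics import mean, stdev
    from scipy.stats import f_oneway

    # Illustrative per-assessor objective accuracy scores (nine assessors per manuscript);
    # these numbers are invented, not the study data.
    human_scores = [100, 100, 90, 100, 90, 100, 100, 90, 100]
    chatgpt_scores = [70, 60, 50, 80, 60, 70, 50, 60, 60]

    print(f"Human-SR: {mean(human_scores):.1f} ± {stdev(human_scores):.1f}%")
    print(f"ChatGPT3.5-SR: {mean(chatgpt_scores):.1f} ± {stdev(chatgpt_scores):.1f}%")

    # One-way ANOVA across manuscript types (test choice assumed, not stated in the abstract).
    f_stat, p_value = f_oneway(human_scores, chatgpt_scores)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
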

Conclusions: LLM-generated data on highly technical topics are less accurate than content produced by Key Opinion Leaders. LLMs, especially ChatGPT3.5, could nonetheless improve our practice under human supervision.

Keywords: Artificial intelligence; Laser; Lithotripsy; Machine learning; Neural network; Urinary stones.
