ChatGPT compared to national guidelines for management of ovarian cancer: Did ChatGPT get it right? - A Memorial Sloan Kettering Cancer Center Team Ovary study

Lindsey Finch¹, Vance Broach², Jacqueline Feinberg², Ahmed Al-Niaimi², Nadeem R Abu-Rustum², Qin Zhou³, Alexia Iasonos³, Dennis S Chi⁴

Affiliations

¹ Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
² Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA.
³ Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
⁴ Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA. Electronic address: chid@mskcc.org.

PMID: 39042956
PMCID: PMC11402584
DOI: 10.1016/j.ygyno.2024.07.007

Comparative Study

Lindsey Finch et al. Gynecol Oncol. 2024 Oct;189:75-79. doi: 10.1016/j.ygyno.2024.07.007. Epub 2024 Jul 22.

Abstract

Objectives: We evaluated the performance of a chatbot compared to the National Comprehensive Cancer Network (NCCN) Guidelines for the management of ovarian cancer.

Methods: Using NCCN Guidelines, we generated 10 questions and answers regarding management of ovarian cancer at a single point in time. Questions were thematically divided into risk factors, surgical management, medical management, and surveillance. We asked ChatGPT (GPT-4) to provide responses without prompting (unprompted GPT) and with prompt engineering (prompted GPT). Responses were blinded and evaluated for accuracy and completeness by 5 gynecologic oncologists. A score of 0 was defined as inaccurate, 1 as accurate and incomplete, and 2 as accurate and complete. Evaluations were compared among NCCN, unprompted GPT, and prompted GPT answers.

Results: Overall, 48% of responses from NCCN, 64% from unprompted GPT, and 66% from prompted GPT were accurate and complete. The percentage of accurate but incomplete responses was higher for NCCN vs GPT-4. The percentage of accurate and complete scores for questions regarding risk factors, surgical management, and surveillance was higher for GPT-4 vs NCCN; however, for questions regarding medical management, the percentage was lower for GPT-4 vs NCCN. Overall, 14% of responses from unprompted GPT, 12% from prompted GPT, and 10% from NCCN were inaccurate.

Conclusions: GPT-4 provided accurate and complete responses at a single point in time to a limited set of questions regarding ovarian cancer, with best performance in areas of risk factors, surgical management, and surveillance. Occasional inaccuracies, however, should limit unsupervised use of chatbots at this time.

Keywords: Artificial intelligence; Large language models; Ovarian cancer.

Conflict of interest statement

Declaration of competing interest: Dr. Abu-Rustum reports research funding paid to the institution from GRAIL. Memorial Sloan Kettering Cancer Center also has equity in GRAIL. Dr. Chi reports medical advisory board participation for Verthermia Acquio and Biom'Up Inc., speaker fees from AstraZeneca, and stock in BioNTech and Doximity.

Grants and funding

P30 CA008748/CA/NCI NIH HHS/United States
