Urol Pract. 2024 Sep;11(5):782-789.

doi: 10.1097/UPJ.0000000000000599. Epub 2024 May 31.

Exploring the Feasibility of GPT-4 as a Data Extraction Tool for Renal Surgery Operative Notes

Jessica Y Hsueh¹, Daniel Nethala¹, Shiva Singh², Jason A Hyman¹, David G Gelikman³, W Marston Linehan¹, Mark W Ball¹

Affiliations

¹ Urologic Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, Maryland.
² Radiology and Imaging Services, Clinical Center, National Institutes of Health, Bethesda, Maryland.
³ Molecular Imaging Branch, National Cancer Institute, National Institutes of Health, Bethesda, Maryland.

PMID: 38913566
PMCID: PMC11335444
DOI: 10.1097/UPJ.0000000000000599

Jessica Y Hsueh et al. Urol Pract. 2024 Sep.

Abstract

Introduction: GPT-4 is a large language model with potential for multiple applications in urology. Our study sought to evaluate GPT-4's performance in data extraction from renal surgery operative notes.

Methods: GPT-4 was queried to extract information on laterality, surgery, approach, estimated blood loss, and ischemia time from deidentified operative notes. Match rates were calculated as the proportion of data points that "matched" between GPT-4 and human-curated extraction. Accuracy rates were calculated after manually reviewing the "not matched" data points. Cohen's kappa and the intraclass correlation coefficient were used to evaluate interrater agreement and reliability.
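The two agreement measures used for categorical variables can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names and the six laterality labels are hypothetical, and the kappa shown is the standard unweighted two-rater form, which the abstract does not specify further.

```python
from collections import Counter


def match_rate(model_values, human_values):
    """Fraction of notes where the two extractions agree exactly."""
    if not model_values or len(model_values) != len(human_values):
        raise ValueError("both extractions must cover the same non-empty set of notes")
    matches = sum(m == h for m, h in zip(model_values, human_values))
    return matches / len(model_values)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical laterality labels for six notes (illustrative only).
gpt4_labels = ["left", "right", "left", "right", "left", "left"]
human_labels = ["left", "right", "right", "right", "left", "left"]

print(round(match_rate(gpt4_labels, human_labels), 3))    # 0.833
print(round(cohens_kappa(gpt4_labels, human_labels), 3))  # 0.667
```

The match rate only counts raw agreement, whereas kappa discounts the agreement expected by chance, which is why the two figures differ for the same data.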

Results: Our cohort consisted of 1498 renal surgeries from 2003 to 2023. Match rates were high for laterality (94.4%), surgery (92.5%), and approach (89.4%), but lower for estimated blood loss (77.1%) and ischemia time (25.6%). GPT-4 was more accurate for estimated blood loss (90.3% vs 85.5% human curated) and similarly accurate for laterality (95.2% vs 95.3% human curated). Human-curated accuracy rates were higher for surgery (99.3% vs 93% GPT-4), approach (97.9% vs 90.8% GPT-4), and ischemia time (95.6% vs 30.7% GPT-4). Cohen's kappa was 0.96 for laterality, 0.83 for approach, and 0.71 for surgery. The intraclass correlation coefficient was 0.62 for estimated blood loss and 0.09 for ischemia time.

Conclusions: Match and accuracy rates were higher for categorical variables. GPT-4 data extraction was particularly error prone for variables with heterogeneous documentation styles. The role of a standard operative template to aid data extraction will be explored in the future. GPT-4 can be utilized as a helpful and efficient data extraction tool with manual feedback.

Keywords: artificial intelligence; kidney cancer; natural language processing.

Conflict of interest statement

Disclosure Statement: None

Figures

**Figure 1:**
Examples of zero-shot prompts given to GPT-4 for the initial data extraction.

**Figure 2:**
Examples of few-shot prompts given to GPT-4.

**Figure 3:**
GPT-4 explaining how it is conducting the data extraction.

**Figure 4:**
Examples of GPT-4 modifying its extraction logic after manual feedback.

Comment in

Editorial Commentary.
Gill BC. Urol Pract. 2024 Sep;11(5):788-789. doi: 10.1097/UPJ.0000000000000624. Epub 2024 Jun 12. PMID: 38913560. No abstract available.
Editorial Commentary.
Seth RA, Averch TD. Urol Pract. 2024 Sep;11(5):789. doi: 10.1097/UPJ.0000000000000613. Epub 2024 Jun 5. PMID: 38913561. No abstract available.

Grants and funding

Z99 CL999999/ImNIH/Intramural NIH HHS/United States
