Digit Health. 2025 May 29:11:20552076251346703.
doi: 10.1177/20552076251346703. eCollection 2025 Jan-Dec.

Harnessing GPT-4 for automated error detection in pathology reports: Implications for oncology diagnostics

Xiongwen Yang et al. Digit Health.

Abstract

Objective: Accurate pathology reports are crucial for the diagnosis and treatment planning of cancer patients. However, these reports are prone to errors due to time pressures, subjective interpretation, and inconsistencies among professionals. Addressing these errors is vital for improving oncology care outcomes. Artificial intelligence (AI) systems, such as GPT-4, offer the potential to enhance diagnostic accuracy and efficiency.

Methods: A total of 700 malignant tumor pathology reports were collected from four hospitals. In 350 of these reports, a senior pathologist deliberately introduced errors that mimic real-world reporting challenges. Error detection performance was evaluated by comparing GPT-4 with six human pathologists (two senior pathologists, two attending pathologists, and two residents). Key metrics included error detection rates with Wilson confidence intervals and processing time per report.

Results: GPT-4 detected 88% of errors (350/400; 95% CI: [84, 91]), compared to a 95% detection rate by the top senior pathologist (382/400; 95% CI: [93, 97]). GPT-4 significantly reduced the average processing time to 4.03 seconds per report, compared to 65.64 seconds for the fastest human pathologist. However, GPT-4 exhibited a higher rate of false positives (2.3%; 95% CI: [1.52, 3.01]) compared to the best-performing senior pathologist (0.3%; 95% CI: [0.01, 0.91]).
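The Wilson confidence intervals named in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `wilson_interval` is hypothetical, and z = 1.96 (a two-sided 95% interval without continuity correction) is an assumption, since the abstract does not state which variant was used. Results may therefore differ from the reported bounds in the last digit.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction).

    z = 1.96 is an assumed critical value for a two-sided 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts taken from the Results: detected errors out of 400 introduced errors.
gpt4_low, gpt4_high = wilson_interval(350, 400)
senior_low, senior_high = wilson_interval(382, 400)

print(f"GPT-4: {gpt4_low:.1%} to {gpt4_high:.1%}")
print(f"Top senior pathologist: {senior_low:.1%} to {senior_high:.1%}")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves well for proportions near the extremes, which matters for rates as low as the reported false-positive figures.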

Conclusions: GPT-4 demonstrates substantial potential in improving the efficiency and accuracy of pathology error detection, which could accelerate clinical workflows and enhance cancer diagnostics. However, its higher false-positive rate emphasizes the need for human oversight to ensure safe implementation in clinical practice.

Keywords: Large language model; artificial intelligence in oncology; cancer diagnostics workflow; diagnostic accuracy; pathology report error detection.

Conflict of interest statement

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1. Study flowchart. (A) Initially, 700 original pathology reports from 14 organs were selected. (B) These were then randomized into two sets, a correct set and an incorrect set, each containing 350 pathology reports. Within the incorrect set, 400 errors across five categories (clerical errors, improper use of terminology, missing information, interpretive or diagnostic errors, and data inconsistency) were deliberately introduced, with a maximum of three errors per case. (C) GPT-4 and six doctors evaluated each pathology report to identify potential errors, allowing a comparative analysis of their performance.
Figure 2. Bar graph showing the percentage of detected errors for GPT-4 and the doctors. Error bars are 95% CIs.
Figure 3. Scatter diagram showing reading time per pathology report in seconds.
