Am J Med. 2023 Nov;136(11):1119-1123.e18. doi: 10.1016/j.amjmed.2023.08.003. Epub 2023 Aug 27.

Comparative Evaluation of Diagnostic Accuracy Between Google Bard and Physicians


Takanobu Hirosawa et al. Am J Med. 2023 Nov.

Abstract

Background: In this study, we evaluated the diagnostic accuracy of Google Bard, a generative artificial intelligence (AI) platform.

Methods: We collected difficult or uncommon case descriptions from case reports published by our department, and common case descriptions from mock cases created by physicians. We entered each case description into the Google Bard prompt to generate a top-10 differential-diagnosis list. As in previous studies, other physicians created differential-diagnosis lists by reading the same clinical descriptions.
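
As one way to picture this workflow, the minimal Python sketch below frames a case description as a top-10 prompt and parses the reply into a ranked list. The prompt wording and the query_generative_ai() stub are assumptions rather than the study protocol; in the study, case descriptions were entered into Google Bard's prompt directly.

    # A minimal, hypothetical sketch of the evaluation loop described above.
    # The prompt wording and query_generative_ai() are assumptions: in the
    # study, case descriptions were entered into Google Bard's prompt manually.

    PROMPT_TEMPLATE = (
        "List the top 10 differential diagnoses for the following case, "
        "most likely first, one per line:\n\n{case}"
    )

    def query_generative_ai(prompt: str) -> str:
        """Placeholder for the model query (here, the Google Bard web prompt)."""
        raise NotImplementedError("Replace with an actual model query.")

    def top10_differentials(case_description: str) -> list[str]:
        """Return a ranked list of up to 10 differential diagnoses."""
        reply = query_generative_ai(PROMPT_TEMPLATE.format(case=case_description))
        # Keep non-empty lines and strip any leading numbering such as "1. ".
        lines = [line.strip() for line in reply.splitlines() if line.strip()]
        return [line.lstrip("0123456789.) ") for line in lines][:10]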

Results: A total of 82 clinical descriptions (52 case reports and 30 mock cases) were used. The accuracy rates of Google Bard remained lower than those of physicians within the top 10 (56.1% vs 82.9%, P < .001), the top 5 (53.7% vs 78.0%, P = .002), and the top differential diagnosis (40.2% vs 64.6%, P = .003). Even within the specific context of case reports, physicians consistently outperformed Google Bard. For mock cases, the differential-diagnosis lists generated by Google Bard performed no differently from those of the physicians within the top 10 (80.0% vs 96.6%, P = .11) and the top 5 (76.7% vs 96.6%, P = .06), except for the top diagnosis (60.0% vs 90.0%, P = .02).
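
As an illustration of how such paired accuracy comparisons can be computed, the sketch below scores each case for a top-k hit and applies McNemar's exact test to the per-case hit/miss pairs. The abstract does not name the statistical test used, so the test choice and the substring-based matching are assumptions.

    # Illustrative scoring and paired comparison. The abstract does not state
    # which statistical test was used; McNemar's exact test for paired binary
    # outcomes is an assumption, as is matching by simple substring comparison.
    from statsmodels.stats.contingency_tables import mcnemar

    def top_k_hit(ranked: list[str], final_diagnosis: str, k: int) -> bool:
        """True if the final diagnosis appears within the top k candidates."""
        return any(final_diagnosis.lower() in d.lower() for d in ranked[:k])

    def paired_p_value(ai_hits: list[bool], md_hits: list[bool]) -> float:
        """McNemar's exact test on per-case hit/miss pairs (AI vs physicians)."""
        # 2x2 table: rows index AI hit/miss, columns index physician hit/miss.
        table = [[0, 0], [0, 0]]
        for ai, md in zip(ai_hits, md_hits):
            table[int(not ai)][int(not md)] += 1
        return mcnemar(table, exact=True).pvalue

Because both the AI and the physicians read the same 82 cases, a paired test of this kind is a natural fit; an unpaired test of two proportions would ignore that each case contributes one outcome to each group.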

Conclusion: While physicians performed better overall, particularly on case reports, Google Bard displayed comparable diagnostic performance on common cases. This suggests that Google Bard has room for further improvement and refinement of its diagnostic capabilities. Generative AIs, including Google Bard, are anticipated to become increasingly beneficial in augmenting diagnostic accuracy.

Keywords: Clinical decision support system; Diagnosis; Diagnostic excellence; Generative artificial intelligence; Large language model; Natural language processing.
