2023 Sep 20;15:137-147.
doi: 10.2147/DHPS.S425858. eCollection 2023.

Evaluating the Sensitivity, Specificity, and Accuracy of ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard Against Conventional Drug-Drug Interactions Clinical Tools

Fahmi Y Al-Ashwal et al. Drug Healthc Patient Saf. 2023.

Abstract

Background: AI platforms are equipped with advanced algorithms that have the potential to offer a wide range of applications in healthcare services. However, information about the accuracy of AI chatbots relative to conventional drug-drug interaction (DDI) tools is limited. This study aimed to assess the sensitivity, specificity, and accuracy of ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard in predicting drug-drug interactions.

Methods: AI-based chatbots (ie, ChatGPT-3.5, ChatGPT-4, Microsoft Bing AI, and Google Bard) were compared for their ability to detect clinically relevant DDIs for 255 drug pairs. Performance metrics, namely specificity, sensitivity, accuracy, negative predictive value (NPV), and positive predictive value (PPV), were calculated for each tool.
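The metrics named in the Methods follow directly from a 2x2 confusion matrix of each chatbot's DDI calls against the reference tool. A minimal sketch of their definitions, using hypothetical counts (not the study's data), might look like:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from a
    2x2 confusion matrix (true/false positives and negatives)."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "ppv": tp / (tp + fp),                    # positive predictive value
        "npv": tn / (tn + fn),                    # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Illustrative only: suppose that of 255 drug pairs, the reference tool
# flags 150 as interacting, and a chatbot correctly identifies 140 of
# them while also raising 60 false alarms among the 105 non-interacting pairs.
m = classification_metrics(tp=140, fp=60, tn=45, fn=10)
```

Here a "positive" is a pair the reference tool flags as a clinically relevant DDI; the counts above are invented for illustration.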

Results: When a subscription tool was used as the reference, specificity ranged from a low of 0.372 (ChatGPT-3.5) to a high of 0.769 (Microsoft Bing AI). Microsoft Bing AI also had the highest accuracy (0.788), while ChatGPT-3.5 had the lowest (0.469). All chatbots performed better when the reference was switched to a free DDI source, but ChatGPT-3.5 still had the lowest specificity (0.392) and accuracy (0.525), and Microsoft Bing AI again demonstrated the highest specificity (0.892) and accuracy (0.890). When the consistency of accuracy was assessed across two different drug classes, ChatGPT-3.5 and ChatGPT-4 showed the highest variability in accuracy. In addition, ChatGPT-3.5, ChatGPT-4, and Bard exhibited the largest fluctuations in specificity when analyzing two medications belonging to the same drug class.

Conclusion: Microsoft Bing AI had the highest accuracy and specificity, outperforming Google Bard, ChatGPT-3.5, and ChatGPT-4. The findings highlight the significant potential these AI tools hold for transforming patient care. While the AI platforms evaluated are not without limitations, their ability to quickly analyze potentially significant interactions with good sensitivity suggests a promising step toward improved patient safety.

Keywords: Bard; Bing AI; ChatGPT; accuracy; drug-drug interaction; patient safety; sensitivity; specificity.


Conflict of interest statement

The authors report no conflicts of interest in this work.

Figures

Figure 1
The number of DDIs identified by each database categorized by the severity of the interaction. The colors of the bars represent the different levels of severity of the drug-drug interactions (DDIs). Abbreviation: DDIs, drug-drug interactions.
Figure 2
Accuracy of AI tools to detect DDIs categorized by two drug classes (drugs.com as a standard). Abbreviations: DDIs, drug-drug interactions; SGLT2i, sodium-glucose co-transporter 2 inhibitors.
Figure 3
Specificity difference of the AI tools to detect DDIs of two medications within the same class (drugs.com as a standard). Abbreviation: DDIs, drug-drug interactions.
