Review
Expert Rev Clin Pharmacol. 2025 Sep 29:1-8.
doi: 10.1080/17512433.2025.2568090. Online ahead of print.

A systematic mapping review on the capability of large language models in drug-drug interaction analysis

Himel Mondal et al. Expert Rev Clin Pharmacol.

Abstract

Background: Drug-drug interactions (DDIs) are a global health concern affecting patient safety and treatment outcomes. Large language models (LLMs), such as ChatGPT, offer an accessible alternative for checking DDIs; however, their effectiveness in DDI analysis remains unclear. This review evaluates the current evidence on the performance of LLM-based chatbots in identifying DDIs.

Methods: A PRISMA-compliant systematic review (PROSPERO: CRD420251020360) was conducted using PubMed, Scopus, and Web of Science, covering studies published between 1 January 2015 and 31 March 2025. Eligible studies were those that used publicly accessible LLM chatbots for DDI detection.

Results: Nine studies (2023-2025) evaluated publicly accessible LLM chatbots, including ChatGPT, Bing AI, and Google Bard, for DDI identification. Methods ranged from patient-level polypharmacy screening to single-drug checks and case vignettes. Chatbot performance was inconsistent: ChatGPT identified many potential DDIs, with ChatGPT-4.0 generally identifying more, but its accuracy varied, while Bing AI and Google Bard were less reliable.

Conclusion: Publicly accessible LLM chatbots demonstrate variable and partial effectiveness in detecting DDIs. There is a clear need to develop dedicated, freely available chatbots designed specifically for DDI identification. Future research should focus on standardizing evaluation methods and expanding access to improve medication safety in clinical practice.

PROSPERO: CRD420251020360.

Keywords: Large language models; artificial intelligence; ChatGPT; chatbot; drug–drug interactions.

Plain language summary

Taking many medicines at once (polypharmacy) can lead to drug-drug interactions (DDIs), where one drug affects how another works, causing side effects or reducing treatment success. Detecting DDIs is important, but it often relies on costly tools or expert knowledge, which may not be readily available in all settings. This review looked at how well public AI chatbots like ChatGPT, Bing AI, and Google Bard identify DDIs. Their performance was inconsistent across chatbots and not reliable enough for medical use. Further research is needed to establish their safety and accuracy.
