A systematic mapping review on the capability of large language models in drug-drug interaction analysis
- PMID: 40999995
- DOI: 10.1080/17512433.2025.2568090
Abstract
Background: Drug-drug interactions (DDIs) are a global health concern affecting patient safety and treatment outcomes. Large language models (LLMs), such as ChatGPT, offer an accessible alternative to conventional DDI screening tools; however, their effectiveness in DDI analysis remains unclear. This review evaluates the current evidence on the performance of LLM-based chatbots in identifying DDIs.
Methods: A PRISMA-compliant systematic review (PROSPERO: CRD420251020360) was conducted using PubMed, Scopus, and Web of Science (studies published between 1 January 2015 and 31 March 2025). Studies were eligible if they used publicly accessible LLM chatbots for DDI detection.
Results: Nine studies (2023-2025) evaluated publicly accessible LLM chatbots, including ChatGPT, Bing AI, and Google Bard, for DDI identification. Methods ranged from patient-level polypharmacy screening to single-drug checks and case vignettes. Chatbot performance was inconsistent: ChatGPT identified many potential DDIs, with ChatGPT-4.0 generally identifying more, though accuracy varied; Bing AI and Google Bard were less reliable.
Conclusion: Publicly accessible LLM chatbots demonstrate variable and partial effectiveness in detecting DDIs. There is a clear need to develop dedicated, freely available chatbots designed specifically for DDI identification. Future research should focus on standardizing evaluation methods and expanding access to improve medication safety in clinical practice.
PROSPERO: CRD420251020360.
Keywords: Large language models; artificial intelligence; ChatGPT; chatbot; drug–drug interactions.
Plain language summary
Taking many medicines at once (polypharmacy) can lead to drug-drug interactions (DDIs), where one drug affects how another works, causing side effects or reducing treatment success. Detecting DDIs is important, but it often relies on costly tools or expert knowledge, which may not be readily available in all settings. This study looked at how well public AI chatbots like ChatGPT, Bing AI, and Google Bard identify DDIs. Their performance varied between chatbots and was not reliable enough for medical use. Further research is needed to establish their safety and accuracy.