Nat Med. 2024 Jun;30(6):1574-1582.
doi: 10.1038/s41591-024-02933-8. Epub 2024 Apr 25.

Large language models for preventing medication direction errors in online pharmacies

Cristobal Pais et al. Nat Med. 2024 Jun.

Abstract

Errors in pharmacy medication directions, such as incorrect instructions for dosage or frequency, can substantially increase patient safety risk by raising the chances of adverse drug events. This study explores how integrating domain knowledge with large language models (LLMs), which are capable of sophisticated text interpretation and generation, can reduce these errors. We introduce MEDIC (medication direction copilot), a system that emulates the reasoning of pharmacists by prioritizing precise communication of the core clinical components of a prescription, such as dosage and frequency. It fine-tunes a first-generation LLM using 1,000 expert-annotated and augmented directions from Amazon Pharmacy to extract the core components, and it assembles them into complete directions using pharmacy logic and safety guardrails. We compared MEDIC against two LLM-based benchmarks: one leveraging 1.5 million medication directions and the other using state-of-the-art LLMs. On 1,200 expert-reviewed prescriptions, the two benchmarks recorded 1.51 (confidence interval (CI) 1.03, 2.31) and 4.38 (CI 3.13, 6.64) times more near-miss events (errors caught and corrected before reaching the patient) than MEDIC, respectively. Additionally, we tested MEDIC by deploying it within the production system of an online pharmacy; during this experimental period, it reduced near-miss events by 33% (CI 26%, 40%). This study shows that LLMs, with domain expertise and safeguards, improve the accuracy and efficiency of pharmacy operations.

Conflict of interest statement

All authors conducted this research during their employment at Amazon Pharmacy, Amazon’s prescription medication service, which enables customers to order prescription medications for home delivery; however, the authors did not receive any financial incentives for any activity related to conducting this research and publishing its findings.

Figures

Fig. 1. High-level overview of the study and pharmacy workflow.
a, Schematic of the pharmacy workflow used, highlighting the occurrence of near-miss events, the primary metric of the prospective evaluation. b, Examples of prescriber medication directions paired with their corresponding pharmacist verification (PV) equivalents. This conversion occurs in the DE technician step highlighted in panel a. c, Different LLM-based strategies used to generate pharmacist-approved medication directions from prescriber directions, with their corresponding data requirements and training methodologies. d, Types of evaluation and metrics used to assess the performance of each AI approach. e, Description of the data and how they were used in the study to train and evaluate the different AI approaches.
Fig. 2. Prescription processing workflow and a high-level overview of MEDIC.
a, Integration of the MEDIC system within the prescription processing workflow. Flows A and B: when a DE opens a new prescription, the suggestion module activates automatically and offers proposed directions within the DE user interface. Flows C and D: each time a DE types or edits directions, the flagging module runs and displays its results in the DE user interface. Flow E: if the entered direction is deemed accurate, it advances to pharmacist verification (PV). Flow F: if pharmacists detect errors in the entered direction, they send it back for correction. Flow G: after verification, the typed direction moves to fulfillment. b, Workflow of the suggestion function. Incoming medication directions from the prescriber and the associated internal drug ID serve as the primary inputs. Raw directions are processed by pharmalexical normalization, key components are identified by AI-powered extraction and, finally, directions are assembled and checked in semantic assembly and safety enforcement. c, Workflow of the flagging function. Direction pairs and their associated drug IDs are the primary inputs. Both directions in each pair pass through the main stages of MEDIC (pharmalexical normalization and AI-powered extraction). A component-wise comparison is then conducted between the two assembled directions to identify any discrepancies.
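To make the component-wise comparison of panel c concrete, here is a minimal Python sketch. It assumes extraction has already turned each direction into a dictionary keyed by the nine components of Extended Data Fig. 1 (the snake_case key spellings and the names `_norm` and `flag_discrepancies` are hypothetical), and it swaps MEDIC's pharmalexical normalization and LLM-based extraction for simple lowercasing and whitespace collapsing. It is an illustration, not MEDIC's implementation.

```python
from typing import Dict, List, Optional

# Component names follow the nine components in Extended Data Fig. 1;
# the snake_case key spellings are a hypothetical choice.
COMPONENTS = (
    "verb", "dose", "route", "frequency", "auxiliary_actions",
    "indications", "max_dose", "period", "time",
)


def _norm(value: Optional[str]) -> Optional[str]:
    """Toy stand-in for normalization: lowercase and collapse whitespace."""
    if value is None:
        return None
    cleaned = " ".join(value.lower().split())
    return cleaned or None  # treat empty strings like a missing component


def flag_discrepancies(reference: Dict[str, str], entered: Dict[str, str]) -> List[str]:
    """Return, in component order, every component whose normalized values differ."""
    return [
        name
        for name in COMPONENTS
        if _norm(reference.get(name)) != _norm(entered.get(name))
    ]


# Both directions are assumed to be already extracted into components.
prescriber = {"verb": "take", "dose": "1 tablet", "route": "by mouth", "frequency": "twice daily"}
typed = {"verb": "Take", "dose": "2 tablets", "route": "by mouth", "frequency": "twice  daily"}
print(flag_discrepancies(prescriber, typed))  # ['dose']
```

Comparing component by component, rather than string to string, means that in this sketch a capitalization or spacing difference is not flagged, while a changed dose is reported by name.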
Fig. 3. Evaluation metrics on DEval for the three AI approaches.
a, Distribution of the NLP scores BLEU and METEOR for MEDIC, T5-FineTuned (1.5M) and Claude, calculated across all suggested directions (n = 1,200 prescriptions). Average values are indicated with a horizontal black line, and median values are highlighted with a notch on each box plot. Whiskers extend from the first and third quartiles (box limits) toward the minimum and maximum observed values for each metric and model. b, Ratios of all categories of possible near-miss events for each model relative to MEDIC, from a total of n = 1,200 prescriptions. Black lines show 95% percentile intervals obtained via bootstrap to account for the ratios’ skewed distribution, and centers represent median values. c, Ratios of near-misses related to incorrect dosage or frequency, which carry an elevated risk of patient harm, from a total of n = 1,200 prescriptions. Black lines show 95% bootstrap percentile intervals and centers represent median values, as in b.
Fig. 4. MEDIC safety guardrails triggered on human evaluation set DEval.
Reasons that triggered the safety guardrails and their percentages of the total number of blocked suggestions (left). Mapping from trigger reasons to guardrails, with the total percentage of blocked suggestions falling under each guardrail (right).
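Fig. 2b and Fig. 4 describe assembling extracted components into a complete direction and blocking suggestions that trip a safety guardrail. The Python sketch below shows only that general shape. The assembly order, the `REQUIRED` rule and the function name `assemble_with_guardrails` are hypothetical; MEDIC's actual pharmacy logic and guardrails, whose trigger reasons Fig. 4 reports, are not reproduced here.

```python
from typing import Dict, List, Optional, Tuple

# Component names follow Extended Data Fig. 1; this order is a hypothetical choice.
ASSEMBLY_ORDER = (
    "verb", "dose", "route", "frequency", "time", "period",
    "indications", "auxiliary_actions", "max_dose",
)

# Hypothetical guardrail rule: these components must be present to suggest a direction.
REQUIRED = ("verb", "dose", "frequency")


def assemble_with_guardrails(
    components: Dict[str, str],
) -> Tuple[Optional[str], List[str]]:
    """Join extracted components into one direction, or block it with trigger reasons."""
    reasons = [f"missing {name}" for name in REQUIRED if not components.get(name)]
    if reasons:
        return None, reasons  # blocked: no suggestion is offered
    # Keep only known components that are present, in the fixed assembly order.
    parts = [components[name] for name in ASSEMBLY_ORDER if components.get(name)]
    return " ".join(parts), []


print(assemble_with_guardrails({
    "verb": "take", "dose": "1 tablet", "route": "by mouth",
    "frequency": "once daily", "indications": "for blood pressure",
}))
# ('take 1 tablet by mouth once daily for blood pressure', [])
print(assemble_with_guardrails({"verb": "take", "route": "by mouth"}))
# (None, ['missing dose', 'missing frequency'])
```

Returning the trigger reasons alongside a blocked result makes it possible to tally blocked suggestions by reason, which is the kind of breakdown Fig. 4 summarizes.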
Fig. 5. Offline flagging model performance in detecting different direction errors.
a, Error percentage distribution across all relevant components of the medication directions. b, MEDIC flagging model accuracy for each component.
Extended Data Fig. 1. Hierarchy of components in each direction.
The nine components identified by MEDIC (verb, dose, route, frequency, auxiliary actions, indications, max dose, period and time) are represented as nodes, with three examples of each component depicted as leaves.
