A large dataset of annotated incident reports on medication errors
- PMID: 38424103
- PMCID: PMC10904777
- DOI: 10.1038/s41597-024-03036-2
Abstract
Incident reports of medication errors are valuable learning resources for improving patient safety. However, pertinent information is often contained within unstructured free text, which prevents automated analysis and limits the usefulness of these data. Natural language processing can structure this free text automatically and retrieve relevant past incidents and learning materials, but doing so requires a large, fully annotated and validated corpus of incident reports. We present a corpus of 58,658 machine-annotated incident reports of medication errors that can be used to advance the development of information extraction models and subsequent incident learning. In cross-validation, the best F1-scores for the annotated dataset were 0.97 for named entity recognition and 0.76 for intention/factuality analysis. Our dataset contains 478,175 named entities and differentiates between incident types by recognising discrepancies between what was intended and what actually occurred. We explain our annotation workflow and technical validation, and provide access to the validation datasets and to the machine annotator for labelling future incident reports of medication errors.
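For readers unfamiliar with the metric, the following is a minimal sketch of how an entity-level F1-score for named entity recognition can be computed. It assumes exact matching of span boundaries and labels; the abstract does not state which matching scheme the authors used. The function name, label names and character offsets are hypothetical and are not taken from the dataset.

```python
from typing import Set, Tuple

# An entity is (start offset, end offset, label). Illustrative representation only.
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Exact-match, entity-level F1.

    A predicted entity counts as correct only if its span and label
    both match a gold entity exactly (an assumption of this sketch).
    """
    if not gold and not predicted:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for one report
gold = {(0, 9, "drug"), (10, 14, "strength"), (20, 26, "route")}
predicted = {(0, 9, "drug"), (10, 14, "strength"), (30, 35, "route")}

score = entity_f1(gold, predicted)
print(round(score, 3))  # 0.667: two of three entities match on both sides
```

Under this scheme, a score of 0.97 would mean that nearly all predicted entities coincide exactly with the reference annotations; more lenient partial-match schemes would generally yield higher values for the same predictions.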
© 2024. The Author(s).
Conflict of interest statement
The authors declare no competing interests.
Similar articles
- Task definition, annotated dataset, and supervised natural language processing models for symptom extraction from unstructured clinical notes. J Biomed Inform. 2020 Feb;102:103354. doi: 10.1016/j.jbi.2019.103354. Epub 2019 Dec 12. J Biomed Inform. 2020. PMID: 31838210
- Rule-Based Natural Language Processing Pipeline to Detect Medication-Related Named Entities: Insights for Transfer Learning. Stud Health Technol Inform. 2024 Jan 25;310:584-588. doi: 10.3233/SHTI231032. Stud Health Technol Inform. 2024. PMID: 38269876
- A Five-Step Workflow to Manually Annotate Unstructured Data into Training Dataset for Natural Language Processing. Stud Health Technol Inform. 2024 Jan 25;310:109-113. doi: 10.3233/SHTI230937. Stud Health Technol Inform. 2024. PMID: 38269775
- A systematic review of natural language processing for classification tasks in the field of incident reporting and adverse event analysis. Int J Med Inform. 2019 Dec;132:103971. doi: 10.1016/j.ijmedinf.2019.103971. Epub 2019 Oct 5. Int J Med Inform. 2019. PMID: 31630063
- Evaluation of a prototype machine learning tool to semi-automate data extraction for systematic literature reviews. Syst Rev. 2023 Oct 6;12(1):187. doi: 10.1186/s13643-023-02351-w. Syst Rev. 2023. PMID: 37803451 Free PMC article.
Cited by
- A scoping review on generative AI and large language models in mitigating medication related harm. NPJ Digit Med. 2025 Mar 28;8(1):182. doi: 10.1038/s41746-025-01565-7. NPJ Digit Med. 2025. PMID: 40155703 Free PMC article.
- A pathway from fragmentation to interoperability through standards-based enterprise architecture to enhance patient safety. NPJ Digit Med. 2025 Jan 18;8(1):41. doi: 10.1038/s41746-025-01442-3. NPJ Digit Med. 2025. PMID: 39827262 Free PMC article. Review.
- A scoping review of natural language processing in addressing medically inaccurate information: Errors, misinformation, and hallucination. J Biomed Inform. 2025 Jul 22:104866. doi: 10.1016/j.jbi.2025.104866. Online ahead of print. J Biomed Inform. 2025. PMID: 40706945 Review.
- Electronic Prescribing in the Neonatal Intensive Care Unit: Analysis of Prescribing Errors and Risk Factors. J Med Syst. 2025 Feb 18;49(1):26. doi: 10.1007/s10916-025-02161-8. J Med Syst. 2025. PMID: 39964641