Using fine-tuned large language models to parse clinical notes in musculoskeletal pain disorders
- PMID: 39492289
- DOI: 10.1016/S2589-7500(23)00202-9
Conflict of interest statement
GN reports research funding from the NIH and Renalytix; royalties from Renalytix; consultancy agreements from AstraZeneca, BioVie, GLG Consulting, Pensieve Health, Reata, Renalytix AI, Siemens, GSK Pharma, and Variant Bio; honoraria for lectures from GSK Pharma; serves in an advisory role for CRIC, Renalytix, Pensieve Health, and Neurona Health; and owns equity and stock options in Pensieve Health as a cofounder, Renalytix, and Verici. IN is part of the Board of Directors at the American College of Occupational and Environmental Medicine in a voluntary capacity. All other authors declare no competing interests.
AV conceptualized the study, developed the methodology, and performed the formal analysis and visualization. IN collected and curated data. AV, IL, and IN directly accessed and verified the underlying data in the study. AV wrote the original draft of the paper. GN and IN contributed to supervision of the study. All authors had full access to all the data in the study, provided critical feedback, approved the final draft, and had final responsibility for the decision to submit for publication.
This study was approved by the institutional review board at Icahn School of Medicine at Mount Sinai (STUDY-19-00607: Utilizing Natural Language Processing (NLP) and Machine Learning (ML) to Predict Acuity and Return to Work Timeline for Patients with Lower Back, Knee, and Shoulder Pain). The study was exempt from the requirement of individual patient consent due to use of retrospective patient data.
This work is supported by grant UL1TR001433 from the US National Center for Advancing Translational Sciences, National Institutes of Health, and T42OH008422 from the Pilot Projects Research Training Program of the New York and New Jersey Education and Research Center, National Institute for Occupational Safety and Health. The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the work.
The study uses identified patient notes for training and testing models, which cannot be released given concerns of patient privacy. Foundational LLaMA language models are publicly available for non-commercial use directly from Meta AI research. The instructional Alpaca dataset is also publicly available with a non-restrictive license. Code for this work is available at https://github.com/akhilvaid/MusculoskeletalPainLLaMA under a GPLv3 license.