This is a preprint.
Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation
- PMID: 41031083
- PMCID: PMC12478431
Update in
- Ascle-A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study. J Med Internet Res. 2024 Oct 3;26:e60601. doi: 10.2196/60601. PMID: 39361955. Free PMC article.
Abstract
Objective: This study introduces Ascle, a pioneering natural language processing (NLP) toolkit designed for medical text generation. Ascle offers biomedical researchers and healthcare professionals an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle evaluates and provides interfaces for the latest pre-trained language models, covering four advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation. In addition, Ascle integrates 12 essential NLP functions, along with query and search capabilities for clinical databases.
Materials and methods: We fine-tuned 32 domain-specific language models and evaluated them thoroughly on 24 established benchmarks. Additionally, for the question-answering task, we conducted manual reviews with clinicians, focusing on Readability, Relevancy, Accuracy, and Completeness, to provide users with a more reliable evaluation.
Results: The fine-tuned models consistently improved performance on text generation tasks. For instance, fine-tuning raised the BLEU score on the machine translation task by 20.27. For the answer generation task, manual reviews showed that the generated answers had average scores (out of 5) of 4.95, 4.43, 3.9, and 3.31 in Readability, Relevancy, Accuracy, and Completeness, respectively.
Conclusions: This study introduces the development and evaluation of Ascle, a user-friendly NLP toolkit designed for medical text generation. Ascle offers an all-in-one solution including four advanced generative functions: question-answering, text summarization, text simplification, and machine translation. The toolkit, its models, and associated data are publicly available via https://github.com/Yale-LILY/Ascle.
Keywords: generative artificial intelligence; healthcare; machine learning; natural language processing.
Conflict of interest statement
The authors do not have conflicts of interest related to this study.
Figures
Figure legend (marker icons lost in extraction): one marker indicates that we have our fine-tuned models for a task; another indicates that we conducted evaluations for a task.