[Preprint]. 2023 Dec 9:arXiv:2311.16588v2.

Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation

Rui Yang et al. ArXiv.

Abstract

Objective: This study introduces Ascle, a pioneering natural language processing (NLP) toolkit designed for medical text generation. Ascle offers biomedical researchers and healthcare professionals an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle evaluates and provides interfaces for the latest pre-trained language models across four advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation. In addition, Ascle integrates 12 essential NLP functions, along with query and search capabilities for clinical databases.

Materials and methods: We fine-tuned 32 domain-specific language models and evaluated them thoroughly on 24 established benchmarks. Additionally, for the question-answering task, we conducted manual reviews with clinicians, focusing on Readability, Relevancy, Accuracy, and Completeness, to provide users with a more reliable evaluation.

Results: The fine-tuned models consistently improved performance on text generation tasks. For instance, they raised the BLEU score on the machine translation task by 20.27. For the answer generation task, manual reviews gave the generated answers average scores of 4.95, 4.43, 3.9, and 3.31 (out of 5) for Readability, Relevancy, Accuracy, and Completeness, respectively.
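To make the reported BLEU gain concrete, here is a minimal from-scratch sketch of corpus-level BLEU: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty and scaled to 0-100. It is not Ascle's evaluation code; the function names, whitespace tokenization, single reference per sentence, and absence of smoothing are assumptions made for illustration. Established implementations such as sacrebleu apply their own tokenization, so their scores will not match this sketch exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale.

    Illustrative assumptions: one reference per hypothesis, uniform
    n-gram weights, and no smoothing (any zero precision gives 0.0).
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipped counts: each hypothesis n-gram is credited at most
            # as many times as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


# Usage: whitespace-tokenized hypothesis and reference (hypothetical example data).
hyp = "the patient should take the medicine twice a day".split()
ref = "the patient should take the medication twice daily".split()
print(round(corpus_bleu([hyp], [ref]), 2))
```

On this 0-100 scale, a gain of 20.27 would mean the fine-tuned model's corpus score exceeds the baseline's by 20.27, assuming the paper reports BLEU on that scale, as is common.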

Conclusions: This study presents the development and evaluation of Ascle, a user-friendly NLP toolkit designed for medical text generation. Ascle offers an all-in-one solution that includes four advanced generative functions: question-answering, text summarization, text simplification, and machine translation. The toolkit, its models, and associated data are publicly available at https://github.com/Yale-LILY/Ascle.

Keywords: generative artificial intelligence; healthcare; machine learning; natural language processing.

Conflict of interest statement

The authors do not have conflicts of interest related to this study.

Figures

Figure 1.
The overall architecture of Ascle. One marker in the figure indicates that we provide our own fine-tuned models for a task; a second marker indicates that we conducted evaluations for that task.
Figure 2.
Evaluation of the multiple-choice question-answering task.
Figure 3.
(A) Evaluation of the text simplification task using ROUGE scores. (B) Evaluation of the text simplification task using the FKGL score.
Figure 4.
Evaluation of the machine translation task.
Figure 5.
(A) Manual evaluation (Readability, Relevancy, Accuracy, Completeness) of 50 question-answer pairs. (B) Two examples of generated answers alongside the ground truth.
Figure 6.
Demonstration of system usage. We show two cases: text simplification and machine translation.
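Figure 3(B) scores text simplification with the Flesch-Kincaid Grade Level (FKGL), where lower values indicate text readable at a lower school grade. The standard formula is FKGL = 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59. The sketch below implements that formula in Python; the regex-based sentence splitting and the vowel-group syllable counter are rough heuristics assumed for illustration, not the paper's procedure, so values will differ from dictionary-based tools such as the textstat package.

```python
import re


def count_syllables(word):
    """Rough syllable estimate (heuristic assumption): count runs of
    consecutive vowels, drop a likely silent trailing 'e', minimum 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text):
    """Flesch-Kincaid Grade Level; lower values indicate simpler text."""
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


# Usage: a simplification should lower the grade level (hypothetical sentences).
original = "Myocardial infarction necessitates immediate revascularization."
simplified = "A heart attack needs fast care. Doctors must open the blocked artery."
print(round(fkgl(original), 2), round(fkgl(simplified), 2))
```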
