JMIR Med Inform. 2024 Sep 24;12:e58977.

doi: 10.2196/58977.

Automated System to Capture Patient Symptoms From Multitype Japanese Clinical Texts: Retrospective Study

Affiliations

¹ Department of Information Science, Nara Institute of Science and Technology, Ikoma, Japan.
² Graduate School of Medicine, Kyoto University, Kyoto, Japan.
³ Center for Information and Neural Networks, Advanced ICT Research Institute, Osaka, Japan.
⁴ Department of Breast Surgery, Kansai Medical University, Hirakata, Japan.
⁵ Tokyo Metropolitan Cancer and Infectious Disease Center, Komagome Hospital, Tokyo, Japan.

# Contributed equally.

PMID: 39316418
PMCID: PMC11462096
DOI: 10.2196/58977

Tomohiro Nishiyama et al. JMIR Med Inform. 2024.

Abstract

Background: Natural language processing (NLP) techniques can be used to analyze large volumes of electronic health record texts, which encompass various types of patient information such as quality of life, effectiveness of treatments, and adverse drug event (ADE) signals. Because different aspects of a patient's status are stored in different types of documents, we propose an NLP system capable of processing 6 types of documents: physician progress notes, discharge summaries, radiology reports, radioisotope reports, nursing records, and pharmacist progress notes.

Objective: This study aimed to investigate the system's performance in detecting ADEs by evaluating the results from multitype texts. The main objective was to detect adverse events accurately using an NLP system.

Methods: We used data written in Japanese from 2289 patients with breast cancer, including medication data, physician progress notes, discharge summaries, radiology reports, radioisotope reports, nursing records, and pharmacist progress notes. Our system performs 3 processes: named entity recognition, normalization of symptoms, and aggregation of multiple types of documents from multiple patients. Among all patients with breast cancer, 103 and 112 with peripheral neuropathy (PN) received paclitaxel or docetaxel, respectively. We evaluated the utility of using multiple types of documents with correlation coefficients and regression analysis, comparing their combined performance with that of each single type of document. All evaluations of detection rates with our system were performed 30 days after drug administration.
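
The aggregation step and the 30-day evaluation described in the Methods can be illustrated with a minimal sketch. The record layout, the document type labels, the `detected_within_window` and `detection_rate` helpers, and the toy data are all assumptions made for illustration; they are not taken from the authors' system, which applies named entity recognition and normalization before this stage. The sketch also assumes that "30 days after drug administration" means a window of 30 days from the first dose.

```python
from datetime import date, timedelta

# Hypothetical normalized events: (patient_id, document_type, event_date, symptom).
# In the study these would come from NER and normalization; here they are toy data.
EVENTS = [
    ("p1", "physician_progress_note", date(2020, 1, 10), "peripheral neuropathy"),
    ("p2", "pharmacist_progress_note", date(2020, 2, 20), "peripheral neuropathy"),
    ("p3", "nursing_record", date(2020, 5, 1), "peripheral neuropathy"),
]

# Hypothetical first administration date of the target drug per patient.
FIRST_DOSE = {"p1": date(2020, 1, 1), "p2": date(2020, 2, 1), "p3": date(2020, 3, 1)}


def detected_within_window(events, first_dose, symptom, window_days=30, doc_types=None):
    """Return the patients with `symptom` recorded within `window_days` of their
    first dose, optionally restricted to a set of document types."""
    detected = set()
    for patient_id, doc_type, event_date, name in events:
        if name != symptom or patient_id not in first_dose:
            continue
        if doc_types is not None and doc_type not in doc_types:
            continue
        start = first_dose[patient_id]
        if start <= event_date <= start + timedelta(days=window_days):
            detected.add(patient_id)
    return detected


def detection_rate(detected, first_dose):
    """Fraction of treated patients in whom the symptom was detected."""
    return len(detected) / len(first_dose)


# Aggregating all document types versus using a single type.
all_docs = detected_within_window(EVENTS, FIRST_DOSE, "peripheral neuropathy")
pharmacist_only = detected_within_window(
    EVENTS, FIRST_DOSE, "peripheral neuropathy",
    doc_types={"pharmacist_progress_note"},
)
```

In this toy data the union over document types finds more patients than the pharmacist notes alone, which is the kind of comparison the study makes between multitype and single-type input.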

Results: Our system underestimated the incidence of paclitaxel-induced PN by 13.3 percentage points: 60.7%, compared with 74.0% in previous research based on manual extraction. The Pearson correlation coefficient between the manual extraction and system results was 0.87. Although the pharmacist progress notes had the highest detection rate among the individual document types, the rate did not match the performance using all documents. The estimated median duration of PN with paclitaxel was 92 days, whereas the previously reported median duration of PN with paclitaxel was 727 days. The number of detected events was highest in the physician progress notes, followed by the pharmacist progress notes and nursing records.
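
The two headline numbers in the Results, the percentage-point gap and the Pearson correlation coefficient, can be reproduced in form with a short sketch. The abstract does not say which paired values were correlated, so the two curves below are made-up toy values used only to exercise the hypothetical `pearson_r` helper; only the 74.0% and 60.7% figures come from the abstract.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Incidence figures reported in the abstract (percent).
manual_incidence = 74.0
system_incidence = 60.7
# Difference in percentage points, rounded to avoid floating-point noise.
gap_pp = round(manual_incidence - system_incidence, 1)

# Toy, made-up values (not study data) for two paired series.
manual_curve = [0.10, 0.30, 0.50, 0.70]
system_curve = [0.05, 0.25, 0.40, 0.60]
r = pearson_r(manual_curve, system_curve)
```

A high correlation alongside a nonzero gap is consistent with a system that tracks the manual results in shape while underestimating their level.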

Conclusions: Given the inherent cost of conditions that require constant monitoring of the patient, such as PN under treatment, our system has a significant advantage in that it can immediately estimate the treatment duration without fine-tuning a new NLP model. Leveraging multitype documents improves detection performance compared with using single-type documents. Although the onset time estimation was relatively accurate, the duration might have been influenced by the length of the data follow-up period. The results suggest that our method using various types of data can detect more ADEs from clinical documents.

Keywords: EHR; EHRs; ML; NLP; adverse; adverse drug reaction; adverse event; cancer; detect; detecting; detection; drug; drugs; machine learning; medication; medications; named entity recognition; natural language processing; neuropathy; note; notes; oncology; peripheral neuropathy; pharmaceutic; pharmaceutical; pharmaceuticals; pharmaceutics; pharmacology; pharmacotherapy; record; records; report; reports; symptom; symptoms; text; texts; textual.

©Tomohiro Nishiyama, Ayane Yamaguchi, Peitao Han, Lis Weiji Kanashiro Pereira, Yuka Otsuki, Gabriel Herman Bernardim Andrade, Noriko Kudo, Shuntaro Yada, Shoko Wakamiya, Eiji Aramaki, Masahiro Takada, Masakazu Toi. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 24.09.2024.

Conflict of interest statement

Conflicts of Interest: MToi has received research grants from Chugai, Takeda, Pfizer, Taiho, Japan Breast Cancer Research Group Association (JBCRG), Kyoto Breast Cancer Research Network (KBCRN), Eisai, Eli Lilly and companies, Daiichi-Sankyo, AstraZeneca, Astellas, Shimadzu, Yakult, Nippon Kayaku, AFI technology, Luxonus, Shionogi, GL Science, Sanwa Shurui; and lecture fees from Chugai, Takeda, Pfizer, Kyowa-Kirin, Taiho, Eisai, Daiichi-Sankyo, AstraZeneca, Eli Lilly and companies, MSD, Exact Science, Novartis, Shimadzu, Yakult, Nippon Kayaku, Devicore Medical Japan, Sysmex; and advisory fees from Daiichi-Sankyo, Eli Lilly and companies, BMS, Bertis, Terumo, Kansai Medical Net.

Figures

**Figure 1**
Data flow of the proposed system. (A) Events are extracted from multiple types of documents. (B) An event timeline is created from the clinical data using the natural language processing method. (C) A curve is created based on the aggregated results. The dots in the event timeline indicate when a description of drug administration or symptom onset was recorded. Based on (B), patients who received the target drug (a taxane drug in this study) are selected, and the Kaplan-Meier curve (C) is generated.

**Figure 2**
Flowchart describing the procedure for selecting patient data according to criteria.

**Figure 3**
Workflow of our natural language processing system, which is composed of named entity recognition, normalization, and aggregation. Text X and Text Y represent 2 different types of documents, for example, physician progress notes and pharmacist progress notes.

**Figure 4**
Event timeline from multiple types of data and calculation of the number of days of peripheral neuropathy onset and duration.

**Figure 5**
Kaplan-Meier curves of the results obtained by our system (Paclitaxel_NLP and Docetaxel_NLP) and the previous results obtained using a manual method (Paclitaxel_MAN). The solid line indicates the proportion of patients who developed peripheral neuropathy among those who received paclitaxel or docetaxel. Filled areas indicate 95% CIs.

**Figure 6**
Comparison between the results from each document type and all document types.

**Figure 7**
Rates of patients with peripheral neuropathy detected in each document compared with manual results.
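
The Kaplan-Meier curves referred to in Figures 1 and 5 can be sketched with a small pure-Python estimator. This is a minimal illustration under stated assumptions, not the authors' implementation: the `kaplan_meier` and `cumulative_incidence` helpers and the toy durations are hypothetical, each duration is assumed to be the number of days from first dose to PN onset (or to end of follow-up for patients without a recorded onset), and the 95% CIs shown in Figure 5 are not computed here.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function S(t).

    durations: days from first dose to PN onset, or to end of follow-up.
    observed:  True if PN onset was recorded, False if the patient was censored.
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    event_times = sorted({d for d, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Patients whose onset was recorded exactly at time t.
        events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def cumulative_incidence(curve):
    """Proportion of patients who developed the event by each time: 1 - S(t)."""
    return [(t, 1.0 - s) for t, s in curve]


# Toy data (not study data): five patients, two of them censored.
toy_durations = [5, 10, 10, 20, 30]
toy_observed = [True, True, False, True, False]
toy_curve = kaplan_meier(toy_durations, toy_observed)
toy_incidence = cumulative_incidence(toy_curve)
```

Plotting `toy_incidence` as a step function gives the same kind of curve as the solid lines described for Figure 5, where the y-axis is the proportion of patients who developed peripheral neuropathy.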

Cited by

Large Language Models for Adverse Drug Events: A Clinical Perspective.
Zitu MM, Owen D, Manne A, Wei P, Li L. J Clin Med. 2025 Aug 4;14(15):5490. doi: 10.3390/jcm14155490. PMID: 40807108. Free PMC article. Review.
Identifying Adverse Events in Outpatients With Prostate Cancer Using Pharmaceutical Care Records in Community Pharmacies: Application of Named Entity Recognition.
Yanagisawa Y, Watabe S, Yokoyama S, Sayama K, Kizaki H, Tsuchiya M, Imai S, Someya M, Taniguchi R, Yada S, Aramaki E, Hori S. JMIR Cancer. 2025 Mar 11;11:e69663. doi: 10.2196/69663. PMID: 40068144. Free PMC article.
