Developing Artificial Intelligence Models for Extracting Oncologic Outcomes from Japanese Electronic Health Records

Affiliations

¹ Patient Advocacy Center, University of Miyazaki Hospital, Miyazaki, Japan.
² Division of Respirology, Rheumatology, Infectious Diseases, and Neurology, Department of Internal Medicine, University of Miyazaki, Miyazaki, Japan.
³ Health & Value, Pfizer Japan Inc., Tokyo, Japan. kanae.togo@pfizer.com.
⁴ Health & Value, Pfizer Japan Inc., Tokyo, Japan.
⁵ Oncology Medical Affairs, Pfizer Japan Inc., Tokyo, Japan.
⁶ Manufacturing IT Innovation Sector, NTT DATA Corporation, Tokyo, Japan.
⁷ Research and Development Headquarters, NTT DATA Corporation, Tokyo, Japan.

PMID: 36547809
PMCID: PMC9988800
DOI: 10.1007/s12325-022-02397-7

Kenji Araki et al. Adv Ther. 2023 Mar.

Adv Ther. 2023 Mar;40(3):934-950.

doi: 10.1007/s12325-022-02397-7. Epub 2022 Dec 22.

Abstract

Introduction: A framework that extracts oncological outcomes from large-scale databases using artificial intelligence (AI) is not well established. Thus, we aimed to develop AI models to extract outcomes in patients with lung cancer using unstructured text data from electronic health records of multiple hospitals.

Methods: We constructed AI models (Bidirectional Encoder Representations from Transformers [BERT], Naïve Bayes, and Longformer) for tumor evaluation using the University of Miyazaki Hospital (UMH) database. This database included both structured and unstructured data from progress notes, radiology reports, and discharge summaries. The BERT model was then applied to the Life Data Initiative (LDI) data set of six hospitals. Study outcomes included the performance of the AI models and the time to progression of disease (TTP) for each line of treatment, based on the treatment response extracted by the AI models.
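As a rough illustration of how TTP could be derived from extracted treatment responses, the sketch below measures the time from the start of a treatment line to the first assessment labelled as progression, and censors the patient at the last assessment otherwise. The label `"PD"`, the function name `time_to_progression`, the input layout, and the days-to-months conversion are all assumptions made for illustration; the abstract does not specify how the study computed TTP.

```python
from datetime import date

# Hypothetical label for progressive disease; the abstract does not list
# the response labels used by the models.
PROGRESSION = "PD"


def time_to_progression(line_start, assessments, days_per_month=30.4375):
    """Return (months, event) for one line of treatment.

    assessments: list of (date, label) tuples, in any order.
    event is True when progression was observed, and False when the
    patient is censored at the last available assessment.
    """
    # Keep only assessments on or after the start of this treatment line.
    ordered = sorted(a for a in assessments if a[0] >= line_start)
    if not ordered:
        return 0.0, False
    for when, label in ordered:
        if label == PROGRESSION:
            return (when - line_start).days / days_per_month, True
    last = ordered[-1][0]
    return (last - line_start).days / days_per_month, False


# Illustrative (made-up) assessments for a single treatment line.
months, event = time_to_progression(
    date(2020, 1, 1),
    [(date(2020, 3, 1), "SD"), (date(2020, 7, 1), "PD")],
)
print(f"TTP = {months:.1f} months, progression observed = {event}")
```

The resulting (months, event) pairs are the usual input to a Kaplan-Meier estimate.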

Results: For the UMH data set, the BERT model achieved higher recall and F1 score than the Naïve Bayes and Longformer models, although its precision was lower than that of the Naïve Bayes model (precision 0.42 vs. 0.47 and 0.22; recall 0.63 vs. 0.46 and 0.33; F1 score 0.50 vs. 0.46 and 0.27). When this BERT model was applied to the LDI data, prediction accuracy remained similar. The Kaplan-Meier plots of TTP (months) showed similar trends between the data predicted by the BERT model and the manually curated data for the first (median 14.9 [95% confidence interval 11.5, 21.1] and 16.8 [12.6, 21.8]), the second (7.8 [6.7, 10.7] and 7.8 [6.7, 10.7]), and the later lines of treatment.
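For readers less familiar with these metrics, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes F1 from the precision and recall values reported above and prints it next to the reported F1. The function name `f1_score` is ours, and small differences from the reported values are expected because the inputs are rounded to two decimals and because scores averaged over several classes need not satisfy the identity exactly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as given in the Results paragraph.
reported = {
    "BERT": (0.42, 0.63, 0.50),
    "Naive Bayes": (0.47, 0.46, 0.46),
    "Longformer": (0.22, 0.33, 0.27),
}

for name, (p, r, f1) in reported.items():
    computed = f1_score(p, r)
    print(f"{name}: computed F1 = {computed:.3f}, reported F1 = {f1:.2f}")
```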

Conclusion: We developed AI models to extract treatment responses in patients with lung cancer using a large EHR database; however, the model requires further improvement.

Keywords: Artificial intelligence; BERT; Electronic health records database; Lung cancer; Real-world data; Retrospective study.

Plain language summary

The use of artificial intelligence (AI) to derive health outcomes from large electronic health records is not well established. Thus, we built three AI models for this purpose: Bidirectional Encoder Representations from Transformers (BERT), Naïve Bayes, and Longformer. Initially, we developed these models based on data from the University of Miyazaki Hospital (UMH) and later improved them using the Life Data Initiative (LDI) data set of six hospitals. The BERT model performed better than the other two, and it showed similar results when it was applied to the LDI data set. The Kaplan–Meier plots of time to progression of disease for the data predicted by the BERT model showed similar trends to those for the manually curated data. In summary, we developed an AI model to extract health outcomes using a large electronic health database in this study; however, the performance of the AI model could be improved using more training data.

Figures

**Fig. 1**
Data sources for model development. *AI* artificial intelligence

**Fig. 2**
Model development. *AI* artificial intelligence, *BERT* Bidirectional Encoder Representations from Transformers, *EHR* electronic health records, *LDI* Life Data Initiative, *UMH* University of Miyazaki Hospital

**Fig. 3**
Time to progression using treatment response estimated by the BERT model and curated manually. *TTP* time to progression, *CI* confidence interval
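Curves such as those in Fig. 3 are typically produced with the Kaplan-Meier product-limit estimator. The following is a minimal pure-Python sketch of that estimator and of the median time read from it, run on made-up durations. The function names and the toy data are assumptions for illustration, confidence intervals are not computed, and this is not presented as the analysis code used in the study.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct time with an event.

    durations: follow-up times (e.g. months); events: True if progression
    was observed at that time, False if the patient was censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve


def median_time(curve):
    """First time at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy data: five patients, the second one censored at month 2.
curve = kaplan_meier([1, 2, 3, 4, 5], [True, False, True, True, True])
print(curve, median_time(curve))
```

Running the same function on model-predicted and manually curated (duration, event) pairs would give two curves that can be compared side by side.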
