Adv Ther. 2023 Mar;40(3):934-950.
doi: 10.1007/s12325-022-02397-7. Epub 2022 Dec 22.

Developing Artificial Intelligence Models for Extracting Oncologic Outcomes from Japanese Electronic Health Records

Kenji Araki et al. Adv Ther. 2023 Mar.

Abstract

Introduction: A framework that extracts oncological outcomes from large-scale databases using artificial intelligence (AI) is not well established. Thus, we aimed to develop AI models to extract outcomes in patients with lung cancer using unstructured text data from electronic health records of multiple hospitals.

Methods: We constructed AI models (Bidirectional Encoder Representations from Transformers [BERT], Naïve Bayes, and Longformer) for tumor evaluation using the University of Miyazaki Hospital (UMH) database. This data included both structured and unstructured data from progress notes, radiology reports, and discharge summaries. The BERT model was applied to the Life Data Initiative (LDI) data set of six hospitals. Study outcomes included the performance of AI models and time to progression of disease (TTP) for each line of treatment based on the treatment response extracted by AI models.
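TTP is a time-to-event outcome, and such outcomes are commonly summarized with the Kaplan–Meier estimator. The following is a minimal sketch of that estimator and of reading a median off the resulting curve. It is not the authors' implementation; the function names and the toy durations and event flags are hypothetical and chosen only for illustration.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with >= 1 event.

    durations: follow-up time per patient (e.g. months); hypothetical input.
    events: 1 if progression was observed at that time, 0 if censored.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


def median_time(curve):
    """First time at which the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up


# Toy data (hypothetical, not from the study).
toy_durations = [3, 5, 5, 8, 12, 14]
toy_events = [1, 1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_durations, toy_events)
toy_median = median_time(toy_curve)
print(toy_curve)
print(toy_median)
```

In a comparison like the one described here, the same estimator would be run twice per line of treatment, once with progression events taken from model predictions and once with events from manual curation, and the two curves compared.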

Results: For the UMH data set, the BERT model achieved higher recall and F1 scores than the Naïve Bayes and Longformer models, although its precision was slightly lower than that of the Naïve Bayes model (precision: 0.42 vs. 0.47 and 0.22; recall: 0.63 vs. 0.46 and 0.33; F1 score: 0.50 vs. 0.46 and 0.27). When this BERT model was applied to the LDI data, prediction accuracy remained similar. The Kaplan–Meier plots of TTP (months) showed similar trends for the data predicted by the BERT model and the manually curated data for the first (median 14.9 [95% confidence interval 11.5, 21.1] and 16.8 [12.6, 21.8]), the second (7.8 [6.7, 10.7] and 7.8 [6.7, 10.7]), and the later lines of treatment.
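The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick consistency check, the sketch below recomputes F1 from the rounded precision and recall values reported for the UMH data set. The function and variable names are illustrative; only the numbers come from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as given in the abstract for the UMH data set.
reported = {
    "BERT": (0.42, 0.63, 0.50),
    "Naive Bayes": (0.47, 0.46, 0.46),
    "Longformer": (0.22, 0.33, 0.27),
}

# Recompute F1 from the rounded precision and recall values.
recomputed = {
    name: round(f1_score(p, r), 2) for name, (p, r, _) in reported.items()
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: reported F1 = {f1:.2f}, recomputed F1 = {recomputed[name]:.2f}")
```

The recomputed values match the reported F1 scores for BERT and Naïve Bayes. For Longformer the recomputed value is 0.26 against a reported 0.27, a difference that is consistent with the inputs having been rounded to two decimals.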

Conclusion: We developed AI models to extract treatment responses in patients with lung cancer using a large EHR database; however, the model requires further improvement.

Keywords: Artificial intelligence; BERT; Electronic health records database; Lung cancer; Real-world data; Retrospective study.

Plain language summary

The use of artificial intelligence (AI) to derive health outcomes from large electronic health records is not well established. Thus, we built three different AI models for this purpose: Bidirectional Encoder Representations from Transformers (BERT), Naïve Bayes, and Longformer. We first developed these models using data from the University of Miyazaki Hospital (UMH) and later improved them using the Life Data Initiative (LDI) data set of six hospitals. The BERT model performed better than the other two, and it showed similar results when applied to the LDI data set. The Kaplan–Meier plots of time to progression of disease based on the BERT model's predictions showed trends similar to those based on the manually curated data. In summary, we developed an AI model to extract health outcomes using a large electronic health database in this study; however, its performance could be improved with more training data.

Figures

Fig. 1 Data sources for model development. AI artificial intelligence
Fig. 2 Model development. AI artificial intelligence, BERT Bidirectional Encoder Representations from Transformers, EHR electronic health records, LDI Life Data Initiative, UMH University of Miyazaki Hospital
Fig. 3 Time to progression using treatment response estimated by the BERT model and curated manually. TTP time to progression, CI confidence interval
