Comput Struct Biotechnol J. 2023 Aug 22;22:32-40.
doi: 10.1016/j.csbj.2023.08.018. eCollection 2023.

A cross-institutional evaluation on breast cancer phenotyping NLP algorithms on electronic health records


Sicheng Zhou et al. Comput Struct Biotechnol J. 2023.

Abstract

Objective: Transformer-based language models prevail in the clinical domain because of their excellent performance on clinical NLP tasks, yet their generalizability is usually ignored during model development. This study evaluated the generalizability of CancerBERT, a Transformer-based clinical NLP model, and of two classic machine learning models, the conditional random field (CRF) and the bi-directional long short-term memory CRF (BiLSTM-CRF), across clinical institutes through a breast cancer phenotype extraction task.

Materials and methods: Two clinical corpora of breast cancer patients were collected from the electronic health records of the University of Minnesota (UMN) and the Mayo Clinic (MC) and annotated following the same guideline. We developed three types of NLP models (CRF, BiLSTM-CRF, and CancerBERT) to extract cancer phenotypes from clinical texts. We evaluated the generalizability of the models on different test sets under different learning strategies (model transfer vs. locally trained), and assessed the association between the entity coverage ratio (ECR) and model performance.
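The model-transfer and locally trained runs are compared with strict entity-level micro-F1, where a predicted entity counts as correct only if both its span and its label exactly match a gold entity. A minimal sketch of that metric (the entity tuples and label names below are invented for illustration, not taken from the paper's annotation schema):

```python
# Hedged sketch: strict entity-level micro-F1 for NER evaluation.
# Each document's entities are (start, end, label) tuples; a prediction
# is a true positive only on an exact span-and-label match.

def strict_micro_f1(gold, pred):
    """Micro-averaged strict F1 over a corpus of documents."""
    tp = fp = fn = 0
    for gold_doc, pred_doc in zip(gold, pred):
        gold_set, pred_set = set(gold_doc), set(pred_doc)
        tp += len(gold_set & pred_set)   # exact span + label matches
        fp += len(pred_set - gold_set)   # spurious or mislabeled predictions
        fn += len(gold_set - pred_set)   # missed gold entities
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: one document, one exact match and one label mismatch.
gold = [[(0, 4, "ReceptorStatus"), (10, 14, "TumorSize")]]
pred = [[(0, 4, "ReceptorStatus"), (10, 14, "TumorGrade")]]
print(strict_micro_f1(gold, pred))  # 0.5
```

Under this metric, a boundary error and a label error are penalized identically, which is why strict F1 is a conservative basis for the 0.925 vs. 0.932 comparison reported below.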

Results: We manually annotated 200 clinical documents at UMN and 161 at MC. The two institutes' corpora were more similar in their target entities than in their overall text. The CancerBERT models achieved the best performance on the independent test sets from the two clinical institutes and on the permutation test set. The CancerBERT model developed at one institute and further fine-tuned at the other achieved performance comparable to the model trained on local data (micro-F1: 0.925 vs. 0.932).

Conclusions: The results indicate that, among the three types of clinical NLP models, CancerBERT has the best learning ability and generalizability for our named entity recognition task. It has the advantage of recognizing complex entities, e.g., entities with different labels.

Keywords: Electronic health records; Generalizability; Information extraction; Natural language processing.


Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1
The pipeline of the study. Data were collected and annotated at UMN and MC. The UMN models were externally evaluated on MC data, then further refined on MC data and evaluated as comparisons. Permutation-dataset evaluation and entity coverage ratio analysis were conducted to explore model generalizability.
Fig. 2
The performances (strict F1 scores) of the CRF_UMN, BiLSTM-CRF_UMN, and CancerBERT_UMN_397 models on different test sets. The original test set is the UMN test set, and the portability test set is the MC test set. All models were UMN models trained solely on UMN data.
Fig. 3
The performances (strict F1 scores) of the CRF_UMN, BiLSTM-CRF_UMN, and CancerBERT_UMN_397 models for the identification of entities in different ECR groups. Group 1: 0 ≤ ECR < 0.33; Group 2: 0.33 ≤ ECR < 0.67; Group 3: 0.67 ≤ ECR < 1; Group 4: ECR = 1.
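The four ECR groups above partition test entities by how well the training data covers them. A small sketch of that binning (the exact ECR definition is in the paper; this only encodes the group boundaries from the caption):

```python
# Hedged sketch: map an entity coverage ratio (ECR) value in [0, 1]
# to the four groups used in Fig. 3. Group 4 is the exact-coverage
# case ECR = 1, so it is checked before the interval bins.

def ecr_group(ecr):
    """Return the Fig. 3 group number (1-4) for an ECR value."""
    if not 0.0 <= ecr <= 1.0:
        raise ValueError("ECR must lie in [0, 1]")
    if ecr == 1.0:
        return 4          # Group 4: ECR = 1 (fully covered by training data)
    if ecr < 0.33:
        return 1          # Group 1: 0 <= ECR < 0.33
    if ecr < 0.67:
        return 2          # Group 2: 0.33 <= ECR < 0.67
    return 3              # Group 3: 0.67 <= ECR < 1

print([ecr_group(v) for v in (0.0, 0.33, 0.7, 1.0)])  # [1, 2, 3, 4]
```

Entities in Group 1 are largely unseen during training, so comparing model F1 across groups separates memorization of covered entities from genuine generalization to novel ones.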
