Comparison of orthogonal NLP methods for clinical phenotyping and assessment of bone scan utilization among prostate cancer patients
- PMID: 31014980
- PMCID: PMC6584041
- DOI: 10.1016/j.jbi.2019.103184
Abstract
Objective: Clinical care guidelines recommend that newly diagnosed prostate cancer patients at high risk for metastatic spread receive a bone scan prior to treatment and that low-risk patients not receive one. The objective was to develop an automated pipeline that interrogates heterogeneous data to evaluate the use of bone scans using two different Natural Language Processing (NLP) approaches.
Materials and methods: Our cohort was divided into risk groups based on Electronic Health Records (EHR). Information on bone scan utilization was identified in both structured data and free text from clinical notes. Our pipeline annotated sentences with a combination of a rule-based method using the ConText algorithm (a generalization of NegEx) and a Convolutional Neural Network (CNN) method using word2vec to produce word embeddings.
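To make the rule-based idea concrete, here is a deliberately simplified sketch of trigger-based negation detection in the spirit of NegEx/ConText. It is not the authors' pipeline and not the ConText algorithm itself: the trigger list, the five-token window, the label names, and the function name classify_sentence are all assumptions made for illustration.

```python
import re

# Illustrative assumptions, not taken from the paper:
NEGATION_TRIGGERS = ["no", "not", "without", "denies", "negative for"]
TARGET = re.compile(r"\bbone scan\b", re.IGNORECASE)
WINDOW = 5  # number of tokens before the target that are inspected


def classify_sentence(sentence):
    """Label a sentence as 'no_mention', 'negated', or 'affirmed'.

    Only triggers that appear BEFORE the first target mention are
    considered; the real ConText algorithm also handles triggers after
    the target, scope terminators, and other modifier types.
    """
    match = TARGET.search(sentence)
    if match is None:
        return "no_mention"
    preceding = sentence[:match.start()].lower()
    tokens = re.findall(r"[a-z']+", preceding)
    window = " ".join(tokens[-WINDOW:])
    for trigger in NEGATION_TRIGGERS:
        if re.search(r"\b" + re.escape(trigger) + r"\b", window):
            return "negated"
    return "affirmed"


if __name__ == "__main__":
    for text in [
        "Patient underwent a bone scan last week.",
        "No bone scan was performed.",
        "PSA results were discussed.",
    ]:
        print(classify_sentence(text), "|", text)
```

A sketch like this misses post-target negation (for example, "bone scan was not performed"), which is one reason a learned sentence classifier such as a CNN can complement hand-written rules.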
Results: A total of 5500 patients and 369,764 notes were included in the study. A total of 39% of patients were high-risk, and 73% of these received a bone scan; of the 18% of patients who were low-risk, 10% received one. The CNN model outperformed the rule-based model (F-measure = 0.918 and 0.897, respectively). We demonstrate that a combination of both models could maximize precision or recall, depending on the study question.
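The abstract does not say how the two models were combined, so the following is only a sketch of one common scheme: requiring both classifiers to agree (logical AND) tends to favor precision, while accepting either positive prediction (logical OR) tends to favor recall. The toy labels and all function names are hypothetical and are not data from the study.

```python
def precision_recall_f1(gold, pred):
    """Compute precision, recall, and F-measure for boolean labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def combine_and(a, b):
    """Positive only when both models say positive (favors precision)."""
    return [x and y for x, y in zip(a, b)]


def combine_or(a, b):
    """Positive when either model says positive (favors recall)."""
    return [x or y for x, y in zip(a, b)]


# Toy, made-up predictions for eight sentences (not study data).
gold = [True, True, True, True, False, False, False, False]
rule_based = [True, True, False, False, True, False, False, False]
cnn = [True, False, True, False, False, True, False, False]

if __name__ == "__main__":
    for name, pred in [
        ("rule-based", rule_based),
        ("cnn", cnn),
        ("AND", combine_and(rule_based, cnn)),
        ("OR", combine_or(rule_based, cnn)),
    ]:
        p, r, f = precision_recall_f1(gold, pred)
        print(f"{name:10s} precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

On this toy data the AND combination reaches the highest precision and the OR combination the highest recall, which mirrors the trade-off the abstract describes; which one is preferable depends on whether false positives or false negatives are more costly for the study question.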
Conclusion: Using structured data, we accurately classified patients' cancer risk group, identified bone scan documentation with two NLP methods, and evaluated guideline adherence. Our pipeline can be used to provide concrete feedback to clinicians and guide treatment decisions.
Keywords: Electronic health records; Machine learning; Natural language processing; Prostate cancer.
Copyright © 2019 Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of interests
☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.