Development of a Natural Language Processing Engine to Generate Bladder Cancer Pathology Data for Health Services Research
- PMID: 28916254
- PMCID: PMC5696035
- DOI: 10.1016/j.urology.2017.07.056
Abstract
Objective: To take the first step toward assembling population-based cohorts of patients with bladder cancer with longitudinal pathology data, we developed and validated a natural language processing (NLP) engine that abstracts pathology data from full-text pathology reports.
Methods: Using 600 bladder pathology reports randomly selected from the Department of Veterans Affairs, we developed and validated an NLP engine to abstract data on histology, invasion (presence vs absence and depth), grade, the presence of muscularis propria, and the presence of carcinoma in situ. Our gold standard was based on an independent review of reports by 2 urologists, followed by adjudication. We assessed the NLP performance by calculating the accuracy, the positive predictive value, and the sensitivity. We subsequently applied the NLP engine to pathology reports from 10,725 patients with bladder cancer.
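The three performance measures named in the Methods can be illustrated for a single binary variable, such as the presence vs absence of carcinoma in situ. The sketch below uses the standard textbook definitions of accuracy, positive predictive value, and sensitivity; the abstract does not describe the authors' own computation, so the function name `binary_metrics` and the toy labels are hypothetical.

```python
from typing import Dict, Optional, Sequence


def binary_metrics(
    predicted: Sequence[bool], gold: Sequence[bool]
) -> Dict[str, Optional[float]]:
    """Compare NLP output with gold-standard labels for one binary variable.

    Hypothetical helper using standard definitions; not the authors' code.
    A metric is None when its denominator is zero.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)          # true positives
    fp = sum(1 for p, g in pairs if p and not g)      # false positives
    fn = sum(1 for p, g in pairs if not p and g)      # false negatives
    tn = sum(1 for p, g in pairs if not p and not g)  # true negatives

    total = tp + fp + fn + tn
    return {
        # share of reports where NLP and the gold standard agree
        "accuracy": (tp + tn) / total if total else None,
        # share of NLP-positive reports that are truly positive
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        # share of truly positive reports that NLP identified
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
    }


# Toy labels for five reports (illustrative only, not study data)
nlp_output = [True, True, False, False, True]
gold_standard = [True, False, False, True, True]
metrics = binary_metrics(nlp_output, gold_standard)
```

For a multi-category variable such as depth of invasion, the same positive predictive value calculation could be applied per category by treating one category as positive and all others as negative.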
Results: When comparing the NLP output to the gold standard, NLP achieved the highest accuracy (0.98) for the presence vs the absence of carcinoma in situ. Accuracy for histology, invasion (presence vs absence), grade, and the presence of muscularis propria ranged from 0.83 to 0.96. The most challenging variable was depth of invasion (accuracy 0.68), with an acceptable positive predictive value for lamina propria (0.82) and for muscularis propria (0.87) invasion. The validated engine was capable of abstracting pathologic characteristics for 99% of the patients with bladder cancer.
Conclusion: NLP had high accuracy for 5 of 6 variables and abstracted data for the vast majority of the patients. This now allows for the assembly of population-based cohorts with longitudinal pathology data.
Published by Elsevier Inc.
Comment in
- Editorial Comment. Urology. 2017 Dec;110:90-91. doi: 10.1016/j.urology.2017.07.057. Epub 2017 Oct 16. PMID: 29050642.