Urology. 2017 Dec;110:84-91.
doi: 10.1016/j.urology.2017.07.056. Epub 2017 Sep 12.

Development of a Natural Language Processing Engine to Generate Bladder Cancer Pathology Data for Health Services Research

Florian R Schroeck et al. Urology. 2017 Dec.

Abstract

Objective: To take the first step toward assembling population-based cohorts of patients with bladder cancer with longitudinal pathology data, we developed and validated a natural language processing (NLP) engine that abstracts pathology data from full-text pathology reports.

Methods: Using 600 bladder pathology reports randomly selected from the Department of Veterans Affairs, we developed and validated an NLP engine to abstract data on histology, invasion (presence vs absence and depth), grade, the presence of muscularis propria, and the presence of carcinoma in situ. Our gold standard was based on an independent review of reports by 2 urologists, followed by adjudication. We assessed the NLP performance by calculating the accuracy, the positive predictive value, and the sensitivity. We subsequently applied the NLP engine to pathology reports from 10,725 patients with bladder cancer.
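
The three performance measures named in the Methods can be illustrated for a single binary variable (for example, presence vs absence of carcinoma in situ). The following is a minimal sketch, not the authors' code: the function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are all hypothetical and are not data from the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    ppv: float          # positive predictive value = TP / (TP + FP)
    sensitivity: float  # TP / (TP + FN)


def binary_metrics(gold: Sequence[bool], nlp: Sequence[bool]) -> BinaryMetrics:
    """Compare NLP output with gold-standard labels for one binary variable."""
    if len(gold) != len(nlp) or len(gold) == 0:
        raise ValueError("gold and nlp must be non-empty and of equal length")

    tp = sum(1 for g, p in zip(gold, nlp) if g and p)          # true positives
    fp = sum(1 for g, p in zip(gold, nlp) if not g and p)      # NLP false positives
    fn = sum(1 for g, p in zip(gold, nlp) if g and not p)      # NLP false negatives
    tn = sum(1 for g, p in zip(gold, nlp) if not g and not p)  # true negatives

    nan = float("nan")  # returned when a denominator is zero
    return BinaryMetrics(
        accuracy=(tp + tn) / len(gold),
        ppv=tp / (tp + fp) if (tp + fp) else nan,
        sensitivity=tp / (tp + fn) if (tp + fn) else nan,
    )


# Toy labels for eight hypothetical reports (illustrative only).
gold_labels = [True, True, True, False, False, False, False, True]
nlp_labels = [True, True, False, False, False, True, False, True]
metrics = binary_metrics(gold_labels, nlp_labels)
print(metrics)
```

For a variable with more than two categories, such as depth of invasion, the same counts would be computed per category, treating that category as positive and all others as negative.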

Results: When comparing the NLP output to the gold standard, NLP achieved the highest accuracy (0.98) for the presence vs the absence of carcinoma in situ. Accuracy for histology, invasion (presence vs absence), grade, and the presence of muscularis propria ranged from 0.83 to 0.96. The most challenging variable was depth of invasion (accuracy 0.68), with an acceptable positive predictive value for lamina propria (0.82) and for muscularis propria (0.87) invasion. The validated engine was capable of abstracting pathologic characteristics for 99% of the patients with bladder cancer.

Conclusion: NLP had high accuracy for 5 of 6 variables and abstracted data for the vast majority of the patients. This now allows for the assembly of population-based cohorts with longitudinal pathology data.

Conflict of interest statement

Conflicts of Interest: none

Figures

Figure 1
Documents classified correctly, as well as false positives and false negatives, among the 150 bladder cancer pathology reports included in the validation sample. Black bars indicate correctly identified reports, grey bars are NLP false negatives, and white bars are NLP false positives. Black and grey bars together represent the count based on the gold standard annotation.

Comment in

  • Editorial Comment.
    Zeineh J, Donovan MJ. Urology. 2017 Dec;110:90-91. doi: 10.1016/j.urology.2017.07.057. Epub 2017 Oct 16. PMID: 29050642.
