JCO Clin Cancer Inform. 2021 Sep:5:1005-1014.
doi: 10.1200/CCI.21.00030.

Ascertainment of Veterans With Metastatic Prostate Cancer in Electronic Health Records: Demonstrating the Case for Natural Language Processing

Patrick R Alba et al. JCO Clin Cancer Inform. 2021 Sep.

Abstract

Purpose: Prostate cancer (PCa) is among the leading causes of cancer deaths. While localized PCa has a 5-year survival rate approaching 100%, this rate drops to 31% for metastatic prostate cancer (mPCa). Thus, timely identification of mPCa is a crucial step toward measuring and improving access to innovations that reduce PCa mortality. Yet, methods to identify patients diagnosed with mPCa remain elusive. Cancer registries provide detailed data at diagnosis but are not updated throughout treatment. This study reports on the development and validation of a natural language processing (NLP) algorithm deployed on oncology, urology, and radiology clinical notes to identify patients with a diagnosis or history of mPCa in the Department of Veterans Affairs.

Patients and methods: Using a broad set of diagnosis and histology codes, the Veterans Affairs Corporate Data Warehouse was queried to identify all Veterans with PCa. An NLP algorithm was developed to identify patients with any history or progression of mPCa. The NLP algorithm was prototyped and developed iteratively using patient notes, grouped into development, training, and validation subsets.
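As a rough illustration of what note-level ascertainment can look like, the sketch below flags a patient when any sentence in any note mentions metastasis without a preceding negation cue. It is not the study's algorithm: the abstract does not describe the rules, vocabulary, or model used, so the patterns and the names `METASTASIS`, `NEGATION`, `sentence_mentions_metastasis`, and `patient_has_mpca` are all hypothetical.

```python
import re
from typing import Iterable

# Hypothetical term lists; the abstract does not give the study's vocabulary.
METASTASIS = re.compile(r"\bmetasta(?:sis|ses|tic|sized)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:no|without|negative for)\b", re.IGNORECASE)


def sentence_mentions_metastasis(sentence: str) -> bool:
    """Return True if the sentence has a metastasis term that is not negated."""
    match = METASTASIS.search(sentence)
    if match is None:
        return False
    # Simplification: any negation cue earlier in the sentence negates the mention.
    return NEGATION.search(sentence[:match.start()]) is None


def patient_has_mpca(notes: Iterable[str]) -> bool:
    """Flag a patient if any sentence in any note affirms metastasis."""
    for note in notes:
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", note):
            if sentence_mentions_metastasis(sentence):
                return True
    return False
```

A patient-level flag of this kind matches the stated goal of finding any history or progression of mPCa: a single affirmed mention anywhere in the record is enough. A production system would need far more careful handling of negation, uncertainty, and mentions of other cancers.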

Results: A total of 1,144,610 Veterans were diagnosed with PCa between January 2000 and October 2020, of whom 76,082 (6.6%) were identified by NLP as having mPCa at some point during their care. The NLP system performed with a specificity of 0.979 and sensitivity of 0.919.
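For readers less familiar with these metrics, the sketch below shows how sensitivity and specificity are computed from confusion-matrix counts and checks the reported 6.6% proportion. The abstract does not report the validation counts, so the counts used here are hypothetical values chosen only to reproduce the reported rates; the function names are likewise illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of true mPCa cases that the classifier flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-mPCa cases that the classifier correctly leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (not from the paper) that yield the reported rates.
sens = sensitivity(true_pos=919, false_neg=81)
spec = specificity(true_neg=979, false_pos=21)

# Proportion of the PCa cohort identified as mPCa, using the reported totals.
mpca_share = 76_082 / 1_144_610

print(f"sensitivity={sens:.3f} specificity={spec:.3f} mPCa share={mpca_share:.1%}")
```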

Conclusion: Clinical documentation of mPCa is highly reliable. NLP can be leveraged to improve PCa data. When compared to other methods, NLP identified a significantly greater number of patients. NLP can be used to augment cancer registry data, facilitate research inquiries, and identify patients who may benefit from innovations in mPCa treatment.

Conflict of interest statement

Brian Robison
Stock and Other Ownership Interests: TherapeuticsMD

Evangelia Katsoulakis
Stock and Other Ownership Interests: AbbVie
Research Funding: Advantagene Local Site PI

Jeremy B. Shelton
Leadership: Dashko
Consulting or Advisory Role: Sesen Bio
Travel, Accommodations, Expenses: IntegraConnect

Scott L. Duvall
Research Funding: Astellas Pharma Inc, AstraZeneca Pharmaceuticals LP, Boehringer Ingelheim International GmbH, Celgene Corporation, Eli Lilly and Company, Genentech Inc, Gilead Sciences Inc, GlaxoSmithKline PLC, Innocrin Pharmaceuticals Inc, Janssen Pharmaceuticals Inc, Kantar Health, Myriad Genetic Laboratories Inc, Novartis International AG, Parexel International Corporation

Julie A. Lynch
Research Funding: Genomic Health, AstraZeneca, Myriad Genetics, Boehringer Ingelheim, Astellas Pharma, CardioDx, Janssen

No other potential conflicts of interest were reported.