DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction
- PMID: 38113411
- PMCID: PMC10752457
- DOI: 10.1200/CCI.23.00156
Abstract
Purpose: Manual extraction of case details from patient records for cancer surveillance is a resource-intensive task. Natural language processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) that can be integrated into cancer registry data abstraction tools in a computer-assisted abstraction setting.
Methods: We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was performed through NLP methods validated using established workflows. A container-based implementation of the NLP methods and the supporting infrastructure was developed. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools.
Results: API calls support submission of single documents and summarization of cases across one or more documents. The container-based implementation uses a REST router to handle requests and supports a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade with F1 scores of 0.79-1.00 across multiple cancer types (breast, prostate, lung, colorectal, ovary, and pediatric brain), using data from two population-based cancer registries. Usability study participants were able to use the tool effectively and expressed interest in it.
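For readers unfamiliar with the F1 metric reported above, the following is a minimal sketch of how an F1 score for a single extracted variable (for example, laterality) could be computed from gold-standard and NLP-predicted values. The scoring convention (a wrong value counts as both a false positive and a false negative) and the sample labels are illustrative assumptions; they are not the evaluation protocol or data of the study.

```python
from typing import Optional, Sequence


def f1_for_variable(gold: Sequence[Optional[str]],
                    predicted: Sequence[Optional[str]]) -> float:
    """Return the F1 score for one variable across a set of cases.

    None means "no value" (nothing annotated or nothing extracted).
    Assumed convention: a wrong predicted value is counted as both a
    false positive and a false negative.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        if p is not None and p == g:
            tp += 1
        else:
            if p is not None:
                fp += 1  # the system produced a value that is not correct
            if g is not None:
                fn += 1  # the gold value was not recovered
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Made-up labels for four cases, for illustration only.
gold_laterality = ["left", "right", None, "left"]
predicted_laterality = ["left", "left", None, None]
score = f1_for_variable(gold_laterality, predicted_laterality)
```

With these made-up labels there is one true positive, one false positive, and two false negatives, so precision is 0.5, recall is 1/3, and F1 is 0.4.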
Conclusion: The DeepPhe-CR system provides an architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improved user interactions in client tools may be needed to realize the potential of these approaches.
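The two kinds of API calls mentioned in the Results (submitting a single document and summarizing a case across documents) can be pictured with a small client-side sketch. Everything specific here is a hypothetical placeholder: the base URL, the endpoint paths, and the JSON field names are assumptions made for illustration and are not taken from the DeepPhe-CR API. The sketch only builds the HTTP requests and does not send them.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Hypothetical address of a locally running containerized service.
BASE_URL = "http://localhost:8080"


def build_document_request(patient_id: str, doc_id: str, text: str) -> Request:
    """Build a POST request that submits one clinical document.

    The path "/documents" and the field names are illustrative assumptions.
    """
    payload = {"patientId": patient_id, "docId": doc_id, "text": text}
    return Request(
        f"{BASE_URL}/documents",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_summary_request(patient_id: str) -> Request:
    """Build a GET request for a case summary across a patient's documents.

    The path "/patients/<id>/summary" is an illustrative assumption.
    """
    return Request(
        f"{BASE_URL}/patients/{quote(patient_id, safe='')}/summary",
        method="GET",
    )


doc_request = build_document_request(
    "p1", "doc-1", "Invasive ductal carcinoma, left breast, grade 2."
)
summary_request = build_summary_request("p1")
```

A registry abstraction tool acting as the client could send such requests with `urllib.request.urlopen` and show the returned values to the registrar for review, in keeping with the computer-assisted setting described above.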
Conflict of interest statement
No other potential conflicts of interest were reported.
Update of
- DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction. medRxiv [Preprint]. 2023 Oct 26:2023.05.05.23289524. doi: 10.1101/2023.05.05.23289524. Update in: JCO Clin Cancer Inform. 2023 Sep;7:e2300156. doi: 10.1200/CCI.23.00156. PMID: 37205575.