JCO Clin Cancer Inform. 2023 Sep;7:e2300156.
doi: 10.1200/CCI.23.00156.

DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction

Harry Hochheiser et al. JCO Clin Cancer Inform. 2023 Sep.

Abstract

Purpose: Manual extraction of case details from patient records for cancer surveillance is a resource-intensive task. Natural Language Processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) for integration into cancer registry data abstraction tools in a computer-assisted abstraction setting.

Methods: We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was performed through NLP methods validated using established workflows. A container-based implementation of the NLP methods and the supporting infrastructure was developed. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools.

Results: API calls support submission of single documents and summarization of cases across one or more documents. The container-based implementation uses a REST router to handle requests and support a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade at 0.79-1.00 F1 across multiple cancer types (breast, prostate, lung, colorectal, ovary, and pediatric brain) from data of two population-based cancer registries. Usability study participants were able to use the tool effectively and expressed interest in the tool.
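The two kinds of API call described above (submitting a single document, then summarizing a case across its documents) can be pictured with a small in-memory sketch. This is not the DeepPhe-CR implementation or its REST interface: the class `CaseStore`, its method names, and the most-frequent-value aggregation rule are all hypothetical, chosen only to illustrate the difference between document-level and case-level results.

```python
from collections import Counter

# The five attributes named in the abstract.
ATTRIBUTES = ("topography", "histology", "behavior", "laterality", "grade")


class CaseStore:
    """Toy in-memory stand-in for a results store (hypothetical, not DeepPhe-CR)."""

    def __init__(self):
        # patient_id -> {doc_id -> {attribute -> extracted value}}
        self._docs = {}

    def submit_document(self, patient_id, doc_id, extracted):
        """Record document-level values; unknown attribute names are rejected."""
        unknown = set(extracted) - set(ATTRIBUTES)
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")
        self._docs.setdefault(patient_id, {})[doc_id] = dict(extracted)

    def summarize_case(self, patient_id):
        """Combine values across a patient's documents.

        Assumption: the most frequent value per attribute wins. The real
        system's summarization logic is not described in the abstract.
        """
        docs = self._docs.get(patient_id, {})
        summary = {}
        for attr in ATTRIBUTES:
            values = [d[attr] for d in docs.values() if attr in d]
            if values:
                summary[attr] = Counter(values).most_common(1)[0][0]
        return summary


store = CaseStore()
store.submit_document("p1", "path-report", {"topography": "C50.9", "laterality": "1"})
store.submit_document("p1", "clinic-note", {"topography": "C50.9", "histology": "8500"})
store.submit_document("p1", "radiology", {"topography": "C50.4", "laterality": "1"})
summary = store.summarize_case("p1")
print(summary)
```

In this sketch the case summary keeps topography C50.9 because two of the three documents agree on it, while attributes that no document reports (behavior, grade) are simply absent from the summary.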

Conclusion: The DeepPhe-CR system provides an architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improved user interactions in client tools may be needed to realize the potential of these approaches.

Conflict of interest statement

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians (Open Payments).

Harry Hochheiser

Research Funding: Philips Respironics (Inst)

Eric B. Durbin

Travel, Accommodations, Expenses: Caris Life Sciences, Inc

Isaac Hands

Consulting or Advisory Role: Perthera

Travel, Accommodations, Expenses: Perthera

Ramakanth Kavuluru

Stock and Other Ownership Interests: Clover Health, Teladoc

Jeremy L. Warner

This author is the Editor-in-Chief of JCO Clinical Cancer Informatics. Journal policy recused the author from having any role in the peer review of this manuscript.

Stock and Other Ownership Interests: HemOnc.org

Consulting or Advisory Role: Westat, Flatiron Health, Melax Tech

Other Relationship: HemOnc.org

No other potential conflicts of interest were reported.

Figures

FIG 1.
The original data abstraction screen from the Cancer Patient Data Management System, augmented to support suggested data items as extracted from clinical text by DeepPhe-CR: (A) document-level topography, histology, behavior, laterality, and grade values are shown color-coded and labeled with appropriate ICD-O/NAACCR codes; (B) suggested items and demographics can be copied to the in-progress abstract with a single click; (C) the clinical text is highlighted with color codes, indicating text spans associated with the five summary values displayed in the suggestion area above (A). Thus, “Right breast cancer” is associated with topography C50.9, “breast” with histology 8500, and “right” with laterality 1. ICD, International Classification of Diseases; NAACCR, North American Association of Central Cancer Registries.
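The link between highlighted text spans and suggested codes in the caption can be sketched as a small data structure. The `Suggestion` class and `locate` helper below are hypothetical names for illustration, not part of DeepPhe-CR or the Cancer Patient Data Management System; only the phrase-to-code pairs are taken from the caption. The offset arithmetic assumes plain ASCII text, where lowercasing does not change string length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    attribute: str  # e.g. "topography"
    code: str       # ICD-O/NAACCR code shown in the suggestion area
    start: int      # character offset where the supporting text begins
    end: int        # offset one past the last supporting character


def locate(note, phrase, attribute, code):
    """Build a Suggestion for the first case-insensitive match of phrase."""
    start = note.lower().find(phrase.lower())
    if start < 0:
        raise ValueError(f"{phrase!r} not found in note")
    return Suggestion(attribute, code, start, start + len(phrase))


note = "Right breast cancer"
suggestions = [
    locate(note, "Right breast cancer", "topography", "C50.9"),
    locate(note, "breast", "histology", "8500"),
    locate(note, "right", "laterality", "1"),
]
for s in suggestions:
    # Print each attribute, its code, and the text span that supports it.
    print(s.attribute, s.code, repr(note[s.start:s.end]))
```

Keeping offsets rather than copied strings lets a client highlight overlapping spans in the original note, as in panel (C), where "right" and "breast" both fall inside the longer topography span.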
