Review
J Biomed Inform. 2019 Dec:100:103301.
doi: 10.1016/j.jbi.2019.103301. Epub 2019 Oct 4.

A frame semantic overview of NLP-based information extraction for cancer-related EHR notes

Surabhi Datta et al. J Biomed Inform. 2019 Dec.

Abstract

Objective: Electronic Health Record (EHR) notes contain a great deal of information about cancer that can be useful for biomedical research, provided natural language processing (NLP) methods are available to extract and structure it. In this paper, we present a scoping review of the existing clinical NLP literature for cancer.

Methods: We identified studies describing an NLP method to extract specific cancer-related information from EHR sources by searching PubMed, Google Scholar, ACL Anthology, and existing reviews. We applied two exclusion criteria: we excluded articles where the extraction techniques were too broad to be represented as frames (e.g., document classification) and articles where only very low-level extraction methods were used (e.g., simply identifying clinical concepts). In total, 78 articles were included in the final review. We organized the information from these articles according to frame semantic principles to help identify common areas of overlap and potential gaps.

Results: Frames were created from the reviewed articles for cancer information such as cancer diagnosis, tumor description, cancer procedure, breast cancer diagnosis, prostate cancer diagnosis, and pain in prostate cancer patients. Each frame included both a definition and specific frame elements (i.e., extractable attributes). We found that cancer diagnosis was the most common frame among the reviewed papers (36 out of 78), with recent work focusing on extracting information related to treatment and breast cancer diagnosis.
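
To make the notion of a frame concrete, the following is a minimal sketch, not the authors' implementation, of how a frame with a definition and a set of frame elements might be represented in Python. The `Frame` class, the `instantiate` helper, the definition text, and the element names (`cancer_type`, `anatomical_site`, `stage`, `date`) are all hypothetical and chosen only for illustration; the abstract does not list the actual elements of any frame.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Frame:
    """A frame: a name, a definition, and its extractable attributes."""
    name: str
    definition: str
    elements: List[str] = field(default_factory=list)

    def instantiate(self, **values: str) -> Dict[str, Optional[str]]:
        """Fill the frame's elements; elements with no value stay None."""
        unknown = set(values) - set(self.elements)
        if unknown:
            raise KeyError(f"not elements of {self.name}: {sorted(unknown)}")
        return {element: values.get(element) for element in self.elements}


# Hypothetical frame: the definition and element names are assumptions,
# not taken from the reviewed paper.
cancer_diagnosis = Frame(
    name="Cancer Diagnosis",
    definition="A patient is diagnosed with a cancer.",
    elements=["cancer_type", "anatomical_site", "stage", "date"],
)

# An extraction system would fill these slots from an EHR note.
instance = cancer_diagnosis.instantiate(cancer_type="adenocarcinoma", stage="II")
```

Under this sketch, an NLP system extracting a cancer diagnosis would emit one filled instance per mention, and elements not stated in the note would remain empty rather than being guessed.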

Conclusion: The list of common frames described in this paper identifies important cancer-related information extracted by existing NLP techniques and serves as a useful resource for future researchers requiring cancer information extracted from EHR notes. We also argue, due to the heavy duplication of cancer NLP systems, that a general purpose resource of annotated cancer frames and corresponding NLP tools would be valuable.

Keywords: Cancer; Deep phenotyping; Electronic health records; Frame semantics; Natural language processing; Scoping review.

Conflict of interest statement

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1. PRISMA diagram for study selection.
Fig. 2. Frames and their relations as expressed in literature. Frames with similar purpose (e.g., diagnosis, imaging, assessment) are assigned the same colors.
