JCO Clin Cancer Inform. 2022 Jul:6:e2200006.
doi: 10.1200/CCI.22.00006.

Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing

Liwei Wang et al. JCO Clin Cancer Inform. 2022 Jul.

Abstract

Purpose: The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and to facilitate patient care. In this review, we aim to assess EHR for cancer research and patient care by using the Minimal Common Oncology Data Elements (mCODE), which is a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and review existing NLP methodologies for extracting said data elements.

Methods: We searched the main literature databases for cancer-related NLP articles written in English and published between January 2010 and September 2020. Articles that used EHRs as the data source were then manually identified. A charting form was developed to analyze the relevant studies and to categorize data under four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards.

Results: A total of 123 publications were included in our analysis. As expected, we found that cancer research and patient care require some data elements beyond mCODE. Transparency and reproducibility of NLP methods are insufficient, and NLP evaluation is inconsistent.

Conclusion: We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers to the wide adoption of cancer NLP were identified and discussed.

Conflict of interest statement

Irbaz B. Riaz: This author is a member of the JCO Clinical Cancer Informatics Editorial Board. Journal policy recused the author from having any role in the peer review of this manuscript.

Hua Xu: Employment: Melax Technologies Inc. Stock and Other Ownership Interests: Melax Technologies Inc. Consulting or Advisory Role: More Health Inc, Hebta LLC, Melax Technologies Inc. Patents, Royalties, Other Intellectual Property: Receive royalties from software license from UTHealth.

Jeremy L. Warner: This author is an Associate Editor for JCO Clinical Cancer Informatics. Journal policy recused the author from having any role in the peer review of this manuscript. Stock and Other Ownership Interests: HemOnc.org. Consulting or Advisory Role: Westat, Roche, Flatiron Health, Melax Tech.

No other potential conflicts of interest were reported.

Figures

FIG 1.
Synthetic analysis for mCODE and NLP methodology. (A) Distribution of data elements covered by mCODE. (B) Clustering visualization of the mCODE profiles and standardized terminologies. (C) Synthetic analysis for NLP methods, study aim, and evaluation level. AJCC, American Joint Committee on Cancer; BI-RADS, Breast Imaging Reporting and Data System; CDS, clinical decision support; ECOG, Eastern Cooperative Oncology Group; HPO, Human Phenotype Ontology; ICD-9, International Classification of Diseases (9th revision); IE, information extraction; mCODE, Minimal Common Oncology Data Elements; MPATH-Dx, Melanocytic Pathology Assessment Tool and Hierarchy for Diagnosis; NCI, National Cancer Institute; NLP, natural language processing; RadLex, Radiology Lexicon; RxNorm, no full name; UMLS, Unified Medical Language System.
FIG 2.
Overview of article selection process. EHR, electronic health record.
FIG 3.
Analysis of metadata, EHR data scope, and evaluation metrics of included articles. (A) Distribution of articles over years. (B) Distribution of articles according to the organization categories. (C) Countries of major authors (No. of articles). (D) Distribution of publication venues of articles. (E) Distribution of cancer type. (F) Distribution of metrics. (G) Distribution of document type. (H) Histogram of document type number. (I) Histogram of metric number. (J) Histogram of data time frame. AUC, area under the curve; EHR, electronic health record; F1, no full name; FN, false negative; FP, false positive; IE, information extraction; NA, not applicable; NPV, negative predictive value; PPV, positive predictive value; TN, true negative; TP, true positive; WGS, Whole Genome Sequencing.
FIG A1.
A heatmap of document types and targeted cancer types. Those with fewer than two publications are not shown.
FIG A2.
Comparison of standardized terminologies for data elements between the reviewed articles and mCODE. AJCC, American Joint Committee on Cancer; BI-RADS, Breast Imaging Reporting and Data System; HGOGNC, Human Genome Organization Gene Nomenclature Committee; HGVS, Human Genome Variation Society; HPO, Human Phenotype Ontology; ICD-9, International Classification of Diseases (9th revision); ICD-10, International Classification of Diseases (10th revision); ICD-O, International Classification of Diseases for Oncology; LOINC, Logical Observation Identifiers Names and Codes; MPATH-Dx, Melanocytic Pathology Assessment Tool and Hierarchy for Diagnosis; NCIT, National Cancer Institute Thesaurus; RadLex, Radiology Lexicon; RxNorm, no full name; SNOMED-CT, SNOMED Clinical Terms; UMLS, Unified Medical Language System.
FIG A3.
Analysis of NLP methods. NLP, natural language processing.
