JCO Clin Cancer Inform. 2022 Jul:6:e2200006.

doi: 10.1200/CCI.22.00006.

Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing

Liwei Wang¹, Sunyang Fu¹, Andrew Wen¹, Xiaoyang Ruan¹, Huan He¹, Sijia Liu¹, Sungrim Moon¹, Michelle Mai¹, Irbaz B Riaz², Nan Wang³, Ping Yang⁴, Hua Xu⁵, Jeremy L Warner⁶,⁷, Hongfang Liu¹

Affiliations

¹ Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN.
² Department of Hematology/Oncology, Mayo Clinic, Scottsdale, AZ.
³ Department of Computer Science and Engineering, College of Science and Engineering, University of Minnesota, Minneapolis, MN.
⁴ Department of Quantitative Health Sciences, Mayo Clinic, Scottsdale, AZ.
⁵ School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX.
⁶ Departments of Medicine (Hematology/Oncology), Vanderbilt University, Nashville, TN.
⁷ Department of Biomedical Informatics, Vanderbilt University, Nashville, TN.

PMID: 35917480
PMCID: PMC9470142
DOI: 10.1200/CCI.22.00006

Liwei Wang et al. JCO Clin Cancer Inform. 2022 Jul.

Abstract

Purpose: The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and to facilitate patient care. In this review, we aim to assess EHR for cancer research and patient care by using the Minimal Common Oncology Data Elements (mCODE), which is a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and review existing NLP methodologies for extracting said data elements.

Methods: The main literature databases were searched for cancer-related NLP articles written in English and published between January 2010 and September 2020. Articles with EHRs as the data source were then manually identified. A charting form was developed to analyze the relevant studies and to categorize data under four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards.

Results: A total of 123 publications were included in our analysis. As expected, we found that cancer research and patient care require some data elements beyond mCODE. Transparency and reproducibility of NLP methods are insufficient, and NLP evaluation is inconsistent.

Conclusion: We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers to the wide adoption of cancer NLP were identified and discussed.

Conflict of interest statement

Irbaz B. Riaz
This author is a member of the JCO Clinical Cancer Informatics Editorial Board. Journal policy recused the author from having any role in the peer review of this manuscript.

Hua Xu
Employment: Melax Technologies Inc
Stock and Other Ownership Interests: Melax Technologies Inc
Consulting or Advisory Role: More Health Inc, Hebta LLC, Melax Technologies Inc
Patents, Royalties, Other Intellectual Property: Receive royalties from software license from UTHealth

Jeremy L. Warner
This author is an Associate Editor for JCO Clinical Cancer Informatics. Journal policy recused the author from having any role in the peer review of this manuscript.
Stock and Other Ownership Interests: HemOnc.org
Consulting or Advisory Role: Westat, Roche, Flatiron Health, Melax Tech

No other potential conflicts of interest were reported.

Figures

**FIG 1.**
Synthetic analysis for mCODE and NLP methodology. (A) Distribution of data elements covered by mCODE. (B) Clustering visualization of the mCODE profiles and standardized terminologies. (C) Synthetic analysis for NLP methods, study aim, and evaluation level. AJCC, American Joint Committee on Cancer; BI-RADS, Breast Imaging Reporting and Data System; CDS, clinical decision support; ECOG, Eastern Cooperative Oncology Group; HPO, Human Phenotype Ontology; ICD-9, International Classification of Diseases (9th revision); IE, information extraction; mCODE, Minimal Common Oncology Data Elements; MPATH-Dx, Melanocytic Pathology Assessment Tool and Hierarchy for Diagnosis; NCI, National Cancer Institute; NLP, natural language processing; RadLex, Radiology Lexicon; RxNorm, no full name; UMLS, Unified Medical Language System.

**FIG 2.**
Overview of article selection process. EHR, electronic health record.

**FIG 3.**
Analysis of metadata, EHR data scope, and evaluation metrics of included articles. (A) Distribution of articles over years. (B) Distribution of articles according to the organization categories. (C) Countries of major authors (No. of articles). (D) Distribution of publication venues of articles. (E) Distribution of cancer type. (F) Distribution of metrics. (G) Distribution of document type. (H) Histogram of document type number. (I) Histogram of metric number. (J) Histogram of data time frame. AUC, area under the curve; EHR, electronic health record; F1, no full name; FN, false negative; FP, false positive; IE, information extraction; NA, not applicable; NPV, negative predictive value; PPV, positive predictive value; TN, true negative; TP, true positive; WGS, Whole Genome Sequencing.

**FIG A1.**
A heatmap of document types and targeted cancer types. Those with < 2 publications not shown.

**FIG A2.**
Comparison of standardized terminologies for data elements between the reviewed articles and mCODE. AJCC, American Joint Committee on Cancer; BI-RADS, Breast Imaging Reporting and Data System; HGOGNC, Human Genome Organization Gene Nomenclature Committee; HGVS, Human Genome Variation Society; HPO, Human Phenotype Ontology; ICD-9, International Classification of Diseases (9th revision); ICD-10, International Classification of Diseases (10th revision); ICD-O, International Classification of Diseases for Oncology; LOINC, Logical Observation Identifiers Names and Codes; MPATH-Dx, Melanocytic Pathology Assessment Tool and Hierarchy for Diagnosis; NCIT, National Cancer Institute Thesaurus; RadLex, Radiology Lexicon; RxNorm, no full name; SNOMED-CT, SNOMED Clinical Terms; UMLS, Unified Medical Language System.

**FIG A3.**
Analysis of NLP methods. NLP, natural language processing.

Cited by

DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction.
Hochheiser H, Finan S, Yuan Z, Durbin EB, Jeong JC, Hands I, Rust D, Kavuluru R, Wu XC, Warner JL, Savova G. medRxiv [Preprint]. 2023 Oct 26:2023.05.05.23289524. doi: 10.1101/2023.05.05.23289524. Update in: JCO Clin Cancer Inform. 2023 Sep;7:e2300156. doi: 10.1200/CCI.23.00156. PMID: 37205575
Artificial Intelligence in Cancer Research: Trends, Challenges and Future Directions.
Sebastian AM, Peter D. Life (Basel). 2022 Nov 28;12(12):1991. doi: 10.3390/life12121991. PMID: 36556356. Review.
Machine learning and deep learning tools for the automated capture of cancer surveillance data.
Hsu E, Hanson H, Coyle L, Stevens J, Tourassi G, Penberthy L. J Natl Cancer Inst Monogr. 2024 Aug 1;2024(65):145-151. doi: 10.1093/jncimonographs/lgae018. PMID: 39102883
A comparative study of zero-shot inference with large language models and supervised modeling in breast cancer pathology classification.
Sushil M, Zack T, Mandair D, Zheng Z, Wali A, Yu YN, Quan Y, Butte AJ. Res Sq [Preprint]. 2024 Feb 6:rs.3.rs-3914899. doi: 10.21203/rs.3.rs-3914899/v1. Update in: J Am Med Inform Assoc. 2024 Oct 1;31(10):2315-2327. doi: 10.1093/jamia/ocae146. PMID: 38405831
Using natural language processing to analyze unstructured patient-reported outcomes data derived from electronic health records for cancer populations: a systematic review.
Sim JA, Huang X, Horan MR, Baker JN, Huang IC. Expert Rev Pharmacoecon Outcomes Res. 2024 Apr;24(4):467-475. doi: 10.1080/14737167.2024.2322664. Epub 2024 Mar 5. PMID: 38383308
