JAMA Oncol. 2019 Oct 1;5(10):1421-1429.
doi: 10.1001/jamaoncol.2019.1800.

Assessment of Deep Natural Language Processing in Ascertaining Oncologic Outcomes From Radiology Reports

Kenneth L Kehl et al. JAMA Oncol.

Abstract

Importance: A rapid learning health care system for oncology will require scalable methods for extracting clinical end points from electronic health records (EHRs). Outside of clinical trials, end points such as cancer progression and response are not routinely encoded into structured data.

Objective: To determine whether deep natural language processing can extract relevant cancer outcomes from radiologic reports, a ubiquitous but unstructured EHR data source.

Design, setting, and participants: A retrospective cohort study evaluated 1112 patients who underwent tumor genotyping for a diagnosis of lung cancer and participated in the Dana-Farber Cancer Institute PROFILE study from June 26, 2013, to July 2, 2018.

Exposures: Patients were divided into curation and reserve sets. Human abstractors applied a structured framework to radiologic reports for the curation set to ascertain the presence of cancer and changes in cancer status over time (ie, worsening/progressing vs improving/responding). Deep learning models were then trained to capture these outcomes from report text and subsequently evaluated in a 10% held-out test subset of curation patients. Cox proportional hazards regression models compared human and machine curations of disease-free survival, progression-free survival, and time to improvement/response in the curation set, and measured associations between report classification and overall survival in the curation and reserve sets.

Main outcomes and measures: The primary outcome was area under the receiver operating characteristic curve (AUC) for deep learning models; secondary outcomes were time to improvement/response, disease-free survival, progression-free survival, and overall survival.
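The primary outcome, AUC, can be read as the probability that a randomly chosen positive report receives a higher model score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = human abstractor marked the finding as present.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of reports, which is fine for illustration; a rank-based implementation would be preferable for a test set of realistic size.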

Results: A total of 2406 patients were included (mean [SD] age, 66.5 [10.8] years; 1428 female [59.7%]; 2170 [90.2%] white). Radiologic reports (n = 14 230) were manually reviewed for 1112 patients in the curation set. In the test subset (n = 109), deep learning models identified the presence of cancer, improvement/response, and worsening/progression with accurate discrimination (AUC >0.90). Machine and human curation yielded similar measurements of disease-free survival (hazard ratio [HR] for machine vs human curation, 1.18; 95% CI, 0.71-1.95); progression-free survival (HR, 1.11; 95% CI, 0.71-1.71); and time to improvement/response (HR, 1.03; 95% CI, 0.65-1.64). Among 15 000 additional reports for 1294 reserve set patients, algorithm-detected cancer worsening/progression was associated with decreased overall survival (HR for mortality, 4.04; 95% CI, 2.78-5.85), and improvement/response was associated with increased overall survival (HR, 0.41; 95% CI, 0.22-0.77).

Conclusions and relevance: Deep natural language processing appears to speed curation of relevant cancer outcomes and facilitate rapid learning from EHR data.

Conflict of interest statement

Conflict of Interest Disclosures: Dr Kehl reports serving as a consultant to Aetion Inc. Dr Nishino reports serving as a consultant to Toshiba Medical Systems, WorldCare Clinical, and Daiichi Sankyo; holding research grants from the Merck Investigator Studies Program, Toshiba Medical Systems, and AstraZeneca; and receiving honoraria from Bayer and Roche. Dr Van Allen reports serving as a consultant/advisor to Tango Therapeutics, Invitae, Genome Medical, Dynamo, Foresite Capital, and Illumina; holding research support from Novartis and Bristol-Myers Squibb; and holding equity in Syapse, Genome Medical, Tango, and Microsoft Corp. Dr Johnson reports receiving postmarketing royalties for epidermal growth factor receptor testing, paid by Dana-Farber Cancer Institute, and receiving research support from Novartis and Toshiba. Dr Schrag reports serving as a consultant/advisor to Cureus and Pfizer, receiving research funding from the American Association of Cancer Research; serving as an editor and receiving personal fees from JAMA outside the related work; holding equity in mixed biotech funds including Invesco QQQ and holding a trademark and having financial interest in the PRISSMM phenomic data standard. The other authors have no conflicts of interest to disclose.

Figures

Figure 1. Cohort Derivation
Derivation of curation set, curation subsets, and reserve set.
Figure 2. Comparison of Human and Machine Curation of Imaging Reports
Shaded bands represent 95% CIs. The cumulative rate of response was calculated subject to competing risks of death. Disease-free survival was measured among patients with early-stage disease at diagnosis. The end point was defined as the time from diagnosis until death or the first imaging report curated as representing disease worsening/progression, whichever came first. Progression-free survival was measured among patients who received systemic therapy documented as palliative in intent. The end point was defined as the time from initiation of first palliative-intent systemic therapy at the Dana-Farber Cancer Institute until death or the first imaging report indicating worsening/progressive disease, whichever came first. Response was measured among patients who received systemic therapy documented as palliative in intent. The end point was defined as the time from initiation of first palliative-intent systemic therapy at the Dana-Farber Cancer Institute until the first imaging report indicating improving/responding disease, subject to competing risks of death.
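The "whichever came first" end points described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function name `time_to_event`, the example dates, and the handling of censoring at a fixed cutoff date are all hypothetical, since the caption does not describe censoring rules.

```python
from datetime import date


def time_to_event(index_date, death_date, event_report_dates, censor_date):
    """Return (days_from_index, event_observed) for a composite end point.

    The event is death or the first imaging report flagged with the finding
    (for example worsening/progression), whichever comes first. Patients
    with no event on or before censor_date are censored there (assumption).
    """
    candidates = [d for d in event_report_dates if d >= index_date]
    if death_date is not None:
        candidates.append(death_date)
    candidates = [d for d in candidates if d <= censor_date]
    if candidates:
        return (min(candidates) - index_date).days, True
    return (censor_date - index_date).days, False


# Hypothetical patient: therapy starts 2015-01-01, first progression report
# on 2015-04-01, death on 2016-01-01.
pfs = time_to_event(
    index_date=date(2015, 1, 1),
    death_date=date(2016, 1, 1),
    event_report_dates=[date(2015, 4, 1), date(2015, 9, 1)],
    censor_date=date(2018, 7, 2),
)

# Hypothetical patient with no progression report and no recorded death.
censored = time_to_event(date(2015, 1, 1), None, [], date(2015, 1, 31))
```

This covers disease-free and progression-free survival as defined above. The response end point differs in that death is a competing risk rather than part of the event, so it would need a cumulative incidence estimator rather than this composite rule.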
Figure 3. Association Between Machine-Curated Imaging Report Outcomes and Overall Survival
All results were derived from Cox proportional hazards regression models in which the imaging finding was included as a time-varying covariate. Hazard ratios therefore represent the relative hazard of mortality at any given time if a particular imaging finding was present on the most recent prior scan vs if it was not. In models analyzing the outcome of any cancer, the presence of cancer on an imaging report was the sole independent variable. In models analyzing the outcome of response and progression, each was included as a binary variable.
(a) Cohort included 1294 reserve set patients (with imaging reports on 11 946 dates predating survival cutoff) who did not undergo human curation but for whom overall survival data were available. Survival time was indexed at the date of the first imaging report available.
(b) Cohort included 1112 curation set patients (with imaging reports on 11 091 dates predating survival cutoff) who underwent human and machine curation. Survival time was indexed at the date of the first imaging report available.
(c) Cohort included 395 reserve set patients (with imaging reports on 4092 dates preceding survival cutoff) who did not undergo human curation but for whom overall survival data were available and initiated a treatment plan recorded as palliative in intent. Survival time was indexed at the date of first palliative-intent systemic therapy.
(d) Cohort included 501 curation set patients (with imaging reports on 4941 dates predating survival cutoff) who underwent human and machine curation and initiated a treatment plan recorded as palliative in intent. Survival time was indexed at the date of first palliative-intent systemic therapy.
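A time-varying covariate of this kind is usually supplied to a Cox model in start-stop (counting process) format, where each interval carries the finding from the most recent prior scan. The sketch below shows that data layout for one patient; the function name `to_start_stop` and the example values are hypothetical, and it assumes report days are strictly increasing and begin at the index date (day 0).

```python
def to_start_stop(report_days, findings, end_day, died):
    """Split one patient's follow-up into (start, stop, finding, event) rows.

    report_days: strictly increasing days since index for each imaging report
    findings:    0/1 finding on each report (for example progression)
    end_day:     day of death or last follow-up
    died:        True if follow-up ended in death

    Each row holds the finding from the most recent prior report; only the
    final row can carry the death event.
    """
    rows = []
    for i, (start, finding) in enumerate(zip(report_days, findings)):
        if start >= end_day:
            break  # reports on or after the end of follow-up add no time at risk
        next_day = report_days[i + 1] if i + 1 < len(report_days) else end_day
        stop = min(next_day, end_day)
        event = 1 if (died and stop == end_day) else 0
        rows.append((start, stop, finding, event))
    return rows


# Hypothetical patient: scans on days 0, 60, and 150; progression first seen
# on day 60; death on day 200.
rows = to_start_stop([0, 60, 150], [0, 1, 1], end_day=200, died=True)
for row in rows:
    print(row)
```

Rows in this shape can be stacked across patients and passed to any Cox implementation that accepts start-stop data; which software the authors used is not stated in this abstract.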

Comment in

  • doi: 10.1001/jamaoncol.2019.1799
