Nat Commun. 2021 Dec 15;12(1):7304.
doi: 10.1038/s41467-021-27358-6.

Artificial intelligence-aided clinical annotation of a large multi-cancer genomic dataset

Kenneth L Kehl et al. Nat Commun.

Abstract

To accelerate cancer research that correlates biomarkers with clinical endpoints, methods are needed to ascertain outcomes from electronic health records at scale. Here, we train deep natural language processing (NLP) models to extract outcomes for participants with any of 7 solid tumors in a precision oncology study. Outcomes are extracted from 305,151 imaging reports for 13,130 patients and 233,517 oncologist notes for 13,511 patients, including patients with 6 additional cancer types. NLP models recapitulate outcome annotation from these documents, including the presence of cancer, progression/worsening, response/improvement, and metastases, with excellent discrimination (AUROC > 0.90). Models generalize to cancers excluded from training and yield outcomes correlated with survival. Among patients receiving checkpoint inhibitors, we confirm that high tumor mutation burden is associated with superior progression-free survival ascertained using NLP. Here, we show that deep NLP can accelerate annotation of molecular cancer datasets with clinically meaningful endpoints to facilitate discovery.
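As an illustration of the discrimination metric quoted above, the following is a minimal sketch of how an AUROC can be computed from binary labels and model scores. It uses the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive document receives the higher score, with ties counted as half. The function name `auroc` and the toy inputs are assumptions made for this example; they are not taken from the study's code or data.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUROC: P(score of a positive > score of a negative).

    Hypothetical helper for illustration only; labels are 1 (outcome
    present) or 0 (outcome absent), scores are model outputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores, not study data).
example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The quadratic loop is chosen for readability; a rank-based implementation would scale better to hundreds of thousands of documents.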

Conflict of interest statement

Dr. Kehl reports serving as a consultant/advisor to Aetion, receiving funding from the American Association for Cancer Research related to this work, and receiving honoraria from Roche and IBM. Dr. Schrag reports compensation from JAMA for serving as an Associate Editor and from Pfizer for giving a talk at a symposium. She has received research funding from the American Association for Cancer Research related to this work and research funding from GRAIL for serving as the site-PI of a clinical trial. Unrelated to this work, Dr. Choueiri reports serving on research/advisory boards and receiving honoraria from AstraZeneca, Aravive, Aveo, Bayer, Bristol Myers-Squibb, Eisai, EMD Serono, Exelixis, GlaxoSmithKline, IQVA, Ipsen, Lilly, Merck, Novartis, Pfizer, Roche, Sanofi/Aventis, Takeda, Tempest, Up-To-Date, CME events (Peerview, OncLive and others). Dr. Van Allen reports serving in advisory/consulting roles to Tango Therapeutics, Genome Medical, Invitae, Enara Bio, Janssen, Manifold Bio, and Monte Rosa; receiving research support from Novartis and BMS; holding equity in Tango Therapeutics, Genome Medical, Syapse, Enara Bio, Manifold Bio, Microsoft, and Monte Rosa; and receiving travel reimbursement from Roche/Genentech. The remaining authors declare no competing interests.

Figures

Fig. 1. Example of a clinico-genomic analysis based on outcomes ascertained using natural language processing models: Association between TMB and progression-free survival after initiation of immunotherapy.
High tumor mutational burden defined as >=20 mutations per megabase. Results in this figure represent unadjusted Kaplan-Meier curves. Events were recorded using the “PFS-I-and-M” endpoint, defined as the earlier of death, or the time by which both a medical oncologist note and an imaging report had described cancer progression/worsening. Progression/worsening was defined using natural language processing models applied to imaging reports and medical oncologist notes. Survival curves were not adjusted for left truncation, since progression events were possible prior to genomic testing and cohort eligibility.
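The endpoint definition in this legend can be sketched in a few lines. The following is a minimal illustration, assuming each patient has at most one date per event type, already restricted to the period after immunotherapy initiation: the first imaging report flagged as progression/worsening, the first medical oncologist note flagged as progression/worsening, and the date of death. The function and argument names are hypothetical and are not drawn from the authors' code; censoring and left truncation are not handled here.

```python
from datetime import date
from typing import Optional

# Threshold from the figure legend: high TMB is >= 20 mutations per megabase.
HIGH_TMB_THRESHOLD = 20.0


def is_high_tmb(mutations_per_mb: float) -> bool:
    """Classify tumor mutational burden using the legend's cutoff."""
    return mutations_per_mb >= HIGH_TMB_THRESHOLD


def pfs_i_and_m_event_date(
    death: Optional[date],
    first_imaging_progression: Optional[date],
    first_note_progression: Optional[date],
) -> Optional[date]:
    """Return the "PFS-I-and-M" event date, or None if no event occurred.

    Hypothetical helper: the event is the earlier of death and the date
    by which BOTH an imaging report and an oncologist note had described
    progression/worsening (i.e. the later of the two first mentions).
    """
    both_sources = None
    if first_imaging_progression is not None and first_note_progression is not None:
        # "Both had described progression" is reached at the later date.
        both_sources = max(first_imaging_progression, first_note_progression)

    candidates = [d for d in (death, both_sources) if d is not None]
    return min(candidates) if candidates else None


# Imaging flags progression on 1 March, the note on 1 April, no death:
# the event date is 1 April, when both sources agree.
event = pfs_i_and_m_event_date(None, date(2020, 3, 1), date(2020, 4, 1))
```

A patient for whom the function returns `None` would be treated as censored at last follow-up in a Kaplan-Meier analysis; that step is outside the scope of this sketch.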
