Cell Rep. 2018 Apr 3;23(1):172-180.e3.
doi: 10.1016/j.celrep.2018.03.046.

Machine Learning Detects Pan-cancer Ras Pathway Activation in The Cancer Genome Atlas



Gregory P Way et al. Cell Rep.

Abstract

Precision oncology uses genomic evidence to match patients with treatment but often fails to identify all patients who may respond. The transcriptome of these "hidden responders" may reveal responsive molecular states. We describe and evaluate a machine-learning approach to classify aberrant pathway activity in tumors, which may aid in hidden responder identification. The algorithm integrates RNA-seq, copy number, and mutations from 33 different cancer types across The Cancer Genome Atlas (TCGA) PanCanAtlas project to predict aberrant molecular states in tumors. Applied to the Ras pathway, the method detects Ras activation across cancer types and identifies phenocopying variants. The model, trained on human tumors, can predict response to MEK inhibitors in wild-type Ras cell lines. We also present data that suggest that multiple hits in the Ras pathway confer increased Ras activity. The transcriptome is underused in precision oncology and, combined with machine learning, can aid in the identification of hidden responders.

Keywords: Gene expression; HRAS; KRAS; NF1; NRAS; Ras; TCGA; drug sensitivity; machine learning; pan-cancer.


Conflict of interest statement

DECLARATION OF INTERESTS

Michael Seiler, Peter G. Smith, Ping Zhu, Silvia Buonamici, and Lihua Yu are employees of H3 Biomedicine, Inc. Parts of this work are the subject of a patent application: WO2017040526 titled “Splice variants associated with neo-morphic sf3b1 mutants.” Shouyoung Peng, Anant A. Agrawal, James Palacino, and Teng Teng are employees of H3 Biomedicine, Inc. Andrew D. Cherniack, Ashton C. Berger, and Galen F. Gao receive research support from Bayer Pharmaceuticals. Gordon B. Mills serves on the External Scientific Review Board of AstraZeneca. Anil Sood is on the Scientific Advisory Board for Kiyatec and is a shareholder in BioPath. Jonathan S. Serody receives funding from Merck, Inc. Kyle R. Covington is an employee of Castle Biosciences, Inc. Preethi H. Gunaratne is the founder, CSO, and a shareholder of NextmiRNA Therapeutics. Christina Yau is a part-time employee/consultant at NantOmics. Franz X. Schaub is an employee and shareholder of SEngine Precision Medicine, Inc. Carla Grandori is an employee, founder, and shareholder of SEngine Precision Medicine, Inc. Robert N. Eisenman is a member of the Scientific Advisory Boards and a shareholder of Shenogen Pharma and Kronos Bio. Daniel J. Weisenberger is a consultant for Zymo Research Corporation. Joshua M. Stuart is the founder of Five3 Genomics and a shareholder of NantOmics. Marc T. Goodman receives research support from Merck, Inc. Andrew J. Gentles is a consultant for Cibermed. Charles M. Perou is an equity stockholder, consultant, and Board of Directors member of BioClassifier and GeneCentric Diagnostics and is also listed as an inventor on patent applications on the Breast PAM50 and Lung Cancer Subtyping assays. Matthew Meyerson receives research support from Bayer Pharmaceuticals; is an equity holder in, consultant for, and Scientific Advisory Board chair for OrigiMed; and is an inventor of a patent for EGFR mutation diagnosis in lung cancer, licensed to LabCorp.
Eduard Porta-Pardo is an inventor of a patent for domainXplorer. Han Liang is a shareholder and scientific advisor of Precision Scientific and Eagle Nebula. Da Yang is an inventor on a pending patent application describing the use of antisense oligonucleotides against specific lncRNA sequence as diagnostic and therapeutic tools. Yonghong Xiao was an employee and shareholder of TESARO, Inc. Bin Feng is an employee and shareholder of TESARO, Inc. Carter Van Waes received research funding for the study of IAP inhibitor ASTX660 through a Cooperative Agreement between NIDCD, NIH, and Astex Pharmaceuticals. Raunaq Malhotra is an employee and shareholder of Seven Bridges, Inc. Peter W. Laird serves on the Scientific Advisory Board for AnchorDx. Joel Tepper is a consultant at EMD Serono. Kenneth Wang serves on the Advisory Board for Boston Scientific, Microtech, and Olympus. Andrea Califano is a founder, shareholder, and advisory board member of DarwinHealth, Inc. and a shareholder and advisory board member of Tempus, Inc. Toni K. Choueiri serves as needed on advisory boards for Bristol-Myers Squibb, Merck, and Roche. Lawrence Kwong receives research support from Array BioPharma. Sharon E. Plon is a member of the Scientific Advisory Board for Baylor Genetics Laboratory. Beth Y. Karlan serves on the Advisory Board of Invitae.

Figures

Figure 1. Framing the Algorithm and Integration Tasks
(A) RNA-seq data (X) are multiplied by a vector of gene weights (w), and the optimization task is to find the w that correctly classifies the pathway status matrix (y). We train the model on the training partition and evaluate performance on a held-out test set. (B) The status matrix, y, is constructed by integrating mutations and copy number alterations (CNAs). For oncogenes, we consider activating mutations and high-level copy number gains; for tumor-suppressor genes, we consider loss-of-function mutations and deep copy number losses. Black squares indicate aberrant events. For the Ras classifier, we used non-silent somatic mutations and high-level copy number gains in the oncogenes KRAS, NRAS, and HRAS.
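The two tasks in this caption, building the status vector y from mutation and copy number calls and learning sparse gene weights w, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the helper names (`ras_status`, `train_sparse_logistic`, `predict`), the variant and copy number encodings (a high-level gain coded as 2), the toy data, and the choice of L1-penalized logistic regression fit by proximal gradient descent are all hypothetical.

```python
import math

# Oncogenes used to define Ras status in the caption above.
RAS_GENES = ("KRAS", "NRAS", "HRAS")
HIGH_GAIN = 2  # assumed discrete copy number code for a high-level gain


def ras_status(mutations, copy_number):
    """Return 1 if the sample has a non-silent mutation or a high-level gain
    in KRAS, NRAS, or HRAS, else 0.

    mutations:   dict mapping gene -> list of variant classifications
    copy_number: dict mapping gene -> discrete copy number call
    """
    for gene in RAS_GENES:
        if any(v != "Silent" for v in mutations.get(gene, [])):
            return 1
        if copy_number.get(gene, 0) == HIGH_GAIN:
            return 1
    return 0


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_sparse_logistic(X, y, l1=0.05, lr=0.5, epochs=300):
    """Learn gene weights w and bias b so that sigmoid(X w + b) approximates y.

    Proximal gradient descent: a gradient step on the mean logistic loss,
    then soft-thresholding, which pushes weights of uninformative genes to 0.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        b -= lr * grad_b  # the bias is not penalized
        for j in range(p):
            step = w[j] - lr * grad_w[j]
            w[j] = math.copysign(max(abs(step) - lr * l1, 0.0), step)
    return w, b


def predict(X, w, b):
    # Probability of an aberrant (Ras-active) state for each sample.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Toy example: four samples, three genes (z-scored expression values).
samples = [
    ({"KRAS": ["Missense_Mutation"]}, {}),  # non-silent mutation -> 1
    ({}, {"NRAS": 2}),                       # high-level gain -> 1
    ({"HRAS": ["Silent"]}, {}),              # silent mutation only -> 0
    ({}, {"KRAS": 1}),                       # low-level gain only -> 0
]
y = [ras_status(m, c) for m, c in samples]
X = [
    [2.0, 0.1, -0.3],
    [1.5, -0.2, 0.4],
    [-1.8, 0.3, 0.1],
    [-1.7, -0.1, -0.2],
]
w, b = train_sparse_logistic(X, y)
scores = predict(X, w, b)
```

The L1 penalty is one way to obtain the sparse coefficients shown in Figure 2C; a combined L1/L2 (elastic net) penalty would be a drop-in alternative.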
Figure 2. Evaluating Machine-Learning Classification of Ras Activation
(A) Cancer-type-specific percentages of Ras aberration by copy number gain and deleterious mutation in KRAS, HRAS, or NRAS. The colored squares indicate whether each cancer type was included in model training. (B) Performance metrics for predicting Ras pathway activation. The gray lines represent classifier predictions on a randomly shuffled gene expression matrix. Left: receiver operating characteristic (ROC) curve and area under the ROC curve (AUROC) for the training, testing, and cross-validation (CV) sets. The dotted navy line represents a hypothetical random classifier. Right: precision-recall (PR) curve and the corresponding area under the PR curve (AUPR) for each evaluation set. (C) Sparse classifier coefficients indicate which genes contribute to the classifier's predictions. log10_mut represents the tumor-specific non-silent mutation rate. (D) Cancer-type-specific performance of the pan-cancer model compared with separate models trained on each cancer type independently. See also Figures S2 and S3.
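The two summary metrics in panel (B) can be computed directly from classifier scores and true labels. The sketch below is illustrative and makes assumptions: AUROC is computed through its rank interpretation (the probability that a random positive outscores a random negative), and AUPR is summarized as average precision; the published figure may have used a different estimator, such as scikit-learn's `roc_auc_score` and `average_precision_score`. Function names and toy values are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive scores higher;
    ties count as half a win."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Summarize the precision-recall curve as the mean precision at the
    rank of each positive sample (tied scores keep their input order)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy example: 1 = Ras-aberrant, 0 = wild-type (hypothetical values).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
roc = auroc(labels, scores)            # 8/9
ap = average_precision(labels, scores)  # (1 + 1 + 0.75) / 3
```

For reference, a random classifier has an expected AUROC of 0.5 and an expected AUPR equal to the fraction of positive samples, which is why the shuffled-data (gray) curves are useful baselines.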
Figure 3. Cell-Line Predictions of Ras Activity
(A) Ras classifier trained on PanCanAtlas tumors applied to a dataset of small airway epithelial cells (GEO: GSE94937). The mutant cells stably expressed a KRAS G12V mutation. (B) Ras classifier trained on PanCanAtlas tumors applied to 737 cell lines from The Cancer Cell Line Encyclopedia (CCLE). Cell lines with KRAS, HRAS, or NRAS mutations are shown in the right boxes, and wild-type cell lines in the left boxes. Scores for cell lines with mutant BRAF (green) and wild-type BRAF (gold) are also shown. (C and D) Drug activity area for (C) selumetinib (AZD6244) and (D) PD-0325901 compared with Ras classifier scores for 388 CCLE cell lines with both gene expression and pharmacologic profiling data. Cell lines with mutant (orange) or wild-type (blue) KRAS, HRAS, and NRAS are indicated. Best-fit lines, SE estimates, correlation coefficients, and p values are shown separately for Ras-mutant and Ras-wild-type cell lines.
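Panels (C) and (D) fit a separate line for Ras-mutant and Ras-wild-type cell lines. A minimal standard-library sketch of that per-group fit follows. The function names and toy values are hypothetical, ordinary least squares is assumed for the best-fit line, and p values are omitted because they require a t distribution (for example, `scipy.stats.pearsonr` returns both the correlation coefficient and a p value).

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept, plus Pearson's r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def fit_by_group(scores, activity, ras_mutant):
    """Fit classifier score vs. drug activity area separately for
    Ras-mutant (True) and Ras-wild-type (False) cell lines."""
    out = {}
    for group in (True, False):
        xs = [s for s, m in zip(scores, ras_mutant) if m == group]
        ys = [a for a, m in zip(activity, ras_mutant) if m == group]
        out["mutant" if group else "wild-type"] = fit_line(xs, ys)
    return out


# Hypothetical toy data: classifier scores, drug activity areas, Ras status.
scores = [0.1, 0.3, 0.5, 0.7, 0.6, 0.7, 0.9]
activity = [1.0, 2.0, 3.0, 4.0, 2.0, 4.0, 3.0]
ras_mutant = [False, False, False, False, True, True, True]
fits = fit_by_group(scores, activity, ras_mutant)
```

Fitting the groups separately keeps the Ras-mutant lines, which tend to have high scores, from driving the apparent association among wild-type lines, which is the subgroup relevant to hidden responders.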
Figure 4. Ras Activation across Ras Variants and Alternative Ras Pathway Members
(A) Cross-validation area under the receiver operating characteristic curve for predicting NF1 inactivation. The within-cancer-type and pan-cancer models are classifiers trained to detect NF1 inactivation; the Ras model is the classifier trained in Figure 2. The pan-cancer NF1 classifier is shown in Figure S3. (B) Ras classifier scores for samples with oncogenic or unconfirmed variants in KRAS, HRAS, and NRAS. Variant oncogenicity designations are based on curation (see STAR Methods). (C and D) Ras classifier scores stratified by Ras (KRAS, NRAS, HRAS) activation status and by the number of (C) aberrant mutations or (D) copy number alterations in other Ras pathway members. The two rows of numbers above each graph indicate the number of samples in each group (top) and the percentage of samples assigned to active Ras (bottom). See also Figure S3.
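The annotation rows in panels (C) and (D) summarize each group by its sample count and the share of samples assigned to active Ras. A sketch follows, assuming the classifier outputs probabilities and that a score above 0.5 counts as active Ras; the cutoff, the helper name `summarize_by_hits`, and the toy data are assumptions, not values taken from the paper.

```python
from collections import defaultdict


def summarize_by_hits(scores, n_hits, threshold=0.5):
    """Group samples by the number of aberrations in other Ras pathway
    genes and return {hits: (sample_count, percent_active)}, where a
    sample counts as active if its score exceeds `threshold` (assumed)."""
    groups = defaultdict(list)
    for s, k in zip(scores, n_hits):
        groups[k].append(s)
    summary = {}
    for k in sorted(groups):
        vals = groups[k]
        active = sum(v > threshold for v in vals)
        summary[k] = (len(vals), 100.0 * active / len(vals))
    return summary


# Hypothetical scores and counts of additional pathway hits per sample.
scores = [0.2, 0.6, 0.7, 0.4, 0.9, 0.8]
n_hits = [0, 0, 1, 1, 2, 2]
summary = summarize_by_hits(scores, n_hits)
```

In practice this would be run once per Ras status (aberrant vs. wild-type) to reproduce the stratification described in the caption.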
