Nat Cancer. 2025 Jan;6(1):86-101.
doi: 10.1038/s43018-024-00883-1. Epub 2025 Jan 9.

Prospective validation of ORACLE, a clonal expression biomarker associated with survival of patients with lung adenocarcinoma

Dhruva Biswas et al. Nat Cancer. 2025 Jan.

Abstract

Human tumors are diverse in their natural history and response to treatment, which in part results from genetic and transcriptomic heterogeneity. In clinical practice, single-site needle biopsies are used to sample this diversity, but cancer biomarkers may be confounded by spatiogenomic heterogeneity within individual tumors. Here we investigate clonally expressed genes as a solution to the sampling bias problem by analyzing multiregion whole-exome and RNA sequencing data for 450 tumor regions from 184 patients with lung adenocarcinoma in the TRACERx study. We prospectively validate the survival association of a clonal expression biomarker, Outcome Risk Associated Clonal Lung Expression (ORACLE), in combination with clinicopathological risk factors, and in stage I disease. We expand our mechanistic understanding, discovering that clonal transcriptional signals are detectable before tissue invasion, act as a molecular fingerprint for lethal metastatic clones and predict chemotherapy sensitivity. Lastly, we find that ORACLE summarizes the prognostic information encoded by genetic evolutionary measures, including chromosomal instability, as a concise 23-transcript assay.

Trial registration: ClinicalTrials.gov NCT01888601.

Conflict of interest statement

Competing interests: D.B. reports personal fees from NanoString and AstraZeneca and has a patent PCT/GB2020/050221 issued on methods for cancer prognostication. Y.W. consults for E15 VC and Prokarium. D.A.M. reports speaker fees from AstraZeneca, Eli Lilly, BMS and Takeda, consultancy fees from AstraZeneca, Thermo Fisher, Takeda, Amgen, Janssen, MIM Software, Bristol Myers Squibb and Eli Lilly and has received educational support from Takeda and Amgen. S.C.T. has acted as a consultant for Revolution Medicines. J.D. has acted as a consultant for AstraZeneca, Jubilant, Theras, Roche and Vividion and has funded research agreements with Bristol Myers Squibb, Revolution Medicines, Novartis, Vividion and AstraZeneca. M.J.-H. has consulted for Astex Pharmaceuticals and Achilles Therapeutics, is a member of the Achilles Therapeutics Scientific Advisory Board and Steering Committee and has received speaker honoraria from Pfizer, Astex Pharmaceuticals, Oslo Cancer Cluster, Bristol Myers Squibb and Genentech. M.J.-H. is listed as a co-inventor on a European patent application relating to methods to detect lung cancer (PCT/US2017/028013); this patent has been licensed to commercial entities and, under terms of employment, M.J.-H. is due a share of any revenue generated from such license(s). M.J.-H. is also listed as a co-inventor on the GB priority patent application (GB2400424.4) with title: Treatment and Prevention of Lung Cancer. N.J.B. is listed as a co-inventor on a patent to identify responders to cancer treatment (PCT/GB2018/051912), has a patent application (PCT/GB2020/050221) on methods for cancer prognostication and a patent on methods for predicting anti-cancer response (US14/466,208). C.S. acknowledges grant support from AstraZeneca, Boehringer-Ingelheim, BMS, Pfizer, Roche-Ventana, Invitae (previously Archer Dx (collaboration in minimal residual disease sequencing technologies)) and Ono Pharmaceutical. C.S. is an AstraZeneca Advisory Board member and Chief Investigator for the AZ MeRmaiD 1 and 2 clinical trials and is also Co-Chief Investigator of the NHS Galleri trial funded by GRAIL and a paid member of GRAIL’s SAB. He receives consultant fees from Achilles Therapeutics (also a SAB member), Bicycle Therapeutics (also a SAB member), Genentech, Medicxi, Roche Innovation Centre–Shanghai, Metabomed (until July 2022), and the Sarah Cannon Research Institute. C.S. had stock options in Apogen Biotechnologies and GRAIL until June 2021, currently has stock options in Epic Bioscience and Bicycle Therapeutics and has stock options in and is a co-founder of Achilles Therapeutics. C.S. is an inventor on a European patent application relating to an assay technology to detect tumor recurrence (PCT/GB2017/053289); the patent has been licensed to commercial entities and, under his terms of employment, C.S. is due a share of any revenue generated from such license(s). C.S. holds patents relating to targeting neoantigens (PCT/EP2016/059401), identifying patient responses to immune checkpoint blockade (PCT/EP2016/071471), determining HLA LOH (PCT/GB2018/052004), predicting survival rates of patients with cancer (PCT/GB2020/050221), identifying patients who respond to cancer treatment (PCT/GB2018/051912), a US patent relating to detecting tumor mutations (PCT/US2017/28013), methods for lung cancer detection (US20190106751A1) and both a European and US patent related to identifying indel mutation targets (PCT/GB2018/051892) and is a co-inventor on a patent application to determine methods and systems for tumor monitoring (PCT/EP2022/077987). C.S. is a named inventor on a provisional patent related to a ctDNA detection algorithm. The other authors declare no competing interests.

Figures

Fig. 1. Prospective validation of tumor sampling bias.
a, The sampling bias problem is illustrated for a lung tumor. Here, a prognostic biomarker classifies tumor regions as high risk (red) or low risk (blue). The diagnostic biopsy samples only one tumor region (indicated by a square with the region number). Therefore, using the conventional strategy, the readout of molecular risk for this patient will depend entirely on where the biopsy needle is placed. Four tissue-based solutions to mitigate sampling bias are tabulated, comparing their tissue and assay requirements. Sampling and testing ‘all’ tumor regions bypasses the sampling problem, but this is the most expensive in terms of tissue and technology costs. A multibiopsy strategy, sampling a limited number of regions (four regions have been suggested for lung cancer), brings down the cost while tending to capture intratumor variability. ‘Blending’ the entire tumor, and applying one test to an aliquot from the homogenized mixture, has the same cost as testing a single diagnostic biopsy but requires pathology access to the full tumor. In theory, the ‘clonal’ strategy is the most economical, providing a stable molecular readout from a single diagnostic biopsy. Created in BioRender.com. b, A dot plot showing the distribution of ORACLE risk scores in the TRACERx validation cohort (n = 122 patients with stage I–III lung adenocarcinoma (LUAD) with multiple regions available). Patients were classified into concordant low-risk (blue), concordant high-risk (red) and discordant-risk (gray) groups by ORACLE. The association between ORACLE risk class and TNM stages was tested by chi-squared goodness-of-fit test in Extended Data Fig. 2b. c, Pie charts showing the percentages of risk groups classified by ORACLE and the other six signatures. d, An overview of prognostic signature ranking across four different metrics for tumor sampling bias. The mean rank across all four tumor sampling bias metrics was calculated for each signature. The name of each signature is indicated (with the number of signature genes).
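
The three-way grouping described for panel b can be sketched in a few lines. This is only an illustration, assuming each tumor region already has a numeric risk score and that one cutoff separates high from low risk; the function names, the cutoff and the example scores are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def classify_patient(region_scores: List[float], cutoff: float) -> str:
    """Label one patient from the risk scores of their sampled tumor regions.

    Assumption: a region is 'high risk' when its score is strictly above
    the cutoff; a score equal to the cutoff counts as low risk.
    """
    if not region_scores:
        raise ValueError("at least one region score is required")
    is_high = [score > cutoff for score in region_scores]
    if all(is_high):
        return "concordant high"
    if not any(is_high):
        return "concordant low"
    # Regions disagree, so a single biopsy could give either readout.
    return "discordant"


def discordance_rate(cohort: Dict[str, List[float]], cutoff: float) -> float:
    """Fraction of patients whose regions fall on both sides of the cutoff."""
    labels = [classify_patient(scores, cutoff) for scores in cohort.values()]
    return labels.count("discordant") / len(labels)


if __name__ == "__main__":
    # Hypothetical scores for three patients with two or three regions each.
    example = {"P1": [0.2, 0.3], "P2": [0.7, 0.9, 0.8], "P3": [0.4, 0.6]}
    for patient, scores in example.items():
        print(patient, classify_patient(scores, cutoff=0.5))
    print("discordance rate:", discordance_rate(example, cutoff=0.5))
```

A lower discordance rate means the signature gives the same risk class regardless of which region the biopsy needle happens to sample.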
Fig. 2. Prospective validation of survival association.
a, A Kaplan–Meier plot showing the overall survival (OS) association among patients at low risk (blue), high risk (red) and discordant risk (gray) classified by ORACLE in the TRACERx validation cohort (n = 158 patients with stage I–III LUAD). Statistical significance was tested with a two-sided log-rank test, P = 0.0034. b, The prognostic value of ORACLE adjusted for known clinicopathological risk factors in the TRACERx validation cohort (n = 158 patients with stage I–III LUAD). Multivariable Cox analysis was performed incorporating the ORACLE mean risk score, patient sex, patient age, pack-years (smoking packs and duration), adjuvant treatment status, tumor stage (TNM 8th edition) and histologic grade. P values or baseline (Ref.) are shown for each predictor in the last column. The center box indicating the hazard ratio (HR) and the error bars indicating 95% confidence intervals (CIs) are shown for each predictor on a natural log scale. IMA, invasive mucinous adenocarcinoma. c, The distribution of prognostic associations for ORACLE across simulation runs of a pseudo-single-biopsy cohort. One region is randomly sampled for each tumor, followed by a Cox regression analysis of ORACLE risk score against OS. The density plot shows the distribution of log-scaled HR values across 1,000 simulations. d, The prognostic value of ORACLE for patients with stage I (TNM 8th edition) LUAD in the TRACERx validation cohort (n = 70). The Kaplan–Meier plots show the OS association according to clinical staging (TNM 8th edition) (P = 0.43) and ORACLE (P = 0.003). Statistical significance was tested with a two-sided log-rank test.
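
The resampling loop behind the pseudo-single-biopsy simulation in panel c might look roughly like the sketch below. It covers only the sampling step: the Cox regression itself needs a survival-analysis library, so the model fit is passed in as a caller-supplied function. The names `simulate_single_biopsy` and `fit_statistic`, the seed handling and the example data are all hypothetical.

```python
import random
from typing import Callable, Dict, List


def simulate_single_biopsy(
    regions_by_tumor: Dict[str, List[float]],
    fit_statistic: Callable[[Dict[str, float]], float],
    n_runs: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Repeatedly draw one region per tumor and record a cohort-level statistic.

    regions_by_tumor maps a tumor ID to the risk scores of its regions.
    fit_statistic receives {tumor ID: sampled score} and returns one number;
    in the study's setting this would be the log HR from a Cox model of
    risk score against overall survival (not implemented here).
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    results = []
    for _ in range(n_runs):
        # One randomly chosen region stands in for the diagnostic biopsy.
        sampled = {
            tumor: rng.choice(scores)
            for tumor, scores in regions_by_tumor.items()
        }
        results.append(fit_statistic(sampled))
    return results


if __name__ == "__main__":
    # Hypothetical scores; the mean is a placeholder for a real model fit.
    regions = {"T1": [0.1, 0.9], "T2": [0.4], "T3": [0.2, 0.3, 0.8]}
    stats = simulate_single_biopsy(
        regions, lambda s: sum(s.values()) / len(s), n_runs=5
    )
    print(stats)
```

The spread of the returned values reflects how much the cohort-level result depends on which region was sampled from each tumor.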
Fig. 3. ORACLE as a marker of invasive and metastatic potential.
a,b, Kaplan–Meier plots showing the lung-cancer-specific survival association among patients at low risk (blue), high risk (red) and discordant risk (gray) classified by ORACLE in the TRACERx validation cohort (n = 158 patients with stage I–III LUAD, P = 0.011) (a) and in the stage I subgroup (n = 70 patients with stage I LUAD, P = 0.0028) (b). Statistical significance was tested with a two-sided log-rank test. c, ORACLE risk scores in eight histological stages in a published dataset of preinvasive lung lesions (122 biopsies from 77 patients). Each histological stage was further grouped into different lesion grades according to the original article (Methods). The statistical significance was assessed by a linear mixed-effects model setting histological stages as a fixed effect and accounting for individual patients as a random effect. No correction was made for multiple comparisons among developmental stages. Metaplasia versus normal stage, P = 0.0083; squamous cell carcinoma (SCC) versus metaplasia, P = 0.098. d, ORACLE risk scores compared between primary regions seeding and nonseeding metastatic clones determined by the phylogenies in the TRACERx exploratory cohort (n = 17 tumors including 22 seeding regions and 31 nonseeding regions). The statistical significance was tested with a linear mixed-effects model using primary tumor regions as a fixed effect and accounting for individual patients as a random effect, P = 0.03. e, A Kaplan–Meier curve showing disease-free survival (DFS) by ORACLE risk class in the TRACERx validation cohort (n = 158 patients, 54 of whom relapsed). The percentages of patients developing relapse in each ORACLE risk class are annotated. Statistical significance was tested with a two-sided log-rank test. f, A Kaplan–Meier curve showing DFS by ORACLE risk class in the stage I subgroup (n = 70 patients with stage I LUAD). The statistical significance was tested by a two-sided log-rank test. For c and d, the center line of the boxplot indicates the median and the box spans from the 25th to 75th percentile. The lower and upper whiskers define the 5th and 95th percentiles, respectively.
Fig. 4. ORACLE delineates chemosensitive cells.
a, A volcano plot showing the correlation between ORACLE risk scores and the sensitivity to anticancer drugs available from the GDSC database (n = 37 LUAD cell lines; 359 compounds; Methods). The analysis was performed using Spearman correlation with the coefficient (ρ) labeled on the x axis and the P value labeled on the y axis. Drugs labeled in red indicate a significant association with ORACLE risk scores. FDA-approved drugs for NSCLC are annotated and circled in black. b, A dot plot showing the distribution of Spearman coefficients for drugs categorized according to their targeting pathways. The targeting pathways for each drug (359 compounds) were obtained from the GDSC database. Drugs showing significant association with ORACLE risk scores are labeled in red. The center line of the boxplot indicates the median, and the box spans from the 25th to 75th percentile. The lower and upper whiskers define the 5th and 95th percentiles, respectively. c, Kaplan–Meier curves of ORACLE as a predictive marker for response to adjuvant therapies, dividing patients by adjuvant treatment status in the TRACERx validation cohort (n = 102 without adjuvant therapy, n = 56 with adjuvant therapy). The statistical significance was tested with a two-sided log-rank test (no adjuvant therapy, P = 0.00031; with adjuvant therapy, P = 0.0087).
Fig. 5. ORACLE as a summary metric of lung cancer evolution.
a, Clinicopathological and genetic correlates with ORACLE magnitude in the TRACERx exploratory cohort (n = 184 patients with stage I–III LUAD). A multiple linear model was applied separately for clinicopathological or genetic features (Methods). #Biopsy, number of biopsies. Each predictor is shown in a column, with its model coefficient represented by a color scale and labeled with its significance (*P < 0.05, **P < 0.01, ***P < 0.005). For categorical variables including female, ex-smoker and smoker, stage II and stage III, the references are male, non-smoker and stage I, respectively. No correction was made for multiple comparisons. b, The OS association of six biomarkers identified in the TRACERx study was examined in the TRACERx exploratory cohort (n = 111 patients with stage I–III LUAD with all biomarker data available). Multivariable Cox analysis was performed on ORACLE, recent subclonal expansion, SCNA-ITH, subclonal WGD, preoperative ctDNA detection status and STAS, adjusted for known clinicopathological risk factors. P values or baseline (Ref.) are shown for each predictor in the last column. The center box indicating HR and the error bars indicating 95% CIs are shown for each predictor on a natural log scale. c, The percentages of variation in survival outcome explained by the six TRACERx biomarkers were examined by a generalized linear model.
Extended Data Fig. 1. An overview of the TRACERx study.
a, An overview of cohorts utilized in this study. A total of 421 patients with NSCLC were enrolled in the TRACERx study (NCT01888601); we focused on patients with LUAD for the analyses of LUAD prognostic signatures. Patients involved in the previously published training dataset were removed, yielding the prospective validation cohort (n = 158). Other analyses for discovery were performed on the exploratory cohort including 184 patients with LUAD. Patients with multiple regions available were included in certain analyses where specified in the text. b, Batch correction of ORACLE risk score using shared samples (85 regions from 27 patients) between previously published data and current data generated from an updated computational pipeline. A dot plot shows the risk scores from the two data versions; risk scores were corrected using the fitted linear regression formula. The P value (P = 1.6 × 10^-97) was obtained from a linear regression model, and the coefficient of determination (R^2) is shown in the graph.
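
The batch correction in panel b amounts to fitting a straight line on samples scored by both pipelines and applying that line to other scores. The sketch below shows one way to do this with ordinary least squares. Which data version is mapped onto which is not stated in the caption, so the direction used here (input scores mapped onto the scale of the `y` values) is an assumption, and the function names and numbers are hypothetical.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x.

    x and y are the scores of the same shared samples in the two versions.
    Returns (slope, intercept).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correct_scores(scores: List[float], slope: float, intercept: float) -> List[float]:
    """Map scores from the x scale onto the y scale using the fitted line."""
    return [intercept + slope * score for score in scores]


if __name__ == "__main__":
    # Hypothetical shared samples scored by two pipeline versions.
    version_a = [1.0, 2.0, 3.0]
    version_b = [3.0, 5.0, 7.0]
    slope, intercept = fit_line(version_a, version_b)
    print(correct_scores([0.0, 4.0], slope, intercept))
```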
Extended Data Fig. 2. Discordance percentages of published RNA-seq prognostic signatures.
a, Dot plots showing the distribution of risk scores for six published RNA prognostic signatures in the TRACERx validation cohort (n = 122 patients with stage I–III LUAD with multiregion RNA-seq data available). Patients were classified into concordant low- (blue), concordant high- (red) and discordant-risk (gray) groups by each signature using the median value as a cutoff. b, Bar plots showing the percentages of risk groups classified by ORACLE and the six signatures across stage I to stage III. Differences in discordant-risk frequencies among tumor stages were examined using a chi-squared goodness-of-fit test.
Extended Data Fig. 3. Clustering concordance of published RNA-seq prognostic signatures.
A previously used hierarchical clustering method, applied to the six published prognostic signatures, is illustrated. The dendrogram and heatmap show the clustering of tumor regions. The discordant rate (gray) was calculated as the percentage of patients with tumor regions falling into different clusters. The analysis was iterated from 1 to 122 clusters, 122 being the number of patients included in this cohort. The percentage of discordant clustering is illustrated when cutting the dendrogram into 2, 10 and 60 clusters. a, Li et al.’s signature. b, Wang et al.’s signature. c, Zhao et al.’s signature. d, Song et al.’s signature. e, Jin et al.’s signature. f, Li, Feng et al.’s signature.
Extended Data Fig. 4. Established metrics for quantifying tumor sampling bias.
a, The hierarchical clustering of ORACLE genes using the method described in Extended Data Fig. 3 is shown. b, The area under the curve was calculated to represent the concordant rate derived from the hierarchical clustering method for ORACLE and the six published prognostic signatures. This analysis was run for 1 (100% concordant rate) to 122 clusters (the maximum number of clusters that could be obtained for the cohort). Dashed lines indicate the numbers of clusters cut in Extended Data Fig. 3. c, A method developed by Househam et al. examining expression variability. The heatmap shows the gene-wise standard deviation of expression across tumor regions per patient. The average expression variability is annotated on the left. d, A box plot representing the distribution of mean expression variability across the signature genes for ORACLE and the six other RNA signatures in the TRACERx validation cohort (n = 122 patients with 333 tumor regions). Colors for each signature are the same as in panel c. The statistical significance was tested using a two-sided Wilcoxon rank-sum test. The center line of the boxplot indicates the median and the box spans from the 25th to 75th percentile. The lower and upper whiskers define the 5th and 95th percentiles, respectively. Jin et al., 2022, P = 0.045; Li et al., 2017, P = 0.00012; Wang et al., 2022, P = 0.4; Song et al., 2022, P = 0.56; Li et al., 2022, P = 3.9 × 10^-6; Zhao et al., 2020, P = 4.5 × 10^-12 compared with ORACLE. e, Estimation of the minimum number of biopsies needed to obtain a stable risk score using an algorithm developed by Bachtiary et al. The vertical lollipop plot represents the variance of the ORACLE risk score within an individual tumor. The average within-tumor variance divided by a given number of biopsies (k) was summarized as W. The horizontal dashed line shows the variance between the tumors in this cohort, denoted as B. The ratio of W to the total variance (T) measures the stability of risk scores for a given signature. This method was also applied to the other six signatures. f, A line plot representing W/T per signature from one to ten biopsies. The threshold of 0.15 (horizontal dashed line), predefined in the original publication, determined the intersection with the best-fit line, yielding the minimum number of biopsies required to obtain a stable risk score.
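
The W/T stability ratio in panels e and f can be illustrated with a small calculation. The sketch rests on several assumptions not stated in the caption: that the total variance is T = W + B, that variances are sample variances, and that tumors with a single region are skipped when estimating within-tumor variance. It also returns the first whole number of biopsies at or below the threshold, which simplifies the caption's best-fit-line intersection. All names and numbers are hypothetical.

```python
from statistics import mean, variance
from typing import Dict, List, Optional


def w_over_t(regions_by_tumor: Dict[str, List[float]], k: int) -> float:
    """Ratio of within-tumor variance (for k biopsies) to total variance."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Within-tumor variance, averaged over tumors with two or more regions.
    within = mean(
        [variance(s) for s in regions_by_tumor.values() if len(s) >= 2]
    )
    # Between-tumor variance B, computed on the per-tumor mean scores.
    between = variance([mean(s) for s in regions_by_tumor.values()])
    w = within / k           # averaging k biopsies shrinks the variance
    return w / (w + between)  # assumption: T = W + B


def min_biopsies(
    regions_by_tumor: Dict[str, List[float]],
    threshold: float = 0.15,
    max_k: int = 10,
) -> Optional[int]:
    """Smallest k in 1..max_k with W/T at or below the threshold, else None."""
    for k in range(1, max_k + 1):
        if w_over_t(regions_by_tumor, k) <= threshold:
            return k
    return None


if __name__ == "__main__":
    tumors = {"A": [1.0, 3.0], "B": [5.0, 7.0]}  # hypothetical scores
    print(w_over_t(tumors, 1), min_biopsies(tumors))
```

Under these assumptions, a signature whose scores vary little within a tumor relative to between tumors reaches the threshold with fewer biopsies.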
Extended Data Fig. 5. Prospective validation of survival association in stage I LUAD and using lung-cancer-specific survival and DFS.
a, Prognostic value of ORACLE in predicting OS in the stage I subgroup (n = 70 patients with stage I LUAD) adjusted for known clinicopathological risk factors. Multivariable Cox analysis was performed incorporating the ORACLE mean risk score, patient sex, patient age, pack-years (smoking packs and duration), adjuvant treatment status, tumor stage (TNM 8th edition) and histologic grade. The center box indicating hazard ratio and the error bars indicating 95% confidence intervals are shown for each predictor on a natural log scale. b, The percentages of stage I patients that transition from standard clinical substaging (TNM 8th edition) to ORACLE risk classification. The patients in the TRACERx validation cohort (n = 70 patients with stage I LUAD) were stratified by tumor stage into stage IA (n = 38) and stage IB (n = 32) on the left and classified by ORACLE as concordant low- (n = 56), concordant high- (n = 9) and discordant-risk (n = 5) groups on the right. The color shows the transition from stage I to ORACLE low- (blue), high- (red) and discordant-risk (gray) groups. c, Prognostic value of ORACLE in a meta-analysis across four independent cohorts of patients with LUAD (n = 580 patients with stage I LUAD). Univariate Cox analysis was performed in four microarray datasets (Shedden et al., Der et al., Okayama et al. and Rousseaux et al.). The center box indicating hazard ratio and the error bars indicating 95% confidence intervals are shown for each predictor on a natural log scale. The diamond indicates the hazard ratio for the meta-analysis of the four microarray cohorts. d, Prognostic value of ORACLE in predicting lung-cancer-specific death adjusted for known clinicopathological risk factors in the TRACERx validation cohort (n = 158 patients with stage I–III LUAD). Multivariable Cox analysis was performed incorporating the ORACLE mean risk score, patient sex, patient age, pack-years (smoking packs and duration), adjuvant treatment status and tumor stage (TNM 8th edition). The center box indicating hazard ratio and the error bars indicating 95% confidence intervals are shown for each predictor on a natural log scale. e, Prognostic value of ORACLE in predicting DFS adjusted for known clinicopathological risk factors in the TRACERx validation cohort (n = 158 patients with stage I–III LUAD). Multivariable Cox analysis was performed incorporating the ORACLE mean risk score, patient sex, patient age, pack-years (smoking packs and duration), adjuvant treatment status and tumor stage (TNM 8th edition). The center box indicating hazard ratio and the error bars indicating 95% confidence intervals are shown for each predictor on a natural log scale.
Extended Data Fig. 6. Anticancer drug screening in vitro.
a, A flow diagram showing the steps for filtering cell lines and compounds with missing data obtained from the GDSC and CCLE databases (n = 54 LUAD cell lines; 396 compounds). Cell lines with data missing for more than 50 compounds were first removed, yielding 37 cell lines. Compounds with data missing for more than 5 cell lines were then removed, yielding 359 compounds. b, The association of ORACLE risk score and anticancer drug response determined by half-maximal inhibitory concentration (IC50). Drugs with a significant association (see Fig. 4a) are shown in this figure. Spearman correlation coefficients and P values are shown for each compound.
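
The two-step filter described for panel a can be sketched as follows. This is a minimal illustration, assuming the screen is held as a nested dictionary of IC50 values with `None` marking missing data, and that compound missingness is counted only among the cell lines kept in step one. The function name, the data layout and the example values are hypothetical; the default thresholds follow the caption.

```python
from typing import Dict, List, Optional, Tuple

Screen = Dict[str, Dict[str, Optional[float]]]  # cell line -> compound -> IC50


def filter_screen(
    ic50: Screen,
    max_missing_per_line: int = 50,
    max_missing_per_compound: int = 5,
) -> Tuple[List[str], List[str]]:
    """Return the cell lines and compounds kept after both filtering steps."""
    compounds = sorted({c for row in ic50.values() for c in row})

    # Step 1: drop cell lines missing data for too many compounds.
    kept_lines = [
        line
        for line, row in ic50.items()
        if sum(row.get(c) is None for c in compounds) <= max_missing_per_line
    ]

    # Step 2: among the remaining cell lines, drop compounds that are
    # missing in too many of them.
    kept_compounds = [
        c
        for c in compounds
        if sum(ic50[line].get(c) is None for line in kept_lines)
        <= max_missing_per_compound
    ]
    return kept_lines, kept_compounds


if __name__ == "__main__":
    # Tiny hypothetical screen with tightened thresholds for demonstration.
    toy = {
        "lineA": {"d1": 1.0, "d2": None, "d3": None},
        "lineB": {"d1": 2.0, "d2": 3.0, "d3": None},
        "lineC": {"d1": 1.5, "d2": 2.5, "d3": 0.5},
    }
    print(filter_screen(toy, max_missing_per_line=1, max_missing_per_compound=0))
```

Filtering cell lines before compounds matters: a compound can pass the second step only because a sparsely profiled cell line was already removed in the first.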
Extended Data Fig. 7. Prediction of adjuvant therapy response.
ORACLE as a predictive marker of response to adjuvant therapies stratified by nodal status in the TRACERx validation cohort (n = 158 patients with stage I–III LUAD). Statistical significance was tested using a two-sided log-rank test. Node negative without adjuvant therapy, P = 0.03; node negative with adjuvant therapy, P = 0.051; node positive without adjuvant therapy, P = 0.35; node positive with adjuvant therapy, P = 0.19.
Extended Data Fig. 8. The association of ORACLE with genetic evolutionary metrics.
Scatter plots and boxplots show the mean ORACLE risk score summarized per tumor in the TRACERx exploratory cohort (n = 184 patients with stage I–III LUAD) and its correlation with seven clinicopathological and seven genetic features. The center line of the boxplot indicates the median and the box spans from the 25th to 75th percentile. The lower and upper whiskers define the 5th and 95th percentiles, respectively.
Extended Data Fig. 9. Somatic mutations and copy number alterations underlying clonal expression magnitude.
a, Frequencies of clonal (left) and subclonal (right) driver mutations at gene level compared between high- and low-risk tumor regions in the TRACERx exploratory cohort (n = 142 high-risk and n = 308 low-risk tumor regions from 184 patients with stage I–III LUAD). The scatter plot shows the odds ratio obtained by a two-sided Fisher’s exact test for each gene mutation. A P value of 0.05 is indicated by the horizontal dashed line. b, An oncoprint showing the frequencies of clonal mutations in 10 driver genes that were enriched in ORACLE low-risk and high-risk groups. Each column represents a region across patient tumors in the TRACERx exploratory cohort (n = 184 patients with stage I–III LUAD with 450 region samples). c, The genome-wide SCNAs identified using GISTIC2.0 (Methods). For a given genome region, the G-score difference was calculated between ORACLE low-risk and high-risk cohorts to identify loci with positive selection. The plot shows the false-discovery rate (q value) of the G score in the high-risk cohort. Chromosome segments with significant positive selection (G-score difference > 0 and q value < 0.05) are shown in red for amplification and blue for deletion. Vertical dashed lines indicate the threshold of a false-discovery rate (q value) equal to 0.05. The driver SCNAs, as listed in our previous study, located in the chromosome arm harboring detected cytobands are highlighted.
Extended Data Fig. 10. Future applicability of ORACLE in clinical practice.
A possible design for prospective clinical trials evaluating the use of ORACLE to guide adjuvant chemotherapy in high-risk stage I patients and to monitor outcomes in low-risk stage II patients. LUAD, lung adenocarcinoma.
