Nature. 2023 Apr;616(7957):543-552.

doi: 10.1038/s41586-023-05706-4. Epub 2023 Apr 12.

Genomic-transcriptomic evolution in lung cancer and metastasis

Carlos Martínez-Ruiz^{#1,2}, James R M Black^{#1,2}, Clare Puttick^{#1,2,3}, Mark S Hill^{#3}, Jonas Demeulemeester^{4,5,6}, Elizabeth Larose Cadieux^{4,7}, Kerstin Thol^{1,2}, Thomas P Jones^{1,2}, Selvaraju Veeriah^{1}, Cristina Naceur-Lombardelli^{1}, Antonia Toncheva^{1}, Paulina Prymas^{1}, Andrew Rowan^{3}, Sophia Ward^{1,3,8}, Laura Cubitt^{8}, Foteini Athanasopoulou^{1,3,8}, Oriol Pich^{3}, Takahiro Karasaki^{1,3,9}, David A Moore^{1,3,10}, Roberto Salgado^{11,12}, Emma Colliver^{3}, Carla Castignani^{4,7}, Michelle Dietzen^{1,2,3}, Ariana Huebner^{1,2,3}, Maise Al Bakir^{1,3}, Miljana Tanić^{7,13}, Thomas B K Watkins^{3}, Emilia L Lim^{1,3}, Ali M Al-Rashed^{14}, Danny Lang^{15}, James Clements^{15}, Daniel E Cook^{3}, Rachel Rosenthal^{3}, Gareth A Wilson^{3}, Alexander M Frankell^{1,3}, Sophie de Carné Trécesson^{16}, Philip East^{17}, Nnennaya Kanu^{1}, Kevin Litchfield^{1,18}, Nicolai J Birkbak^{1,3,19,20,21}, Allan Hackshaw^{22}, Stephan Beck^{7}, Peter Van Loo^{4,23,24}, Mariam Jamal-Hanjani^{1,9,25}; TRACERx Consortium; Charles Swanton^{26,27,28}, Nicholas McGranahan^{29,30}


Collaborators

TRACERx Consortium:
Nicholas McGranahan, Charles Swanton, Maise Al Bakir, Emilia L Lim, Alexander M Frankell, Kevin Litchfield, Nicolai J Birkbak, Peter Van Loo, Jason F Lester, Amrita Bajaj, Apostolos Nakas, Azmina Sodha-Ramdeen, Keng Ang, Mohamad Tufail, Mohammed Fiyaz Chowdhry, Molly Scotland, Rebecca Boyles, Sridhar Rathinam, Claire Wilson, Domenic Marrone, Sean Dulloo, Dean A Fennell, Gurdeep Matharu, Jacqui A Shaw, Joan Riley, Lindsay Primrose, Ekaterini Boleti, Heather Cheyne, Mohammed Khalil, Shirley Richardson, Tracey Cruickshank, Gillian Price, Keith M Kerr, Sarah Benafif, Kayleigh Gilbert, Babu Naidu, Akshay J Patel, Aya Osman, Christer Lacson, Gerald Langman, Helen Shackleford, Madava Djearaman, Salma Kadiri, Gary Middleton, Angela Leek, Jack Davies Hodgkinson, Nicola Totten, Angeles Montero, Elaine Smith, Eustace Fontaine, Felice Granato, Helen Doran, Juliette Novasio, Kendadai Rammohan, Leena Joseph, Paul Bishop, Rajesh Shah, Stuart Moss, Vijay Joshi, Philip Crosbie, Fabio Gomes, Kate Brown, Mathew Carter, Anshuman Chaturvedi, Lynsey Priest, Pedro Oliveira, Colin R Lindsay, Fiona H Blackhall, Matthew G Krebs, Yvonne Summers, Alexandra Clipson, Jonathan Tugwood, Alastair Kerr, Dominic G Rothwell, Elaine Kilgour, Caroline Dive, Hugo J W L Aerts, Roland F Schwarz, Tom L Kaufmann, Zoltan Szallasi, Judit Kisistok, Mateo Sokac, Miklos Diossy, Abigail Bunkum, Aengus Stewart, Alastair Magness, Angeliki Karamani, Benny Chain, Brittany B Campbell, Chris Bailey, Christopher Abbosh, Clare E Weeden, Claudia Lee, Corentin Richard, Crispin T Hiley, David R Pearce, Despoina Karagianni, Dhruva Biswas, Dina Levi, Elena Hoxha, Emma Nye, Eva Grönroos, Felip Gálvez-Cancino, Francisco Gimeno-Valiente, George Kassiotis, Georgia Stavrou, Gerasimos Mastrokalos, Haoran Zhai, Helen L Lowe, Ignacio Garcia Matos, Jacki Goldman, James L Reading, Javier Herrero, Jayant K Rane, Jerome Nicod, Jie Min Lam, John A Hartley, Karl S Peggs, Katey S S Enfield, Kayalvizhi Selvaraju, Kevin W Ng, Kezhong Chen, Krijn Dijkstra, Kristiana Grigoriadis, Krupa Thakkar, Leah Ensell, Mansi Shah, Marcos Vasquez Duran, Maria Litovchenko, Mariana Werner Sunderland, Michelle Leung, Mickael Escudero, Mihaela Angelova, Monica Sivakumar, Olga Chervova, Olivia Lucas, Othman Al-Sawaf, Philip Hobson, Piotr Pawlik, Richard Kevin Stone, Robert Bentham, Robert E Hynds, Roberto Vendramin, Sadegh Saghafinia, Saioa López, Samuel Gamble, Seng Kuong Anakin Ung, Sergio A Quezada, Sharon Vanloo, Simone Zaccaria, Sonya Hessey, Stefan Boeing, Supreet Kaur Bola, Tamara Denner, Teresa Marafioti, Thanos P Mourikis, Victoria Spanswick, Vittorio Barbè, Wei-Ting Lu, William Hill, Wing Kin Liu, Yin Wu, Yutaka Naito, Zoe Ramsden, Catarina Veiga, Gary Royle, Charles-Antoine Collins-Fekete, Francesco Fraioli, Paul Ashford, Tristan Clark, Martin D Forster, Siow Ming Lee, Elaine Borg, Mary Falzon, Dionysis Papadatos-Pastos, James Wilson, Tanya Ahmad, Alexander James Procter, Asia Ahmed, Magali N Taylor, Arjun Nair, David Lawrence, Davide Patrini, Neal Navani, Ricky M Thakrar, Sam M Janes, Emilie Martinoni Hoogenboom, Fleur Monk, James W Holding, Junaid Choudhary, Kunal Bhakhri, Marco Scarci, Martin Hayward, Nikolaos Panagiotopoulos, Pat Gorman, Reena Khiroya, Robert C M Stephens, Yien Ning Sophia Wong, Steve Bandula, Abigail Sharp, Sean Smith, Nicole Gower, Harjot Kaur Dhanda, Kitty Chan, Camilla Pilotti, Rachel Leslie, Anca Grapa, Hanyun Zhang, Khalid AbdulJabbar, Xiaoxi Pan, Yinyin Yuan, David Chuter, Mairead MacKenzie, Serena Chee, Aiman Alzetani, Judith Cave, Lydia Scarlett, Jennifer Richards, Papawadee Ingram, Silvia Austin, Eric Lim, Paulo De Sousa, Simon Jordan, Alexandra Rice, Hilgardt Raubenheimer, Harshil Bhayani, Lyn Ambrose, Anand Devaraj, Hema Chavan, Sofina Begum, Silviu I Buderi, Daniel Kaniu, Mpho Malima, Sarah Booth, Andrew G Nicholson, Nadia Fernandes, Pratibha Shah, Chiara Proli, Madeleine Hewish, Sarah Danson, Michael J Shackcloth, Lily Robinson, Peter Russell, Kevin G Blyth, Craig Dick, John Le Quesne, Alan Kirk, Mo Asif, Rocco Bilancia, Nikos Kostoulas, Mathew Thomas

Affiliations

¹ Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK.
² Cancer Genome Evolution Research Group, Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK.
³ Cancer Evolution and Genome Instability Laboratory, The Francis Crick Institute and University College London Cancer Institute, London, UK.
⁴ Cancer Genomics Laboratory, The Francis Crick Institute, London, UK.
⁵ Integrative Cancer Genomics Laboratory, Department of Oncology, KU Leuven, Leuven, Belgium.
⁶ VIB-KU Leuven Center for Cancer Biology, Leuven, Belgium.
⁷ Medical Genomics, University College London Cancer Institute, London, UK.
⁸ Advanced Sequencing Facility, The Francis Crick Institute, London, UK.
⁹ Cancer Metastasis Laboratory, University College London Cancer Institute, London, UK.
¹⁰ Department of Cellular Pathology, University College London Hospitals, London, UK.
¹¹ Department of Pathology, ZAS Hospitals, Antwerp, Belgium.
¹² Division of Research, Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia.
¹³ Experimental Oncology, Institute for Oncology and Radiology of Serbia, Belgrade, Serbia.
¹⁴ Centre for Nephrology, Division of Medicine, University College London, London, UK.
¹⁵ Scientific Computing STP, Francis Crick Institute, London, UK.
¹⁶ Oncogene Biology Laboratory, The Francis Crick Institute, London, UK.
¹⁷ Bioinformatics and Biostatistics, The Francis Crick Institute, London, UK.
¹⁸ Tumour Immunogenomics and Immunosurveillance Laboratory, University College London Cancer Institute, London, UK.
¹⁹ Department of Molecular Medicine, Aarhus University Hospital, Aarhus, Denmark.
²⁰ Department of Clinical Medicine, Aarhus University, Aarhus, Denmark.
²¹ Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark.
²² Cancer Research UK & UCL Cancer Trials Centre, London, UK.
²³ Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
²⁴ Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
²⁵ Department of Medical Oncology, University College London Hospitals, London, UK.
²⁶ Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK. Charles.Swanton@crick.ac.uk.
²⁷ Cancer Evolution and Genome Instability Laboratory, The Francis Crick Institute and University College London Cancer Institute, London, UK. Charles.Swanton@crick.ac.uk.
²⁸ Department of Medical Oncology, University College London Hospitals, London, UK. Charles.Swanton@crick.ac.uk.
²⁹ Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK. nicholas.mcgranahan.10@ucl.ac.uk.
³⁰ Cancer Genome Evolution Research Group, Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK. nicholas.mcgranahan.10@ucl.ac.uk.

# Contributed equally.

PMID: 37046093
PMCID: PMC10115639
DOI: 10.1038/s41586-023-05706-4

Carlos Martínez-Ruiz et al. Nature. 2023 Apr.

Abstract

Intratumour heterogeneity (ITH) fuels lung cancer evolution, which leads to immune evasion and resistance to therapy¹. Here, using paired whole-exome and RNA sequencing data, we investigate intratumour transcriptomic diversity in 354 non-small cell lung cancer tumours from 347 out of the first 421 patients prospectively recruited into the TRACERx study²,³. Analyses of 947 tumour regions, representing both primary and metastatic disease, alongside 96 tumour-adjacent normal tissue samples implicate the transcriptome as a major source of phenotypic variation. Gene expression levels and ITH relate to patterns of positive and negative selection during tumour evolution. We observe frequent copy number-independent allele-specific expression that is linked to epigenomic dysfunction. Allele-specific expression can also result in genomic-transcriptomic parallel evolution, which converges on cancer gene disruption. We extract signatures of RNA single-base substitutions and link their aetiology to the activity of the RNA-editing enzymes ADAR and APOBEC3A, thereby revealing otherwise undetected ongoing APOBEC activity in tumours. Characterizing the transcriptomes of primary-metastatic tumour pairs, we combine multiple machine-learning approaches that leverage genomic and transcriptomic variables to link metastasis-seeding potential to the evolutionary context of mutations and increased proliferation within primary tumour regions. These results highlight the interplay between the genome and transcriptome in influencing ITH, lung cancer evolution and metastasis.

Trial registration: ClinicalTrials.gov NCT01888601.


Conflict of interest statement

S.V. is a co-inventor on a patent to detect molecules in a sample (US patent no. 10578620). M.A.B. has consulted for Achilles Therapeutics. A.M.F. is a co-inventor on a patent application to determine methods and systems for tumour monitoring (PCT/EP2022/077987). M.J.-H. has consulted for, and is a member of, the Achilles Therapeutics Scientific Advisory Board and Steering Committee; has received speaker honoraria from Pfizer, Astex Pharmaceuticals and Oslo Cancer Cluster; and holds patent PCT/US2017/028013 relating to methods for lung cancer detection. This patent has been licensed to commercial entities and, under terms of employment, M.J.-H. is due a share of any revenue generated from such license(s). A. Hackshaw has received fees for being a member of independent data monitoring committees for Roche-sponsored clinical trials and academic projects co-ordinated by Roche. C.S. acknowledges grant support from AstraZeneca, Boehringer-Ingelheim, Bristol Myers Squibb, Pfizer, Roche-Ventana, Invitae (previously Archer Dx Inc; collaboration in minimal residual disease sequencing technologies) and Ono Pharmaceutical. C.S. is an AstraZeneca Advisory Board member and chief investigator for the AZ MeRmaiD 1 and 2 clinical trials, and is also co-chief investigator of the NHS Galleri trial, funded by GRAIL, and a paid member of GRAIL’s Scientific Advisory Board. He receives consultant fees from Achilles Therapeutics (where he is also a Scientific Advisory Board member), Bicycle Therapeutics (where he is also a Scientific Advisory Board member), Genentech, Medicxi, Roche Innovation Centre – Shanghai, Metabomed (until July 2022) and the Sarah Cannon Research Institute, had stock options in Apogen Biotechnologies and GRAIL until June 2021, currently has stock options in Epic Bioscience and Bicycle Therapeutics, and has stock options in and is co-founder of Achilles Therapeutics. C.S. is an inventor on a European patent application relating to assay technology to detect tumour recurrence (PCT/GB2017/053289); the patent has been licensed to commercial entities and, under his terms of employment, C.S. is due a share of any revenue generated from such license(s). C.S. holds patents relating to targeting neoantigens (PCT/EP2016/059401), identifying clinical response to immune checkpoint blockade (PCT/EP2016/071471), determining HLA loss of heterozygosity (PCT/GB2018/052004), predicting survival rates of patients with cancer (PCT/GB2020/050221), identifying patients whose cancer responds to treatment (PCT/GB2018/051912), detecting tumour mutations (PCT/US2017/28013), methods for lung cancer detection (US20190106751A1) and identifying insertion/deletion mutation targets (European and US, PCT/GB2018/051892), and is a co-inventor on a patent application to determine methods and systems for tumour monitoring (PCT/EP2022/077987). C.S. is a named inventor on a provisional patent protection related to a ctDNA detection algorithm. G.A.W. is employed by and has stock options in Achilles Therapeutics. N.M. has received consultancy fees from, and has stock options in, Achilles Therapeutics. N.M. holds European patents relating to targeting neoantigens (PCT/EP2016/059401), identifying clinical response to immune checkpoint blockade (PCT/EP2016/071471), determining HLA loss of heterozygosity (PCT/GB2018/052004) and predicting survival rates of patients with cancer (PCT/GB2020/050221). D.A.M. reports speaker fees from AstraZeneca, Eli Lilly and Takeda, consultancy fees from AstraZeneca, Thermo Fisher, Takeda, Amgen, Janssen and Eli Lilly, and has received educational support from Takeda and Amgen. S.C.T. has acted as a consultant for Revolution Medicines. N.J.B. is a co-inventor on a patent to identify patients whose cancer responds to treatment (PCT/GB2018/051912), a co-inventor on a patent for methods for predicting anti-cancer response (US14/466,208) and has a patent application (PCT/GB2020/050221) on methods for cancer prognostication. R.S. reports non-financial support from Merck and Bristol Myers Squibb (BMS); research support from Merck, Puma Biotechnology and Roche; and personal fees from Roche, BMS and Exact Sciences for advisory boards. E.L.C. is employed by and holds stock in Achilles Therapeutics.

Figures

**Fig. 1. Expression diversity in the TRACERx 421 cohort.**
a, Relationship between principal components (PCs) of transcriptomic diversity and genomic (black labels) and clinical (blue labels) variables. Displayed are the top PCs within LUADs (n = 480 regions from 190 tumours) and LUSCs (n = 303 regions from 119 tumours) that together explain at least 30% of the total variance, alongside their median ratio of heterogeneity (intratumour heterogeneity of PC activity divided by intertumour heterogeneity of PC activity). The colour of the border around each square indicates the direction of the association between each covariate and PC. In total, 39 variables were tested (Methods). Significance was determined using a mixed-effects linear model with purity as a fixed covariate and tumour as a random variable. Only features significantly associated (FDR-corrected P < 0.05) with at least one PC are displayed. *PC1 in LUAD was strongly negatively associated with the expression of hallmark gene sets related to proliferation (Extended Data Fig. 1f, Methods). GD, genome doubling; TMB, tumour mutational burden; wGII, weighted genome instability index. b, Intratumour expression distance (I-TED), calculated as the mean normalized gene expression correlation distance between a given region and every other region from the same tumour, displayed by histology. c, Proportion of variance in I-TED explained by selected genomic and clinical features in a linear model, using 260 tumours with at least two primary tumour regions and with purity and genome instability estimates. Histological types represented by only a single tumour were excluded to ensure a sufficiently large sample size to estimate the effect of histology. **P = 0.003, ***P = 5.15 × 10⁻¹⁰. d, ASCAT-derived tumour purity and RNA-based estimate of the tumour transcript fraction. Each dot represents one tumour region. A modified version of ASCAT was used to estimate the proportion of tumour and non-tumour cells within an admixed sequencing sample.
e, dN/dS, inferring positive and negative selection of truncating somatic mutations, for cancer genes and non-cancer genes, by tertiles of median gene expression across the cohort (left) and by tertiles of gene expression ITH across the cohort (right). Dots represent the estimated dN/dS and the error bars represent the 95% confidence intervals calculated using the genesetdnds function in R from the package dNdScv. A dN/dS estimate is considered significant if the 95% confidence intervals do not overlap 1. Expression level tertiles contained 76, 24 and 9 cancer genes, and 4,856, 5,100 and 5,166 non-cancer genes, for tertiles 3, 2 and 1, respectively. Expression ITH tertiles contained 54, 24 and 31 cancer genes and 4,994, 5,082 and 5,046 non-cancer genes, for tertiles 3, 2 and 1, respectively. Median expression levels and expression ITH were based on the total number of tumour samples collected at surgical resection from tumours with more than one sample at that time point (n = 845 regions from 283 tumours).
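The I-TED metric in panel b can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the expression vectors are already normalized, that "correlation distance" means 1 − Pearson r, and the function names `pearson_r` and `i_ted` are hypothetical.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant expression vector")
    return sxy / math.sqrt(sxx * syy)


def i_ted(regions):
    """Per-region I-TED for one tumour (assumed definition: 1 - Pearson r).

    regions: region ID -> normalized expression vector (same gene order in every region).
    Returns region ID -> mean correlation distance to every other region of the tumour.
    """
    ids = list(regions)
    if len(ids) < 2:
        raise ValueError("I-TED needs at least two regions from the same tumour")
    return {
        rid: fmean(1 - pearson_r(regions[rid], regions[other]) for other in ids if other != rid)
        for rid in ids
    }
```

For a tumour with three regions, each region receives the mean of its two pairwise distances; identical expression profiles give a distance of 0 and perfectly anti-correlated profiles give 2.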

**Fig. 2. ASE in NSCLC.**
a, Schematic displaying the concepts of biallelic expression, CN-dependent ASE (CN-dep ASE) and CN-independent ASE (CN-ind ASE). b, Proportion of evaluable genes (those containing an expressed SNP) affected by CN-dependent ASE and CN-independent ASE in tumours and normal tissue samples. LUAD, n = 454 regions from 144 tumours; LUSC, n = 293 regions from 88 tumours; Other, other subtypes, n = 130 regions from 38 tumours; Normal, tumour-adjacent normal lung tissue, n = 95. c, Points indicate odds ratio estimates for CN-independent ASE when somatic point mutations or allele-specific methylation (ASM; in samples for which both reduced representation bisulfite sequencing (RRBS) and RNA-seq data were available) were concomitantly detected in the same gene, by type of alteration. Bars indicate 95% confidence intervals. Odds ratios for the links between CN-independent ASE and mutations and between CN-independent ASE and ASM are based on 876 primary tumour regions from 332 tumours and on 96 tumour regions from 31 tumours, respectively. d, Relationship in LUAD between the proportion of evaluable genes with CN-independent ASE and the ratio of differentially hypomethylated clusters of neighbouring CpGs to all differentially methylated genomic positions. The P value was calculated using a linear mixed-effects model with tumour as the random variable. e, Linear mixed-effects model showing the impact of driver mutations in candidate epigenetic modifier genes (mutated in more than five tumours) and tumour mutational burden on the proportion of evaluable genes with CN-independent ASE. Factors independently associated with increased CN-independent ASE in a multivariable model are coloured blue. *P < 0.05, **P < 0.01, ***P < 0.001. f, An example of genomic–transcriptomic mirrored subclonal allelic imbalance occurring in *FAT1* within CRUK0640. DNA and RNA B allele frequencies (BAFs) for each SNP in *FAT1* are plotted and coloured according to the reference and variant status of each allele for each region sampled within the tumour. In this instance, there is evidence of CN-dependent ASE in two regions and CN-independent ASE in one region. These events favour overexpression of different parental alleles and occur on different branches of the phylogenetic tree; a simplified version is displayed. MRCA, most recent common ancestor.
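Panel c reports odds ratios with 95% confidence intervals. One common way to obtain such an interval from a 2 × 2 table of gene counts is Woolf's log-odds method, sketched below. The caption does not state which estimator the authors used (a model-based estimate would also fit the description), so treat both the method and the hypothetical function `odds_ratio_ci` as assumptions.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-normal) 95% CI from a 2x2 table of genes.

    a: altered gene with CN-independent ASE     b: altered gene without ASE
    c: unaltered gene with CN-independent ASE   d: unaltered gene without ASE
    Zero cells are rejected; they would need a continuity correction first.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```

As in the figure, an association would be read as significant when the returned interval does not include 1.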

**Fig. 3. RNA-SBS signatures in NSCLC.**
a, RNA-editing overview (from top to bottom): number and type of RNA substitutions per Mb per primary tumour, with tumours sorted from left to right by histological subtype and by number of substitutions; proportion of each editing type per tumour; NSCLC histological subtype per tumour. b, Number of RNA substitutions detected per tumour by histological subtype of NSCLC and in normal adjacent lung tissue. LUAD, n = 190; LUSC, n = 119; Other, other subtypes, n = 43; Normal, tumour-adjacent normal lung tissue, n = 96. Boxes represent the lower quartile, median and upper quartile. c, Left, trinucleotide profile of each RNA-SBS signature. Only samples from patients with more than 20 RNA variants were considered, n = 333. Right, signature ITH, measured as the standard deviation of each signature's exposure across tumour regions divided by the mean exposure of that signature across the cohort, based on 280 tumours with more than 20 RNA variants and more than one region. The percentage of tumours with signature activity in at least one primary region is indicated in parentheses. d, Volcano plot showing the Pearson’s r correlations between the number of RNA-SBS1 (top) or RNA-SBS2 (bottom) substitutions and the expression of every gene in the transcriptome. P values were calculated using a linear mixed-effects model, with the tumour of origin of each region as a random effect. P values were adjusted for repeated measures. Correlations were based on 765 primary tumour regions with at least 20 RNA variants from 329 tumours. Colour indicates dot density, with light-coloured points belonging to areas of high density in the plot. e, Correlation between the exposure of RNA-SBS signatures within tumour-adjacent normal lung tissue and their respective primary tumour regions, and between metastatic tumour regions and their respective seeding regions in the primary tumour. Primary tumour exposure was calculated as the median exposure across all primary regions for the comparison with tumour-adjacent normal tissue, and across all seeding regions for the comparison with metastases. Only pairs in which more than 20 RNA substitutions were detected in both samples were used (n = 50 normal–primary pairs, n = 31 primary–metastasis pairs). P values were computed with a two-sided t-test of the null hypothesis that the Pearson correlation coefficient r = 0.
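Panel c (right) defines signature ITH as the standard deviation of a signature's exposure across a tumour's regions divided by that signature's mean exposure across the cohort. The sketch below computes it for one signature. It assumes the sample standard deviation and a cohort mean taken over all regions, neither of which the caption specifies, and `signature_ith` is a hypothetical name.

```python
from statistics import fmean, stdev


def signature_ith(exposures_by_tumour):
    """Per-tumour ITH of one RNA-SBS signature.

    exposures_by_tumour: tumour ID -> list of per-region exposures for the signature.
    Returns tumour ID -> SD across that tumour's regions / cohort mean exposure.
    Assumptions: sample SD, and a cohort mean over every region in the input.
    """
    all_regions = [e for regs in exposures_by_tumour.values() for e in regs]
    cohort_mean = fmean(all_regions)
    if cohort_mean == 0:
        raise ValueError("signature has zero mean exposure across the cohort")
    return {
        tumour: stdev(regs) / cohort_mean
        for tumour, regs in exposures_by_tumour.items()
        if len(regs) > 1  # ITH is only defined for tumours with more than one region
    }
```

Using a single cohort-wide denominator means the ranking of tumours follows their raw standard deviations, unlike a per-tumour coefficient of variation, which would rescale each tumour by its own mean exposure.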

**Fig. 4. Transcriptional landscape of seeding tumour regions.**
a, Expression distance between primary regions and either metastatic lymph node (LN) regions or pulmonary nodules resected at the time of surgery (left), or metastatic regions resected at relapse within the same patient (right). Only tumours containing two or more regions with at least one metastatic region sampled are shown (n = 50 primary–metastasis pairs from 35 tumours). b, First two PCs for all available primary and metastatic tumour regions in an example tumour, CRUK0361, based on gene expression levels. The region containing the seeding clone was closer to the metastatic sample than were the other primary regions. c, Expression distance between metastatic samples and their paired primary samples across the cohort, depending on whether the primary region contained a seeding clone(s). The analysis was run on 22 metastatic samples that had gene expression data for both seeding and non-seeding primary regions. d, Receiver operating characteristic (ROC) curves for ensemble models trained on each feature set, genomic only (red), transcriptomic only (blue) and combined genomic and transcriptomic (green), and assessed against the held-out test dataset. The predictions are based on 516 primary tumour regions from 206 tumours for which seeding status could be established and for which all metrics tested could be measured (307 non-seeding regions, 209 seeding), with a 75%/25% training/test split. e, Mean Shapley additive explanations (SHAP) values (calculated across the held-out test dataset) for each feature in the combined ensemble model, capturing the importance of each feature for model prediction. Label colours indicate the feature type, genomic (red) or transcriptomic (blue), and box colours indicate the model type from which the SHAP values were extracted. The symbols at the end of the bars indicate a significantly positive (+) or negative (–) association with seeding potential, based on a two-sided Wilcoxon test comparing seeding with non-seeding regions. MLP-SVM, multilayer perceptron with support vector machine. All box plots in this figure represent the lower quartile, median and upper quartile; whiskers represent the lower and upper bounds of ±1.5× the interquartile range. All Wilcoxon tests shown here (paired or unpaired) were two sided.
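Panel d summarizes each model with an ROC curve on the held-out 25% of regions. The area under that curve equals the probability that a randomly chosen seeding region receives a higher predicted score than a randomly chosen non-seeding region, which gives a compact way to compute it without tracing the curve. The sketch below is illustrative only: `roc_auc` is a hypothetical helper, the caption does not report how the AUC was computed, and the quadratic pair loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve for binary labels (1 = seeding, 0 = non-seeding).

    Uses the rank (Mann-Whitney) formulation: the fraction of (seeding,
    non-seeding) pairs in which the seeding region has the higher predicted
    score, with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one seeding and one non-seeding region")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking, and 1.0 to perfect separation of seeding from non-seeding regions.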

**Extended Data Fig. 1. Patterns of expression diversity in the TRACERx cohort.**
a. Uniform manifold approximation and projection (UMAP) showing the distribution of each primary tumour region in the cohort based on gene expression. n = 914 tumour regions collected at surgical resection from 352 primary tumours, n = 33 recurrence/relapse samples from 24 tumours and n = 96 paired normal samples from 96 tumours. LUAD: lung adenocarcinoma; LUSC: lung squamous cell carcinoma; LCNEC: large cell neuroendocrine carcinoma. b. Percentage of tumours with and without ‘LUAD drivers’ (driver mutations enriched in LUADs) in LUADs, in non-LUADs clustering with LUADs in the UMAP and in non-LUADs clustering apart from LUADs. The number of tumours within each category is annotated. c. Mean number of variables significantly associated with each principal component (PC) of gene expression after randomly subsampling LUAD regions to match the number of LUSC regions (n = 303) over 50 iterations. LUAD subtypes were not included in this comparison to ensure an equal number of variables between LUAD and LUSC. d. PC associations with each of the RAS activation groups (RAGs) developed by East and colleagues. PC activity differed significantly between RAGs. Analysis based on 480 tumour regions collected at surgical resection from 190 LUAD tumours in which RAG could be estimated. e. Proportion of LUAD tumours in smokers (comprising current and ex-smokers) and never smokers, split by LUAD subtype, with G12C KRAS driver mutations, non-G12C KRAS driver mutations or driver mutations in other genes. Annotated numbers indicate the number of tumours per category. f. Pearson’s r between each PC and functional groups comprising the 50 MSigDB Hallmark gene sets. Pearson’s r values were averaged within the functional group to which each hallmark was assigned, across LUAD (n = 480 tumour regions from 190 tumours) and LUSC (n = 303 tumour regions from 119 tumours). The colour of the border around each square indicates the direction of the association between each covariate and PC for significant (FDR < 0.05) associations. Significance was determined through a mixed-effects linear model with purity as a fixed covariate and tumour as a random effect; P values were calculated per hallmark and combined within each MSigDB functional group using the harmonic mean. g. Immunohistochemical staining for the Ki67 proliferation marker in LUAD tumours with and without *EGFR* driver mutations. Only the 196 LUAD tumours in which Ki67 was measured are displayed. Significance was calculated using a two-sided unpaired Wilcoxon test. WT: wild type. h. Percentage of variance in intra-tumour expression distance (I-TED) explained by intra-tumour variance in tumour transcript fraction and intra-tumour variance in tumour purity, in a linear regression. Analysis based on 258 tumours with at least two primary tumour regions and with purity and tumour transcript fraction estimates. ***: P = 5.03 × 10⁻⁸; **: P = 0.007. i. dN/dS in non-cancer and cancer genes for different quantiles of ITH or expression amplitude. Asterisks indicate significance, whereby the 95% confidence interval of the dN/dS estimate did not overlap 1, signalling either negative (blue square) or positive (red square) selection. Broadly, lower quantiles of ITH tended towards negative selection in non-cancer genes, whereas the opposite was true for cancer genes. Results are based on bootstrapping from all tumour samples resected at surgery of the primary tumour, in tumours with more than one sample at that time point (845 regions from 285 tumours). j. Percentage of all essential genes from the Project Achilles list (n = 604) in lung cancer for tertiles of expression ITH or amplitude. All box plots in this figure represent lower quartile, median and upper quartile; whiskers represent lower/higher bound ±1.5× interquartile range.
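Panel f combines per-hallmark P values within each functional group using the harmonic mean, then applies an FDR threshold. The minimal Python sketch below shows those two steps. It uses equal weights for the harmonic mean and Benjamini–Hochberg for the FDR step; the legend specifies neither choice, so both are assumptions. Wilson's asymptotically exact calibration of the harmonic mean P value is not applied. The group names and P values are hypothetical.

```python
def harmonic_mean_p(p_values, weights=None):
    """Weighted harmonic mean of P values (equal weights by default)."""
    n = len(p_values)
    if weights is None:
        weights = [1.0 / n] * n
    if any(p == 0 for p in p_values):
        return 0.0
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (same rule as R's p.adjust 'BH')."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value downwards, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-hallmark P values grouped by functional group.
groups = {
    "proliferation": [0.001, 0.02, 0.3],
    "immune": [0.2, 0.5],
    "metabolic": [0.04, 0.6, 0.9],
}
combined = {name: harmonic_mean_p(ps) for name, ps in groups.items()}
fdr = dict(zip(combined, benjamini_hochberg(list(combined.values()))))
significant = [name for name, q in fdr.items() if q < 0.05]
print(combined, fdr, significant)
```

Because the harmonic mean is dominated by the smallest P values, one strongly associated hallmark can make its whole functional group significant.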

**Extended Data Fig. 2. Genomic and transcriptomic links with allele-specific expression.**
a. Points indicate odds ratio estimates for copy-number-dependent allele-specific expression (CN-dependent ASE) when somatic point mutations or allele-specific methylation (ASM; where both RRBS and RNA-seq were available) were concomitantly detected in the same gene, by type of alteration. Bars indicate 95% confidence intervals. Odds ratios for the links between CN-dependent ASE and mutations, and between CN-dependent ASE and ASM, are based on 876 primary tumour regions from 332 tumours and 96 tumour regions from 31 tumours, respectively. b. Relationship between intra-tumour expression diversity and the proportion of CN-independent ASE in a tumour that is subclonal (found in only a subset of regions within a given tumour). The Pearson correlation coefficient is shown (r = 0.25, P = 4 × 10⁻⁵). c. Percentage of variation in I-TED explained by single nucleotide variant (SNV), SCNA and CN-independent ASE ITH, as well as by the number of subclonal whole genome duplication events (GDs) per tumour. The linear regression was based on 269 tumours in which all variables could be calculated. ***: P = 2.4 × 10⁻¹⁰; **: P = 0.004. d. PCA of CN-independent ASE patterns in TRACERx421 tumour (n = 877 tumour regions) and normal tissue (n = 95) samples in which CN-independent ASE could be estimated. Samples are coloured by tissue type. Values in parentheses on the axes indicate the proportion of variance explained by each principal component. e. Genes with CN-independent ASE in either tumour or normal tissue samples. Genes with an enrichment of CN-independent ASE in tumours are marked in blue, lung cancer genes are represented by triangles and imprinted genes have a black outline. Enrichment was defined as FDR < 0.05 from a per-gene Fisher’s exact test. The number of regions used to calculate enrichment varied per gene between 5 and 850 (median = 164) for tumours and between 5 and 95 (median = 35) for normal tissue. f. Relationship in LUSC between the proportion of evaluable genes with CN-independent ASE and the ratio of differentially hypomethylated clusters of neighbouring CpGs to all differentially methylated genomic positions. The Pearson correlation coefficient is shown; the P value was calculated using a linear mixed-effects model with tumour as a random effect (r = −0.18, P = 0.35). g. Percentage of evaluable genes affected by CN-independent ASE in wild-type (WT) and *SETD2*-deficient isogenic cell lines. Expression data were obtained from publicly available datasets from three separate studies in three different cell lines: in total, data from 10 cell lines across 3 experiments (n = 6, 2 and 2). Boxes represent lower quartile, median and upper quartile. P values were calculated using a linear mixed-effects model with the study of origin of each sample as a random effect. *SETD2*-/-: inactivation of the *SETD2* gene.
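Panels a and e rest on 2×2 contingency tables: odds ratios with 95% confidence intervals, and a per-gene Fisher's exact test. The legend does not say how the intervals were computed. The minimal Python sketch below therefore assumes the Woolf (log-normal) interval, with a 0.5 correction added to every cell when any cell is zero. It also assumes the two-sided Fisher P value that sums all tables no more likely than the observed one. Both function names and the toy counts are hypothetical. R's `fisher.test` reports a conditional maximum-likelihood odds ratio rather than the sample odds ratio computed here.

```python
import math
from statistics import NormalDist


def odds_ratio_woolf(a, b, c, d, level=0.95):
    """Sample odds ratio and Woolf (log-normal) CI for the 2x2 table
        [[a, b],   # e.g. alteration present: ASE yes / ASE no
         [c, d]]   # alteration absent:       ASE yes / ASE no
    Assumption: 0.5 is added to every cell if any cell is zero."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test: sum the hypergeometric probabilities of
    all tables with the observed margins that are no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)

    def prob(x):
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    support = range(max(0, col1 - (n - row1)), min(row1, col1) + 1)
    # Small relative tolerance so tables tied in probability are not lost to rounding.
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts, for illustration only.
print(odds_ratio_woolf(10, 5, 5, 10))
print(fisher_exact_two_sided(3, 0, 0, 3))
```

Under this approach, an interval that excludes 1 corresponds to a significant association at the chosen level.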

**Extended Data Fig. 3. Patterns of RNA variant diversity in TRACERx.**
a. Overview of RNA substitutions in the primary tumour lung TRACERx cohort, from top to bottom: number and type of RNA variants per megabase per tumour (tumours are sorted from left to right by histological subtype and by number of variants); proportion of each variant type per tumour; proportion of variants present in any of the normal samples; proportion of tumour-specific RNA variant sites shared across at least two tumours; NSCLC histological subtype per patient. LUAD, lung adenocarcinomas, n = 190; LUSC, lung squamous cell carcinomas, n = 119; Other, other subtypes, n = 43; tumour-adjacent normal lung tissue, n = 96. b. Volcano plots showing Pearson correlations between the number of RNA variant signature substitutions and gene expression for all genes in the transcriptome, split by RNA single-base substitution (SBS) signature. P values were calculated using a linear mixed-effects model with the tumour of origin of each region as a random effect, and were adjusted for repeated measures. The genes with the five most significant correlations with each signature are labelled. Correlations were based on 765 primary tumour regions (from 329 tumours) with at least 20 RNA variants. Colour indicates point density, with light-coloured points lying in areas of high density in the plot. c. Proportion of RNA variants in 4-nt RNA loops, relative to variant type (A>G or C>T). C>T substitutions were more prevalent at the 4th nucleotide of 4-nt RNA hairpin loops, consistent with APOBEC RNA editing activity. d. Proportion of substitutions assigned to RNA-SBS2 activity compared with the proportion of RNA variants at CAT[C>T] motif sites per tumour region (CAUC ratio). Blue dots represent regions in which RNA editing at these motifs was enriched (Fisher’s exact test P < 0.05 for C>T substitutions at each site compared with C sites in a 40-nt genomic region). P values were computed using a two-sided t test of the null hypothesis that the Pearson correlation coefficient (r) = 0, within 892 tumour regions and 77 tumour-adjacent normal tissue samples with at least 10 C>T variants. e. Pearson correlation between the exposure of RNA-SBS signatures in metastatic tumour regions and in their respective seeding regions in the primary tumour (left), and between tumour-adjacent normal lung tissue and their respective primary tumour regions (right). Primary tumour exposure was calculated as the median exposure across all primary regions for the comparison with tumour-adjacent normal tissue, and across all seeding regions for the comparison with metastases. Only primary–metastasis pairs in which more than 20 RNA substitutions were detected in both the metastasis and the primary region were used (n = 50 pairs for normals, n = 31 for metastases). P values were computed using a two-sided t test of the null hypothesis that the Pearson correlation coefficient = 0. f. Pearson correlation between RNA-SBS1 activity and the global level of methylation in a tumour region (measured as the percentage of all differentially methylated positions that are differentially hypomethylated clusters of neighbouring CpGs). Methylation data and sufficient RNA substitutions for signature deconvolution were available for 80 regions from 31 tumours. P values were calculated using a linear mixed-effects model with the tumour of origin of each region as a random effect.
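Panels d and e report P values from "a two-sided t test testing the null hypothesis that the Pearson correlation coefficient = 0". The usual statistic is t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The minimal stdlib-only Python sketch below computes r and this two-sided P value. It obtains the Student t tail from the regularized incomplete beta function, evaluated with the standard continued-fraction method. It does not cover the mixed-effects P values used in panels b and f. The function names and toy data are hypothetical. In practice, `scipy.stats.pearsonr` returns the same two-sided P value.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction.
        for aa in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def pearson_r_test(x, y):
    """Pearson's r and the two-sided P value of the t test of r = 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, 0.0
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    # Two-sided Student t tail: P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2).
    return r, reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


# Toy data (hypothetical).
print(pearson_r_test([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

The toy data give r = 0.8. With only five points this falls short of significance (P ≈ 0.10), which shows how strongly the P value depends on n as well as r.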

**Extended Data Fig. 4. Transcriptional features of metastasis.**
a. Expression distance between paired primary tumour regions, compared with the distance between paired primary and non-LN intrathoracic metastatic tumour regions. Only patients with two or more primary regions and at least one sampled metastatic region are shown (12 primary–metastasis pairs from 8 tumours). Boxes represent lower quartile, median and upper quartile; whiskers represent lower/higher bound ±1.5× interquartile range. Significance was tested using a paired Wilcoxon test (P = 0.00098). b. Gene set enrichment analysis (GSEA) of functional groups from hallmark gene sets between metastasis-seeding and non-seeding regions. Only tumours in which both seeding and non-seeding regions had RNA-seq were included (n = 37 tumours, 122 regions). Dots are coloured where enrichment was significant after FDR correction. The mean normalised enrichment score (NES), displayed on the x axis, indicates the enrichment for a given gene set; the negative log of the adjusted P value is displayed on the y axis. c. Overview schematic of the machine learning framework used to predict whether a region contains one or more metastasis-seeding clones. MLP-SVM: multilayer perceptron with a support vector machine terminal layer. d. Individual Shapley Additive Explanations (SHAP) values for the most important features across the combined ensemble. Positive SHAP values indicate weighting towards a prediction of metastasis seeding, whereas negative SHAP values indicate weighting towards a prediction of non-seeding. The colour scale represents the value of the feature across the test dataset (red = high values, blue = low values). For instance, high values of the ORACLE expression marker (red dots) were associated with a higher likelihood of a region being seeding (positive SHAP values) in the combined ensemble. The predictions were based on 516 primary tumour regions from 206 tumours in which seeding status could be established and all metrics tested could be measured (307 non-seeding regions, 209 seeding), with a 75%–25% training–test split. TMB: tumour mutational burden; CN-ind ASE: copy-number-independent allele-specific expression; HPCS: high-plasticity cell state; GD: genome doubling; CCF: cancer cell fraction; Clone dominance CCF: maximum CCF at terminal nodes of a phylogenetic tree; SCNA: somatic copy number alteration.
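Panel a uses a paired Wilcoxon test on 12 primary–metastasis pairs. For samples this small, the exact null distribution of the signed-rank statistic W+ is easy to enumerate. The minimal Python sketch below counts the sign assignments that give each possible W+ for ranks 1..n, then doubles the smaller tail. It assumes no tied absolute differences and drops zero differences. The function names are hypothetical, and the legend does not state which implementation was used.

```python
def signed_rank_null_counts(n):
    """Number of sign assignments giving each possible W+ for ranks 1..n."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for rank in range(1, n + 1):
        # Iterate downwards so each rank is used at most once (0/1 knapsack).
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return counts


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided paired Wilcoxon signed-rank test (assumes no ties in |x - y|)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    abs_diffs = [abs(d) for d in diffs]
    if len(set(abs_diffs)) < n:
        raise ValueError("tied |differences|: this exact null distribution does not apply")
    order = sorted(range(n), key=lambda i: abs_diffs[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    counts = signed_rank_null_counts(n)
    total = 2 ** n
    p_low = sum(counts[: w_plus + 1]) / total
    p_high = sum(counts[w_plus:]) / total
    return w_plus, min(1.0, 2 * min(p_low, p_high))


# Hypothetical differences: 11 of 12 pairs positive, the smallest one negative.
print(wilcoxon_signed_rank_exact([-1] + list(range(2, 13)), [0] * 12))
```

With 12 untied pairs, the exact null distribution has 2¹² = 4096 equally likely sign patterns. The two smallest attainable two-sided P values are therefore 2/4096 ≈ 0.00049 and 4/4096 ≈ 0.00098. The second coincides with the value reported in panel a.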


Comment in

Molecular portraits of lung cancer evolution.
Hayes TK, Meyerson M. Nature. 2023 Apr;616(7957):435-436. doi: 10.1038/d41586-023-00934-0. PMID: 37045956.

References

1. Black, J. R. M. & McGranahan, N. Genetic and non-genetic clonal diversity in cancer evolution. Nat. Rev. Cancer 21, 379–392 (2021).
1. Bailey, C. et al. Tracking cancer evolution through the disease course. Cancer Discov. 11, 916–932 (2021).
1. Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, 2109–2121 (2017).
1. PCAWG Transcriptome Core Group et al. Genomic basis for RNA alterations in cancer. Nature 578, 129–136 (2020).
1. Marjanovic, N. D. et al. Emergence of a high-plasticity cell state during lung cancer evolution. Cancer Cell 38, 229–246.e13 (2020).
