Nat Med. 2019 Mar;25(3):517-525.
doi: 10.1038/s41591-018-0323-0. Epub 2019 Jan 21.

Deciphering the genomic, epigenomic, and transcriptomic landscapes of pre-invasive lung cancer lesions

Vitor H Teixeira et al. Nat Med. 2019 Mar.

Abstract

The molecular alterations that occur in cells before cancer is manifest are largely uncharted. Lung carcinoma in situ (CIS) lesions are the pre-invasive precursor to squamous cell carcinoma. Although microscopically identical, their future is in equipoise, with half progressing to invasive cancer and half regressing or remaining static. The cellular basis of this clinical observation is unknown. Here, we profile the genomic, transcriptomic, and epigenomic landscape of CIS in a unique patient cohort with longitudinally monitored pre-invasive disease. Predictive modeling identifies which lesions will progress with remarkable accuracy. We identify progression-specific methylation changes on a background of widespread heterogeneity, alongside a strong chromosomal instability signature. We observed mutations and copy number changes characteristic of cancer and chart their emergence, offering a window into early carcinogenesis. We anticipate that this new understanding of cancer precursor biology will improve early detection, reduce overtreatment, and foster preventative therapies targeting early clonal events in lung cancer.

Conflict of interest statement

The authors declare the following competing interests:

A.S. is an employee of Johnson and Johnson. Discoveries within this manuscript have led S.M.J. to lead on Patent Applications 1819453.0 and 1819452.2 filed with the UK Intellectual Property Office through UCL Business PLC.

Figures

Extended Data Fig. 1. Experimental workflow.
Flow diagram illustrating which profiling techniques were applied to which samples. Biopsies taken from index CIS lesions were stored as fresh frozen (FF) and formalin-fixed paraffin embedded (FFPE). DNA was extracted from FF biopsies. The first 54 samples studied that had sufficient extracted DNA passing quality control (QC) underwent first methylation profiling, then whole-genome sequencing (WGS) when sufficient remaining DNA was available. Due to the low DNA quantity extracted from some biopsies, the methylation dataset (n=54) was larger than the WGS dataset (n=29); therefore, the subsequent 10 samples underwent WGS directly without methylation profiling. RNA was extracted from FFPE samples and underwent gene expression profiling when RNA passed QC. To ensure validity of our conclusions across orthogonal platforms, we used Illumina microarrays to profile a discovery set of 33 samples, then subsequently used Affymetrix microarrays to profile an independent validation set of 18 further samples.
Extended Data Fig. 2. Mutational signatures of CIS lesions.
a–d, The contribution of each of five pre-selected mutational signatures to each lesion is shown. These five mutational signatures, associated with CpG deamination (1), APOBEC (2 and 13), tobacco (4) and unknown aetiology (5), were selected based on an initial run using all 30 mutational signatures, which showed that these were present in the data and in signature extractions from lung squamous cell cancer (LUSC) datasets. The number of substitutions attributed to each signature is shown (a-b) as well as the proportion of mutations attributed to each mutational signature (c-d). Samples from the same patient share the same identifier except for the final letter; for example, PD21883a and PD21883d are two samples from the same patient. e, Comparison of the mutational signatures of CIS lesions to those found in lung squamous cell cancer (LUSC). LUSC data were downloaded from TCGA and mutations called with our algorithms. All mutations from all samples from each cancer type were pooled for this analysis. The colour scale indicates the proportion of substitutions in each sample that are attributed to each signature. f-j, Comparison of the relative proportion of mutations attributed to each signature between progressive (red; n = 29) and regressive (green; n = 10) CIS samples. P values were calculated using likelihood ratio tests of a mixed effects model with outcome (progressive or regressive) included as a fixed effect versus a model that was identical but for the fact that outcome was not included as a fixed effect. Only signature 4 (smoking-associated) was significantly different between the two groups. Boxplots are generated using the R boxplot function, which displays the first and third quartile as hinges and places whiskers at the most extreme data point that is no more than 1.5 times the length of the box away from the box.
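The hinge and whisker rule quoted in this caption can be illustrated with a minimal Python sketch. It assumes that R's boxplot takes its hinges from Tukey's five-number summary and uses the default whisker coefficient of 1.5; the function names and the toy data are hypothetical and are not taken from the study.

```python
import math


def tukey_five_number(values):
    """Return [min, lower hinge, median, upper hinge, max] using Tukey's hinges."""
    x = sorted(values)
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2
    # 1-based positions in the sorted data; these may be half-integers.
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    return [0.5 * (x[math.floor(p) - 1] + x[math.ceil(p) - 1]) for p in positions]


def boxplot_stats(values, coef=1.5):
    """Hinges, whisker ends and outliers under the 1.5 x box-length rule."""
    _, lower_hinge, median, upper_hinge, _ = tukey_five_number(values)
    box_length = upper_hinge - lower_hinge
    low_fence = lower_hinge - coef * box_length
    high_fence = upper_hinge + coef * box_length
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "hinges": (lower_hinge, upper_hinge),
        "median": median,
        # Whiskers sit at the most extreme data points inside the fences.
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


# Hypothetical toy data: one value lies far above the rest.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this toy example the upper whisker stops at the largest value inside the fence, and the extreme value is reported separately as an outlier.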
Extended Data Fig. 3. Genome-wide copy number changes of CIS lesions.
Visualization of copy number changes for 39 whole-genome-sequenced CIS samples. Rows represent samples, genomic position is represented on the x-axis. Local copy number gains are illustrated in red, losses in blue. We observe widespread changes in progressive CIS samples and a subset of regressive samples.
Extended Data Fig. 4. Documentation of biopsy history and chronology of lesion appearance in three misclassified regressive cases.
a, Case 1 (PD21893a) appeared to regress from a CIS lesion (07/2012) to squamous metaplasia (SqM; 11/2012). However, CIS was subsequently reconfirmed by biopsy (05/2013). b, Case 2 (PD21884a) had a lobectomy for T1N0 lung squamous cell cancer (LUSC) in the left upper lobe (LUL) and was under surveillance for carcinoma-in-situ (CIS) at the resection margins. A subsequent, high-grade CIS lesion (08/2009) profiled for genome-wide DNA methylation changes was considered regressive since a follow-up biopsy on the same anatomical site demonstrated the presence of a low-grade, moderately dysplastic (MoD) lesion (11/2009). A subsequent biopsy, however, was classified as CIS (02/2011) and the lesion then remained static for 26 months but eventually progressed into invasive cancer (04/2014). c, Case 3 (PD38326a) had an initial diagnosis of CIS (11/2015) followed by regression to normal epithelium (03/2016). CIS was subsequently identified at the same site (03/2017), with invasive cancer diagnosed on subsequent biopsy (07/2017).
Extended Data Fig. 5. Genomic aberrations in pre-invasive lung CIS lesions.
Comparisons of the number of substitutions (a), small insertions and deletions (b), genome rearrangements (c) and copy number changes (d), showing significantly more genomic changes in progressive (n = 29) than regressive (n = 10) lesions. Although there were more clonal substitutions in progressive than regressive lesions (e), the proportion of substitutions that were clonal and the number of clones were similar (f-g). Progressive lesions had more putative driver mutations (h). Telomere lengths (base pairs) were similar between the two groups (i). To confirm an association between CIN gene expression and copy number change, we correlated Weighted Genome Integrity Index (wGII) with mean CIN gene expression for the CIS samples in which we have both gene expression and whole-genome sequencing data (n = 11). The squared Pearson correlation coefficient was r² = 0.473 (j). All P values were calculated using likelihood ratio tests of a mixed effects model with outcome (progressive or regressive) included as a fixed effect versus a model that was identical but for the fact that outcome was not included as a fixed effect. Boxplots are generated using the R boxplot function, which displays the first and third quartile as hinges and places whiskers at the most extreme data point that is no more than 1.5 times the length of the box away from the box.
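The likelihood ratio comparison described in this caption can be sketched as follows. This is only an illustration of the final step, not the authors' code. It assumes the two mixed effects models have already been fitted elsewhere by maximum likelihood and that dropping the binary outcome term removes exactly one parameter, so the statistic is compared with a chi-square distribution on one degree of freedom. The log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_null, loglik_full):
    """Likelihood ratio test for nested models that differ by one parameter.

    loglik_null: maximised log-likelihood of the model without the outcome term.
    loglik_full: maximised log-likelihood of the model with the outcome term.
    Returns (statistic, p_value), using a chi-square reference with 1 d.f.
    """
    statistic = max(2.0 * (loglik_full - loglik_null), 0.0)
    # For 1 degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods, chosen only to illustrate the call.
statistic, p_value = likelihood_ratio_test(loglik_null=-100.0, loglik_full=-96.0)
```

Here the statistic is 8.0, which corresponds to a p-value a little below 0.005 under the one-degree-of-freedom assumption.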
Extended Data Fig. 6. Subclonal mutational structure in progressive and regressive CIS lesions.
Heatmap showing the proportion of overlapping mutations between samples taken from the same patient. For four patients with lesions that would ultimately progress to cancer (denoted ‘P’), over half the mutations were shared between any two given samples, suggesting that the lesions were derived from a common ancestral clone. By contrast, for two patients with lesions that would ultimately regress (denoted ‘R’), almost no mutations were shared, suggesting that the lesions arose independently. Samples from the same patient are shown in the same color; PD38321a and PD38322a do belong to the same patient and were mislabelled during processing.
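A pairwise sharing matrix of this kind could be computed along the following lines. The caption does not state which denominator was used, so this sketch assumes the proportion is the number of shared mutations divided by the number of mutations found in either sample. The sample names and variants are hypothetical.

```python
from itertools import combinations


def shared_fraction(mutations_a, mutations_b):
    """Fraction of mutations found in both samples, out of those found in either."""
    a, b = set(mutations_a), set(mutations_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_shared(samples):
    """Map each unordered pair of sample names to its shared-mutation fraction."""
    return {
        (s, t): shared_fraction(samples[s], samples[t])
        for s, t in combinations(sorted(samples), 2)
    }


# Hypothetical variants encoded as (chromosome, position, ref, alt).
samples = {
    "sample_a": {("3", 100, "C", "A"), ("5", 200, "G", "T"), ("17", 300, "C", "T")},
    "sample_b": {("3", 100, "C", "A"), ("5", 200, "G", "T"), ("9", 400, "A", "G")},
    "sample_c": {("1", 500, "T", "C")},
}
overlap = pairwise_shared(samples)
```

In this toy input, sample_a and sample_b share half of their combined mutations, which would be consistent with a common ancestral clone, while sample_c shares none.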
Extended Data Fig. 7. Differential molecular changes between progressive and regressive lesions.
Visualization of differential changes across the genome. A, shows all identified differentially methylated regions (DMRs) (hypermethylated regions in yellow, hypomethylated in blue) alongside a similar analysis comparing cancer and control samples from The Cancer Genome Atlas. We observe that 58% of DMRs identified in our progressive vs regressive analysis are also identified in cancer vs control. B, shows copy number changes across the genome in regressive CIS, progressive CIS and TCGA cancer samples. We observe congruency of copy number change, suggesting similar processes in the two cohorts.
Extended Data Fig. 8. Principal component analysis investigating effect of various biological, clinical and technical factors affecting correct case segregation for all DMPs and gene expression data.
a-f, Principal component analysis based on all methylation probes (n = 87; 36 progressive, 18 regressive, 33 control). (a) Smoking history (pack years). (b) Chronic obstructive pulmonary disease (COPD) status. (c) Previous lung cancer history referring to the presence of lung squamous cell cancer (LUSC) prior to identification of pre-invasive lesions. (d) Age at bronchoscopy (years); age of individual when pre-invasive lesion was first biopsied. (e) Gender. (f) Sentrix ID. g-k, Principal component analysis for all gene expression data. (g) Smoking history (pack years). (h) COPD status. (i) Previous lung cancer history referring to the presence of LUSC prior to identification of pre-invasive lesions. (j) Age at bronchoscopy (years); age of individual when pre-invasive lesion was first biopsied. (k) Gender. P-values were calculated using multivariate ANOVA.
Extended Data Fig. 9. Predictive modeling and ROC analytics of gene expression and CNA data.
ROC and precision-recall curves for the predictive model based on gene expression data shown in Fig. 4A-C. Curves are shown for the CIS discovery set (a-b), CIS validation set (c-d) and application to TCGA LUSC data (e-f). Using an analogous method to gene expression and methylation we used copy number data derived from methylation arrays to predict lesion outcome. Probe-level copy number changes were aggregated over cytogenetic bands; these data were used as input to Prediction Analysis of Microarrays (PAM). g-i, Probability plot based on a 154 cytogenetic band signature for correct class prediction (red circles indicate progressive lesions, green circles indicate regressive lesions). The area under the curve for the 154-cytogenetic band signature is 0.86. j-l, Application of our predictive model to previously published data (van Boerdonk et al.) replicates their result, classifying all regressive and 9/12 progressive samples correctly. This dataset included pre-invasive samples of various histological grades, rather than only CIS. m-o, Application of our predictive model to TCGA copy number data. Samples were correctly classified into TCGA LUSC and TCGA control samples with an AUC of 0.98.
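The aggregation of probe-level copy number over cytogenetic bands mentioned above might look like the sketch below. The caption does not say how values were combined within a band, so the mean is used here as an assumption. The probe names, band assignments and values are hypothetical, and the PAM classification step itself is not shown.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_by_band(probe_values, probe_to_band):
    """Average probe-level copy number values within each cytogenetic band.

    probe_values: dict of probe ID -> copy number estimate for one sample.
    probe_to_band: dict of probe ID -> cytogenetic band label.
    Probes without a band annotation are skipped.
    """
    grouped = defaultdict(list)
    for probe, value in probe_values.items():
        band = probe_to_band.get(probe)
        if band is not None:
            grouped[band].append(value)
    return {band: fmean(values) for band, values in grouped.items()}


# Hypothetical annotation and values for a single sample.
probe_to_band = {"probe_1": "3q26.1", "probe_2": "3q26.1", "probe_3": "5p15.3"}
probe_values = {"probe_1": 0.4, "probe_2": 0.6, "probe_3": -0.2, "probe_4": 1.0}
band_profile = aggregate_by_band(probe_values, probe_to_band)
```

The resulting per-band profile, one value per band per sample, is the kind of feature matrix that a classifier such as PAM could then take as input.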
Extended Data Fig. 10. Predictive modeling of methylation data.
In addition to the predictive modeling based on probe variation shown in Fig. 5, we used differentially expressed methylation probes to create a predictor using a Prediction Analysis for Microarrays (PAM) method. The model was trained on a training set (a-c) consisting of 26 progressive samples, 11 regressive samples and 23 control samples, shown in red, green and blue, respectively. A predictor based on 141 DMPs was created. This was applied to a validation set of 10 progressive, 7 regressive and 10 control samples (d-f), predicting outcome with AUC = 0.99. g-i, Application of our predictive model to TCGA methylation data. Samples were correctly classified into TCGA LUSC and TCGA control samples with AUC = 0.99. j-m, ROC analytics and precision-recall curves for Methylation Heterogeneity Index (MHI) model presented in Fig. 4. Curves apply to cancer vs control (j-k) and progressive vs regressive (l-m), respectively. n, Histogram of AUC values using MHI model with random samples of 2000 probes, applied to progressive vs regressive data. This demonstrates that a similar AUC is achieved with a random sample of probes as when using the entire array.
Figure 1. Analysis of pre-invasive lung carcinoma-in-situ (CIS) lesions.
(a) Detection of bronchial pre-invasive CIS lesions by autofluorescence bronchoscopy. (b) Histological outcomes of bronchial pre-invasive lesions. (c) Overview of the study protocol. Patients with identified CIS lesions underwent repeat bronchoscopy and rebiopsy every 4 months. Definitive cancer treatment was only performed if pathological evidence of progression to invasive cancer was detected. The ‘index biopsy’ profiled in this study refers to the biopsy immediately preceding progression to invasive cancer or regression to low-grade dysplasia or normal epithelium. (d) Venn diagram of different -omics analyses performed on laser capture microdissection (LCM)-captured CIS lesions. Due to the small size of bronchial biopsies, not all analyses were performed on all samples.
Figure 2. Genomic aberrations in pre-invasive lung carcinoma-in-situ (CIS) lesions.
Circos diagram comparing CIS genomic profiles with TCGA LUSC data. The outer histogram (A) shows mutation frequencies of all genes in TCGA data. The inner histogram (D) shows mutation frequencies in our CIS data. Profiles appear similar and no statistically significant differences were identified between the two datasets. Genes previously identified as potential drivers of lung cancer are labelled. Between the two histograms, average copy number changes are shown for TCGA data (B) and CIS data (C). Copy number gains are shown in red, losses in blue. Although differences between whole-genome and whole-exome sequencing techniques make these datasets difficult to compare, we observe many similar features between the two; for example, gains in 3q and 5p, which are well recognised features of squamous cell lung cancer. In the centre of the circos plot, 39 rings represent the copy number profiles of our 39 samples, illustrating the individual contribution of each sample to the average values presented (E).
Figure 3. Altered methylation and gene expression in lung carcinoma-in-situ (CIS) lesions.
(a) Hierarchical clustering of 1335 significantly differentially expressed genes in progressive (n=17) and regressive (n=16) CIS lesions, based on a discovery set. Biological and clinical factors including age at diagnosis, gender, smoking history (pack years) and COPD status had no effect on CIS lesion gene expression profile (high expression = purple, low expression = orange). (b) Hierarchical clustering of the top 1000 significantly differentially methylated positions (DMPs) between progressive (n=36) and regressive (n=18) CIS lesions and controls (n=33). Biological and clinical factors including age at diagnosis, gender and smoking history (pack years) had no effect on the methylation profile (hypomethylated DMPs = blue, hypermethylated DMPs = orange). (c) Principal component analysis of all profiled genes in progressive (n=27) and regressive (n=24) CIS lesions showing a clear distinction between progressive and regressive groups (p=0.0017). (d) Principal component analysis of all methylation data in progressive (n=36), regressive (n=18) and control (n=33) CIS lesions showing a clear distinction between progressive and regressive groups (p=6.8×10⁻²⁵). P values were calculated using multivariate ANOVA.
Figure 4. Carcinoma-in-situ (CIS) gene expression and methylation profiles are predictive of progression to cancer.
(a) Probability plot based on a 291-gene signature for correct class prediction (discovery set - red circles indicate progressive lesions, green circles indicate regressive lesions). (b) Challenging the 291-gene signature on a CIS validation set. Area under the curve (AUC) is 1 using Receiver Operating Characteristic (ROC) analysis. (c) Application of the 291-gene signature to TCGA LUSC data. Our signature classified TCGA LUSC vs TCGA control samples with AUC of 0.81 (green circles indicate TCGA controls, orange circles indicate TCGA LUSC). (d) Distribution of methylation beta values across the genome in TCGA controls, CIS regressive and progressive and TCGA LUSC samples. Most probes are regulated at 0 or 1 in normal tissue but this regulation is reduced in both regressive and progressive CIS and TCGA LUSC samples. (e) Methylation Heterogeneity Index (MHI), defined as counts of methylation probes with 0.26 < β < 0.88, for each sample. MHI is higher in regressive and progressive CIS and TCGA LUSC compared with TCGA controls and this can be used as an accurate predictor with AUC=0.96 for TCGA LUSC vs TCGA controls and AUC=0.74 for progressive vs regressive CIS. (f) Histogram of AUC values calculated by performing the same analysis used in (e) 10,000 times, with each run limited to a different random sample of 2,000 probes (AUC mean for TCGA LUSC vs TCGA controls is 0.95 (95% CI 0.92-0.98)). This demonstrates that a random sample of methylation probes can be an accurate predictor using this method.
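The Methylation Heterogeneity Index in (e) and the probe resampling in (f) can be sketched in a few lines of Python. The thresholds 0.26 and 0.88 come from the caption. Everything else is an assumption made for illustration: the AUC is computed here as the rank-based (Mann-Whitney) estimate, samples are represented as plain lists of beta values, and the toy data are hypothetical.

```python
import random


def mhi(betas, low=0.26, high=0.88):
    """Count probes whose beta value lies strictly between the two thresholds."""
    return sum(1 for b in betas if low < b < high)


def auc(positive_scores, negative_scores):
    """Rank-based AUC: chance that a positive outranks a negative (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def subsampled_aucs(positive_samples, negative_samples, n_probes, n_runs, seed=0):
    """Repeat the MHI-based AUC on random probe subsets, one AUC per run."""
    rng = random.Random(seed)
    total_probes = len(positive_samples[0])
    results = []
    for _ in range(n_runs):
        idx = rng.sample(range(total_probes), n_probes)
        pos = [mhi([sample[i] for i in idx]) for sample in positive_samples]
        neg = [mhi([sample[i] for i in idx]) for sample in negative_samples]
        results.append(auc(pos, neg))
    return results


# Hypothetical toy data: "cancer-like" samples have intermediate beta values,
# "control-like" samples sit at 0 or 1.
cancer_like = [[0.5] * 20, [0.6] * 20]
control_like = [[0.0] * 20, [1.0] * 20]
full_auc = auc([mhi(s) for s in cancer_like], [mhi(s) for s in control_like])
run_aucs = subsampled_aucs(cancer_like, control_like, n_probes=5, n_runs=10)
```

With real array data, the list of per-run AUC values is what would be plotted as the histogram in (f), using 2,000 probes per run and 10,000 runs as stated in the caption.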
Figure 5. Chromosomal instability is associated with progression to cancer.
(a) Mean expression of CIN-associated genes in CIS samples. Progressive (n=27) and regressive (n=24) CIS samples are well differentiated with AUC=0.96. Green circles indicate regressive CIS lesions; red circles indicate progressive CIS. (b) Plot of NEK2 expression across CIS samples demonstrates increasing expression with progression to cancer. Expression of this gene alone classifies progressive vs regressive CIS with AUC=0.93. (c) Pathway analysis of gene expression data between progressive (n=17) and regressive (n=16) CIS shows a strong chromosomal instability (CIN) signal, based on a discovery set. This signal remains strong when cell cycle genes are removed from the CIN70 signature. (d) Pathway analysis of methylation data demonstrating several cancer-related pathways up-regulated in progressive CIS compared with regressive CIS. Quoted significance values in (c) and (d) are calculated using 2-sided t-tests adjusted for multiple testing using a False Discovery Rate method, as implemented in the GAGE Bioconductor package.
