[Preprint]. 2025 Mar 16:2025.03.14.643063.
doi: 10.1101/2025.03.14.643063.

Deciphering lung adenocarcinoma evolution and the role of LINE-1 retrotransposition

Tongwu Zhang 1; Wei Zhao 1; Christopher Wirth 2; Marcos Díaz-Gay 3,4,5,6; Jinhu Yin 1; Monia Cecati 7; Francesca Marchegiani 8; Phuc H Hoang 1; Charles Leduc 9; Marina K Baine 10; William D Travis 10; Lynette M Sholl 11; Philippe Joubert 12; Jian Sang 1; John P McElderry 1; Alyssa Klein 1; Azhar Khandekar 1,3,4,5; Caleb Hartman 1; Jennifer Rosenbaum 13; Frank J Colón-Matos 1; Mona Miraftab 1; Monjoy Saha 1; Olivia W Lee 1; Kristine M Jones 1,14; Neil E Caporaso 1,14; Maria Pik Wong 15; Kin Chung Leung 16; Chao Agnes Hsiung 17; Chih-Yi Chen 18,19; Eric S Edell 20; Jacobo Martínez Santamaría 21,22; Matthew B Schabath 23; Sai S Yendamuri 24; Marta Manczuk 25; Jolanta Lissowska 25; Beata Świątkowska 26; Anush Mukeria 27; Oxana Shangina 27; David Zaridze 27; Ivana Holcatova 28,29; Dana Mates 30; Sasa Milosavljevic 31; Milan Savic 32; Yohan Bossé 12; Bonnie E Gould Rothberg 33; David C Christiani 34,35; Valerie Gaborieau 36; Paul Brennan 36; Geoffrey Liu 37; Paul Hofman 38; Robert Homer 39; Soo-Ryum Yang 10; Angela C Pesatori 40,41; Dario Consonni 41; Lixing Yang 42,43,44; Bin Zhu 1; Jianxin Shi 1; Kevin Brown 1; Nathaniel Rothman 1; Stephen J Chanock 1; Ludmil B Alexandrov 3,4,5,45; Jiyeon Choi 1; Maurizio Cardelli 7; Qing Lan 1; Martin A Nowak 46,47; David C Wedge 2,48; Maria Teresa Landi 1

Abstract

Understanding lung cancer evolution can identify tools for intercepting its growth. In a landscape analysis of 1024 lung adenocarcinomas (LUAD) with deep whole-genome sequencing integrated with multiomic data, we identified 542 LUAD that displayed diverse clonal architecture. In this group, we observed an interplay between mobile elements, endogenous and exogenous mutational processes, distinct driver genes, and epidemiological features. Our results revealed divergent evolutionary trajectories based on tobacco smoking exposure, ancestry, and sex. LUAD from smokers showed an abundance of tobacco-related C:G>A:T driver mutations in KRAS plus short subclonal diversification. LUAD in never smokers showed early occurrence of copy number alterations and EGFR mutations associated with SBS5 and SBS40a mutational signatures. Tumors harboring EGFR mutations exhibited long latency, particularly in females of European ancestry (EU_N). In EU_N, EGFR mutations preceded the occurrence of mutations in other driver genes, including TP53 and RBM10. Tumors from Asian never smokers showed a short clonal evolution and presented with heterogeneous repetitive patterns for the inferred mutational order. Importantly, we found that the mutational signature ID2 is a marker of a previously unrecognized mechanism for LUAD evolution. Tumors with ID2 showed short latency and high L1 retrotransposon activity linked to L1 promoter demethylation. These tumors exhibited an aggressive phenotype, characterized by increased genomic instability, elevated hypoxia scores, low burden of neoantigens, propensity to develop metastasis, and poor overall survival. Reactivated L1 retrotransposition-induced mutagenesis can contribute to the origin of the mutational signature ID2, including through the regulation of the transcription factor ZNF695, a member of the KZFP family. The complex nature of LUAD evolution creates both challenges and opportunities for screening and treatment plans.


Conflict of interest statement

LBA is a co-founder, CSO, scientific advisory member, and consultant for io9, has equity, and receives income. The terms of this arrangement have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. LBA is also a compensated member of the scientific advisory board of Inocras. LBA’s spouse is an employee of Biotheranostics. LBA declares U.S. provisional applications filed with UCSD with serial numbers 63/269,033; 63/366,392; 63/289,601; 63/483,237; 63/412,835; and 63/492,348. LBA is also an inventor on US Patent 10,776,718 for source identification by non-negative matrix factorization. SRY has received consulting fees from AstraZeneca, Sanofi, Amgen, and AbbVie, and speaking fees from AstraZeneca, Medscape, PRIME Education, and Medical Learning Institute. All other authors declare that they have no competing interests.

Figures

Fig. 1. Evolutionary dynamics of lung cancer.
a) Sankey diagram illustrating high-clonality WGS data summary from Sherlock-Lung. b) Proportion of tumor samples exhibiting whole genome doubling (WGD) across AS_N, EU_N, EU_S and “Others”. c) Distribution of the percentage of mutations with a copy number 2, considering only mutations attributed to clock-like signatures (SBS1 and SBS5). d) Box plots depicting the proportion of total mutations attributed to different clonal statuses. e) Enrichment of early clonal mutations within driver genes in never-smokers and smokers. For each cancer driver gene, a Fisher’s exact test was performed on a 2×2 contingency table (binary variables: smoking status and early clonal mutation status). The significance thresholds for P < 0.05 (green) and FDR < 0.05 (red), calculated using the Benjamini–Hochberg method, are indicated by dashed lines. f) Evolutionary models displaying the recurrent temporal order of driver genes, as inferred by the ASCETIC framework. The color represents the number of samples harboring specific mutation orders. g) Dynamic mutational processes during clonal and subclonal tumor evolution. Fold changes between relative proportions of clonal and subclonal mutations attributed to individual mutational signatures. Each point represents a tumor sample, and points are colored by mutational signature. P-values from the Wilcoxon rank-sum test are displayed at the bottom of the boxplots.
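The per-gene test described in panel e can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function names `fisher_exact_two_sided` and `benjamini_hochberg`, the placeholder gene labels, and all counts are hypothetical. It computes a two-sided Fisher's exact p-value for each 2×2 table (smoking status by early clonal mutation status) and then applies the Benjamini–Hochberg adjustment across genes.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts per gene:
# (never-smokers early clonal, never-smokers not early clonal,
#  smokers early clonal, smokers not early clonal)
tables = {
    "GENE_A": (30, 10, 8, 32),
    "GENE_B": (12, 28, 14, 26),
    "GENE_C": (5, 35, 20, 20),
}
pvals = [fisher_exact_two_sided(*t) for t in tables.values()]
for gene, p, q in zip(tables, pvals, benjamini_hochberg(pvals)):
    print(f"{gene}: P = {p:.3g}, FDR = {q:.3g}")
```

Genes with FDR below 0.05 would fall above the red dashed line in a plot of this kind, and genes with unadjusted P below 0.05 above the green one.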
Fig. 2. Features associated with lung tumor latency.
a) Associations between tumor latency and driver gene mutation status. The vertical blue dashed line indicates a 5-year latency difference between driver gene wild type and mutant groups, and the horizontal line represents the significance threshold (FDR<0.05). b) Forest plot illustrating associations between tumor latency and EGFR mutation status, adjusted for sex and tumor purity (percentage of cancer cells within a tumor sample). P-values and regression coefficients with 95% confidence intervals (CIs) are provided for each variable. Significant associations are in red. c) Box plots displaying estimated tumor latency separated by EGFR mutation status and sex. d) Associations between tumor latency and the presence of specific mutational signatures. The vertical blue dashed line represents a 5-year latency difference between tumors with and without each mutational signature, and the horizontal line indicates the significance threshold (FDR<0.05). e) A multivariate regression analysis examining the relationship between tumor latency and various factors, including sex, ancestry, smoking, EGFR mutations, KRAS mutations, and mutational signature ID2. Statistically significant associations (P<0.05) are denoted in red.
Fig. 3. Characterization of tumors with mutational signature ID2.
a) Relationships between mutational signature ID2 presence and gene expression of tumor proliferation markers, analyzed from tumor and normal tissue RNA-Seq data. The horizontal line represents the significance threshold (FDR<0.05). b) Pearson correlation between the number of deletions attributed to mutational signature ID2 and the gene expression of tumor proliferation markers. Pearson correlation coefficients and corresponding p-values are displayed above the plots. c) Kaplan-Meier survival curves for overall survival, stratified by the presence of mutational signature ID2. Significance p-values and Hazard Ratios (HRs) were calculated using two-sided Cox proportional-hazards regression, adjusting for age, sex, smoking, and tumor stage. Numbers in brackets indicate the number of patients. d) Enrichment of tumor metastasis in tumors with mutational signature ID2. Odds ratios and p-values from the Fisher exact test are shown above the plot. e) Enrichment of genomic alterations (WGD=whole genome doubling; SCNA=somatic copy number alterations) in tumors with mutational signature ID2, determined through logistic regression and adjusted for ancestry, sex, smoking status, age, and tumor purity. The horizontal lines represent significance thresholds (FDR<0.05 in orange and FDR<0.01 in red). f) Distribution of estimated hypoxia scores between tumors with ID2 signatures and those without. P-values from the Wilcoxon rank-sum test are displayed above the plot.
Fig. 4. Association between L1 retrotransposition and mutational signatures ID2 and ID1.
a) This panel illustrates a sample (NSLC-0622-T01) as an example of a tumor harboring L1 insertions from a germline source. In the Circos plot, an arrow indicates the direction from the location of L1 elements in the human germline genome to the position of L1 somatic insertions in the tumor genome. The gray line without an arrow in the Circos plot indicates L1 insertions with an unknown source. L1 retrotranspositions originating from the master L1 on chromosome 22q12.1 (highlighted in blue in the Circos plot) are zoomed in. Different partnered 3’ transductions are highlighted by triangles with various colors on chromosome 22, accompanied by a sequence depth plot (gray bars). A specific partnered 3’ transduction (including the L1 repetitive element and the adjacent unique sequence) between chromosome 22 and chromosome 2 serves as an example of one germline-source L1 retrotransposition. b) This section presents another example (tumor sample NSLC-0832-T01) predominantly characterized by somatic-source L1 insertions. In the Circos plot, a green arrow highlights multiple L1 retrotranspositions detected solely in the tumor genome. For improved visualization of these somatic-source L1 insertions, a zoomed-in view specifically focusing on the somatic retrotransposition between chromosome 3 and chromosome 11 is provided in the second Circos plot. Additionally, a specific partnered 3’ transduction serves to elucidate somatic-source L1 retrotransposition. c) Distribution of retrotransposable sources of L1 insertions. The bottom pie chart displays the percentage of L1 insertions retrotransposed from germline, somatic, and unknown L1 elements. The top pie chart shows the proportion of L1 insertions originating from specific germline L1 masters. d) Enrichment of mutational signatures in tumors with germline-source L1 insertions. The horizontal lines indicate the significance thresholds (FDR < 0.05 in orange and FDR < 0.01 in red).
e) Pearson correlation between deletions attributed to signature ID2 and insertions attributed to signature ID1. Pearson correlation coefficients and corresponding p-values are displayed in the plot. f-g) Mutational signature profiles and motifs for mutational signature ID1 and ID2, respectively. h) Pearson correlation between deletions attributed to ID2 and total somatic L1 insertions. Pearson correlation coefficients and corresponding p-values are shown in the plot.
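The Pearson correlations reported in panels e and h can be illustrated with a short standard-library sketch. The function name `pearson_r` and the per-tumor counts below are hypothetical and are not taken from the study; the p-values shown in the figure would additionally require a t-distribution, which is omitted here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Centered cross-product and sums of squares.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-tumor counts: deletions attributed to ID2 and
# total somatic L1 insertions.
id2_deletions = [12, 40, 85, 150, 310, 22]
l1_insertions = [1, 5, 9, 20, 41, 2]
print(f"Pearson r = {pearson_r(id2_deletions, l1_insertions):.3f}")
```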
Fig. 5. Activation of germline L1 retrotransposition due to DNA demethylation of L1 promoter.
a) Diagram illustrating the transposon mobilization mechanism for long interspersed element 1 (L1). Adapted from a publication by Levin and Moran, this mechanism depicts non-LTR retrotransposons mobilizing through target-site-primed reverse transcription (TPRT). ORF2-encoded endonuclease generates a single-strand ‘nick’ in genomic DNA, freeing a 3′-OH used to prime RNA reverse transcription. Demethylated CpG in the L1 promoter region (top purple arrow) is hypothesized to activate L1 retrotransposition. Endonuclease activity, coupled with DNA repair mechanisms, might lead to one-base pair deletions or insertions at polymer A/T regions. b) Validation of DNA methylation levels in the promoter region of the germline-source L1 on chr22q12.1, conducted via targeted bisulfite sequencing. The polar plot represents the median methylation level across the genome locations of the CpG island on chr22q12.1, stratified by normal lung samples (N=80), tumor samples without ID2 signature (N=40), and tumor samples with ID2 signature (N=40). Control samples were designed to represent 0%, 33.3%, 66.6%, and 100% methylation levels at each CpG site as shown on the y-axis. c) Box plot showing median DNA methylation levels across the genome locations of the CpG island on chr22q12.1, stratified by sample type and ID2 status. d) Total L1 RNA expression estimated from RNA-Seq data, stratified by sample type and group. e) Total L1 RNA expression differs between tumors with and without the ID2 signature.
Fig. 6. ZNF695 upregulation in tumors and its association with mutational signature ID2.
a) Analysis of differentially expressed KZFP protein-coding genes between tumors with ID2 signatures and those without. Horizontal dashed lines represent significance thresholds (FDR < 0.05 in orange and FDR < 0.01 in red). The top 20 significant genes are annotated with gene names. b) Box plots illustrate the differential expression of ZNF695 among normal tissue or blood, tumors without ID2 signatures, and tumors with ID2 signatures. c) Pearson correlations between KZFP protein-coding gene expression and indels attributed to mutational signature ID2. Horizontal dashed lines indicate significance thresholds (FDR < 0.05 in orange and FDR < 0.01 in red). The top 20 significant genes are annotated with gene names. d) Correlation between ZNF695 expression and deletions attributed to mutational signature ID2. Pearson correlation coefficients and corresponding p-values are displayed above the plot. e) Correlation between ZNF695 RNA-Seq expression and the median of DNA methylation levels across the genome locations of the CpG island on chr22q12.1. Pearson correlation coefficients and corresponding p-values are displayed. f) Differentially expressed ZNF695 target genes identified between tumors with and without mutational signature ID2.

