Multicenter Study
Chest. 2020 Aug;158(2):808-819.
doi: 10.1016/j.chest.2020.01.048. Epub 2020 Feb 28.

Independent Validation of Early-Stage Non-Small Cell Lung Cancer Prognostic Scores Incorporating Epigenetic and Transcriptional Biomarkers With Gene-Gene Interactions and Main Effects

Ruyang Zhang et al. Chest. 2020 Aug.

Abstract

Background: DNA methylation and gene expression are promising biomarkers of various cancers, including non-small cell lung cancer (NSCLC). Besides the main effects of biomarkers, the progression of complex diseases is also influenced by gene-gene (G×G) interactions.

Research question: Would screening the functional capacity of biomarkers on the basis of main effects or interactions, using multiomics data, improve the accuracy of cancer prognosis?

Study design and methods: Biomarker screening and model validation were used to construct and validate a prognostic prediction model. NSCLC prognosis-associated biomarkers were identified on the basis of either their main effects or interactions with two types of omics data. A prognostic score incorporating epigenetic and transcriptional biomarkers, as well as clinical information, was independently validated.

Results: Twenty-six pairs of biomarkers with G×G interactions and two biomarkers with main effects were significantly associated with NSCLC survival. Compared with a model using clinical information only, the accuracy of the epigenetic and transcriptional biomarker-based prognostic model, measured by area under the receiver operating characteristic curve (AUC), increased by 35.38% (95% CI, 27.09%-42.17%; P = 5.10 × 10⁻¹⁷) and 34.85% (95% CI, 26.33%-41.87%; P = 2.52 × 10⁻¹⁸) for 3- and 5-year survival, respectively. The model exhibited superior predictive ability for NSCLC survival (3-year AUC, 0.88 [95% CI, 0.83-0.93]; 5-year AUC, 0.89 [95% CI, 0.83-0.93]) in an independent Cancer Genome Atlas population. G×G interactions contributed a 65.2% and 91.3% increase in prediction accuracy for 3- and 5-year survival, respectively.

Interpretation: The integration of epigenetic and transcriptional biomarkers with main effects and G×G interactions significantly improves the accuracy of prognostic prediction of early-stage NSCLC survival.

Keywords: early stage; interaction; multiomics; non-small cell lung cancer; prognostic score.
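
As an illustration of how a score can combine main effects with G×G interactions, the following is a minimal sketch and not the authors' model. It assumes the score is a linear predictor in which each interaction enters as the product of two biomarker values; the abstract does not state the functional form, and the biomarker names and coefficients below are hypothetical.

```python
from typing import Dict, Tuple


def prognostic_score(
    values: Dict[str, float],
    main_coefs: Dict[str, float],
    interaction_coefs: Dict[Tuple[str, str], float],
) -> float:
    """Sum of main-effect terms plus pairwise product (interaction) terms.

    Assumption: a linear predictor with product terms; the abstract does
    not specify how the published score is computed.
    """
    # Main effects: coefficient times the biomarker value.
    score = sum(beta * values[name] for name, beta in main_coefs.items())
    # G x G interactions: coefficient times the product of the two values.
    score += sum(
        gamma * values[a] * values[b]
        for (a, b), gamma in interaction_coefs.items()
    )
    return score


# Hypothetical biomarker values and coefficients, for illustration only.
patient = {"cpg_a": 0.8, "cpg_b": 0.3, "gene_c": 1.5}
main_coefs = {"gene_c": 0.4}
interaction_coefs = {("cpg_a", "cpg_b"): -1.2}

score = prognostic_score(patient, main_coefs, interaction_coefs)
print(round(score, 3))
```

In this sketch the pair (cpg_a, cpg_b) contributes to the score only through its product, so a pair can matter even when neither biomarker has a main-effect coefficient of its own.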

Figures

Figure 1
Flow chart of study design and statistical analyses. In the epigenetic analysis, patients with lung adenocarcinoma and lung squamous cell carcinoma from the Harvard, Spain, Norway, and Sweden cohorts were used in the discovery phase for screening, whereas data from the Cancer Genome Atlas (TCGA) was used for validation. In transcriptional analysis, gene expression data from Gene Expression Omnibus and TCGA were used in the discovery phase and the validation phase, respectively. Both main effect and G×G interaction analyses were performed. G×G = gene by gene; NSCLC = non-small cell lung cancer.
Figure 2
Estimated survival curves for patients grouped by various biomarker-based scores. A, Epigenetic score of DNA methylation. B, Transcriptional score of gene expression. C, Integrative score of DNA methylation and gene expression. D, Prognostic score of DNA methylation, gene expression, and clinical information. Patients were categorized into low-, medium-, and high-score groups by using the tertiles of each score as the cutoffs. E, Discriminative ability of the prognostic score. Results of 3- and 5-year survival rate, median survival time, and hazard ratio (HR) were compared across five groups, defined by using the quintiles of the prognostic score as the cutoffs. F, HR and P values were derived from the Cox proportional hazards model for patients with different quintile levels of the prognostic score. HR(H vs L) = HR, high vs low; HR(M vs L) = HR, medium vs low.
Figure 3
Forest plots of results from stratification analysis of prognostic score. HR with 95% CI of the prognostic score on non-small cell lung cancer survival in various subgroups is stratified by clinical characteristics. LUAD = lung adenocarcinoma; LUSC = lung squamous cell carcinoma. See Figure 2 legend for expansion of other abbreviation.
Figure 4
Receiver operating characteristic curves for various predictive models using the clinical information (C), the main and interaction effects of DNA methylation (M), and gene expression (E). A, Three-year survival prediction. B, Five-year survival prediction. The AUC increase (%) was evaluated by comparing the model with that with only the clinical information. P values and 95% CIs were calculated by using 1,000 bootstrap samples. AUC = area under the receiver operating characteristic curve; ROC = receiver operating characteristic. See Figure 1 legend for expansion of other abbreviations.
Figure 5
Gene network and gene enrichment analysis of 49 genes to which 25 pairs of CpG probes with interaction and one CpG probe with main effect are mapped. A, The gene network plot constructed by GeneMANIA. Central nodes with boldface outline represent hub genes, and the size represents the connectivity degree of each node. B, Barplot of gene pathways enriched with significant genes, and colored by P values. C, The pathway network plot of these pathways enriched with significant genes. Significant pathways with a similarity > 0.3 are connected by edges. Each node represents an enriched term and is colored by its cluster identification. The size of the node represents the number of genes in the pathway. The edge represents potential biologic relationships between two pathways. GO = Gene Ontology.
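
The Figure 4 legend reports the AUC increase over the clinical-only model with 95% CIs from 1,000 bootstrap samples. A minimal sketch of that kind of calculation follows, under loudly stated assumptions: the outcome is treated as a binary label without censoring (the study concerns 3- and 5-year survival, which would normally need a time-dependent AUC), the increase is taken as relative to the clinical-only AUC, and the interval is a simple percentile interval. The function names and toy data are hypothetical and do not come from the paper.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def relative_auc_increase(labels, base_scores, full_scores):
    """Percent increase of the full model's AUC over the baseline AUC (assumed definition)."""
    base = auc(labels, base_scores)
    return 100.0 * (auc(labels, full_scores) - base) / base


def bootstrap_ci(labels, base_scores, full_scores, n_boot=1000, seed=0):
    """Percentile 95% interval for the relative AUC increase."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one outcome class; draw again
        base = [base_scores[i] for i in idx]
        if auc(ys, base) == 0:
            continue  # relative increase is undefined; draw again
        full = [full_scores[i] for i in idx]
        stats.append(relative_auc_increase(ys, base, full))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Toy data, for illustration only: 1 = event, 0 = no event.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
clinical_only = [0.6, 0.4, 0.7, 0.3, 0.5, 0.2, 0.45, 0.35]
with_biomarkers = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.65, 0.1]

increase = relative_auc_increase(labels, clinical_only, with_biomarkers)
low, high = bootstrap_ci(labels, clinical_only, with_biomarkers)
print(f"AUC increase: {increase:.2f}% (95% CI, {low:.2f}%-{high:.2f}%)")
```

Resampling patients jointly, so that each resample keeps a patient's outcome and both model scores together, preserves the pairing between the two models being compared.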
