Sci Rep. 2020 Mar 13;10(1):4679.
doi: 10.1038/s41598-020-61588-w.

Overall survival prediction of non-small cell lung cancer by integrating microarray and clinical data with deep learning



Yu-Heng Lai et al. Sci Rep.

Abstract

Non-small cell lung cancer (NSCLC) is the most common form of lung cancer worldwide. Accurate prognostic stratification of NSCLC can serve as an important clinical reference when designing therapeutic strategies for cancer patients. With this clinical application in mind, we developed a deep neural network (DNN) that combines two heterogeneous data sources, gene expression and clinical data, to predict the overall survival of NSCLC patients. Using microarray data from a cohort of 614 patients, we first used seven well-known NSCLC biomarkers to divide patients into biomarker- and biomarker+ subgroups. Next, using a systems biology approach, we calculated prognosis relevance values (PRV) to select eight additional novel prognostic gene biomarkers. Finally, we combined these 15 biomarkers with clinical data to build an integrative DNN via bimodal learning, which predicts the 5-year survival status of NSCLC patients with an AUC of 0.8163 and an accuracy of 75.44%. Given this performance, we believe our prediction could become a promising index that helps oncologists and physicians develop personalized therapy and lays a foundation for precision medicine.
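The abstract frames the task as predicting a binary 5-year survival status and reports AUC and accuracy. The sketch below shows one way to compute both in plain Python. The labelling rule for patients censored before 60 months (`five_year_label` returns `None` for them) and all function names are our assumptions for illustration, not the authors' code.

```python
from typing import Optional, Sequence


def five_year_label(months: float, died: bool, horizon: float = 60.0) -> Optional[int]:
    """Return 1 if the patient died within the horizon, 0 if followed past it.

    Hypothetical labelling rule: a patient censored (lost to follow-up) before
    the horizon has an unknown 5-year status and is returned as None.
    """
    if died and months <= horizon:
        return 1
    if months > horizon:
        return 0
    return None


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Tied scores count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float], cutoff: float = 0.5) -> float:
    """Fraction of patients whose thresholded score equals the true label."""
    hits = sum(1 for y, s in zip(labels, scores) if int(s >= cutoff) == y)
    return hits / len(labels)


if __name__ == "__main__":
    y = [1, 0, 1, 0]
    p = [0.9, 0.2, 0.4, 0.6]
    print(roc_auc(y, p), accuracy(y, p))  # 0.75 0.5
```

The pairwise AUC is quadratic in cohort size, which is harmless at a few hundred patients; a sort-based rank computation gives the same value faster for large sets.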


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Schematic of the study design. For each of the 7 well-known biomarkers, we built a pair of gene interaction networks, one for biomarker+ patients (high expression) and one for biomarker- patients (low expression), with the high and low expression levels identified by StepMiner. Overlapping the resulting 7 prognosis relevance value (PRV) lists yielded 8 prognostic biomarkers. We selected lung adenocarcinoma (ADC) patients with complete clinical data (n = 512) and divided them into training (n = 256), test (n = 171), and validation (n = 85) sets. We trained deep neural networks (DNNs) on the training set and tuned hyperparameters on the validation set. After training, we used the DNNs to classify the test set and performed survival analysis.
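Patients are split into biomarker+ and biomarker- groups by a high/low expression threshold. StepMiner fits step functions to sorted expression values; the sketch below is a simplified one-step least-squares fit in that spirit, not the StepMiner tool itself. Placing the threshold midway between the two fitted means, and the helper names, are assumptions.

```python
from typing import List, Sequence, Tuple


def one_step_threshold(values: Sequence[float]) -> float:
    """Least-squares fit of a single step to sorted expression values.

    Every split of the sorted values into a low part and a high part is tried;
    the split with the smallest total squared error is kept, and the threshold
    is placed midway between the two fitted means (an assumption).
    """
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two expression values")
    best_sse, best_t = float("inf"), xs[0]
    for k in range(1, len(xs)):
        left, right = xs[:k], xs[k:]
        m_left = sum(left) / len(left)
        m_right = sum(right) / len(right)
        sse = sum((x - m_left) ** 2 for x in left) + sum((x - m_right) ** 2 for x in right)
        if sse < best_sse:
            best_sse, best_t = sse, (m_left + m_right) / 2
    return best_t


def split_by_biomarker(expr: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Indices of biomarker+ (above threshold) and biomarker- (at or below) patients."""
    t = one_step_threshold(expr)
    high = [i for i, x in enumerate(expr) if x > t]
    low = [i for i, x in enumerate(expr) if x <= t]
    return high, low


if __name__ == "__main__":
    expr = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]  # toy expression values for one biomarker
    print(split_by_biomarker(expr))  # ([3, 4, 5], [0, 1, 2])
```

Applied once per well-known biomarker, a split like this produces the 7 biomarker+/biomarker- patient pairs from which the paired networks are built.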
Figure 2
The integrative DNN structure and performance comparison with other methods. (a) The left branch processes the microarray data and the right branch processes the clinical data; the two subnetworks are merged to form an integrative network. Specifically, the 4th hidden layer of the microarray DNN (40 neurons) and the 4th hidden layer of the clinical DNN (18 neurons) are merged into a layer of 58 neurons, which is followed by two hidden layers of 32 neurons each for the final prediction. (b) Performance comparison of the integrative DNN with other methods on the combined data.
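To make the layer arithmetic in panel (a) concrete, the following untrained forward pass in plain Python wires two branches whose 4th hidden layers have 40 and 18 units, concatenates them into 58 units, and adds two 32-unit layers and a sigmoid output. The input sizes (15 genes, 6 clinical variables), the widths of hidden layers 1 to 3, ReLU activations and He initialisation are all hypothetical; training is omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, biases)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """He-style Gaussian weights (suited to ReLU) and zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Sequence[float], layer: Layer, relu: bool = True) -> Vector:
    """Fully connected layer: W x + b, optionally followed by ReLU."""
    weights, bias = layer
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in z] if relu else z


def mlp(sizes: Sequence[int], rng: random.Random) -> List[Layer]:
    """Stack of dense layers mapping sizes[0] -> sizes[1] -> ... -> sizes[-1]."""
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(x: Sequence[float], layers: List[Layer]) -> Vector:
    """Run x through a stack of ReLU dense layers."""
    h = list(x)
    for layer in layers:
        h = dense(h, layer)
    return h


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class IntegrativeDNN:
    """Untrained two-branch network matching the layer sizes in Figure 2(a)."""

    def __init__(self, n_genes: int = 15, n_clinical: int = 6, seed: int = 0):
        rng = random.Random(seed)
        # Widths of hidden layers 1-3 are hypothetical; only the 4th-layer
        # widths (40 and 18) and the 32-32 head come from the caption.
        self.micro = mlp([n_genes, 128, 96, 64, 40], rng)
        self.clin = mlp([n_clinical, 32, 24, 20, 18], rng)
        self.head = mlp([40 + 18, 32, 32], rng)
        self.out = init_layer(32, 1, rng)

    def merged(self, genes: Sequence[float], clinical: Sequence[float]) -> Vector:
        """Concatenate the two 4th-hidden-layer outputs into the 58-unit merged layer."""
        return forward(genes, self.micro) + forward(clinical, self.clin)

    def predict(self, genes: Sequence[float], clinical: Sequence[float]) -> float:
        """Score in (0, 1) for the binary 5-year survival status."""
        h = forward(self.merged(genes, clinical), self.head)
        return sigmoid(dense(h, self.out, relu=False)[0])


if __name__ == "__main__":
    model = IntegrativeDNN()
    print(len(model.merged([0.1] * 15, [0.5] * 6)))  # 58
    print(model.predict([0.1] * 15, [0.5] * 6))
```

Concatenation, rather than element-wise addition, is what makes the merged width the sum 40 + 18; in a framework such as Keras or PyTorch the same step would be a concatenate layer or `torch.cat`.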
Figure 3
DNN and random forest (RF) performance on the merged cohort. (a) Performance of the DNN and RF, with and without reclassification, using only microarray data or both microarray and clinical data. (b) Kaplan–Meier (KM) analysis of overall survival in the cohort microarray test set, with patients stratified into risk groups by the DNN and RF trained on microarray data only. The cut-off threshold was either 0.5 (original) or the cut-off point given by the Youden index (reclassification). (c) As in (b), but with both microarray and clinical data applied to the DNN and RF, again using either the 0.5 cut-off (original) or the Youden-index cut-off (reclassification). (d) Univariate proportional-hazards analysis of each classifier.
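The reclassification in this caption replaces the default 0.5 cut-off with the point that maximises the Youden index, J = sensitivity + specificity - 1. A minimal sketch, assuming binary labels and scores in [0, 1] and trying every distinct score as a candidate cut-off; the function names are hypothetical.

```python
from typing import List, Sequence


def youden_cutoff(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the cut-off maximising Youden's J = sensitivity + specificity - 1.

    Each distinct score is tried as a threshold (predict 1 when score >= cut);
    ties in J keep the lowest cut-off.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both classes to compute sensitivity and specificity")
    best_j, best_cut = -1.0, 0.5
    for cut in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cut)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cut)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut


def reclassify(scores: Sequence[float], cutoff: float) -> List[int]:
    """Binary risk-group labels using the given cut-off."""
    return [int(s >= cutoff) for s in scores]


if __name__ == "__main__":
    y = [0, 0, 0, 1, 1, 1]
    p = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45]
    cut = youden_cutoff(y, p)
    print(cut)                 # 0.3
    print(reclassify(p, 0.5))  # [0, 0, 0, 0, 0, 0]
    print(reclassify(p, cut))  # [0, 0, 0, 1, 1, 1]
```

The toy scores show why reclassification can matter: a classifier whose outputs are all below 0.5 assigns everyone to one group at the default cut-off, while the Youden cut-off still separates the classes.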
Figure 4
DNN and RF performance on the independent validation dataset. (a) AUCs and accuracies of the DNN and RF on the independent validation set. (b) Univariate proportional-hazards analysis of each classifier. (c) KM analysis of overall survival in the independent validation set, with patients stratified into risk groups by the DNN and RF trained on microarray data only. The cut-off threshold was either 0.5 (original) or the cut-off point given by the Youden index (reclassification). (d) As in (c), but with both microarray and clinical data applied to the DNN and RF, again using either the 0.5 cut-off (original) or the Youden-index cut-off (reclassification). (e) Additional performance metrics of the DNN, with and without reclassification, using only microarray data or both microarray and clinical data.
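The KM panels compare overall survival between the risk groups a classifier assigns. A minimal Kaplan-Meier (product-limit) estimator in plain Python, assuming survival times in months, an event flag (1 = death, 0 = censored) and a 0/1 risk-group label per patient; the function names and toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (event time, S(t)) pairs.

    S(t) is the product over event times t_i <= t of (1 - d_i / n_i), where d_i
    is the number of deaths at t_i and n_i the number at risk just before t_i.
    Censored patients (event 0) leave the risk set after their last time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def km_by_group(times: Sequence[float], events: Sequence[int],
                groups: Sequence[int]) -> Dict[int, List[Tuple[float, float]]]:
    """One Kaplan-Meier curve per predicted risk group (e.g. 0 = low, 1 = high)."""
    curves = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        curves[g] = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    return curves


if __name__ == "__main__":
    months = [2, 3, 3, 5, 8, 10, 12, 20]
    died = [1, 1, 0, 1, 0, 1, 0, 0]
    risk = [1, 1, 1, 1, 0, 0, 0, 0]  # e.g. output of a thresholded classifier
    for group, curve in km_by_group(months, died, risk).items():
        print(group, [(t, round(s, 3)) for t, s in curve])
```

Patients censored at the same time as a death are still counted in that time's risk set, which is the usual product-limit convention.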
Figure 5
The interaction network of prognostic biomarkers. Visualization of interdependencies of the 15 selected biomarkers via STRING (https://string-db.org/).

