Biomolecules. 2022 Dec 8;12(12):1839.
doi: 10.3390/biom12121839.

Deep-Learning Algorithm and Concomitant Biomarker Identification for NSCLC Prediction Using Multi-Omics Data Integration

Min-Koo Park et al. Biomolecules.

Abstract

Early diagnosis of lung cancer remains a critical need for raising the survival rate, which currently sits in the mid-30% range. Despite this, multi-omics data have rarely been applied to non-small-cell lung cancer (NSCLC) diagnosis. We developed an artificial intelligence algorithm suited to multi-omics data, based on a graph convolutional network, that integrates mRNA expression, DNA methylation, and DNA sequencing data. The NSCLC prediction model achieved a macro F1-score of 93.7%, indicating that both false positives and false negatives were low, as accurate classification requires. Gene ontology (GO) enrichment and pathway analysis of the features revealed that the two major subtypes of NSCLC, lung adenocarcinoma and lung squamous cell carcinoma, have both subtype-specific and shared GO biological processes. Numerous biomarkers (i.e., microRNA, long non-coding RNA, differentially methylated regions) were newly identified, whereas some biomarkers were consistent with previous findings in NSCLC (e.g., SPRR1B). Thus, using multi-omics data integration, we developed a promising cancer prediction algorithm.
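
The link between the macro F1-score and low false-positive and false-negative counts can be illustrated with a minimal sketch. Per-class F1 equals 2TP / (2TP + FP + FN), and the macro score is the unweighted mean over classes. The class labels and predictions below are invented toy values, not data or class definitions from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only (hypothetical class names).
y_true = ["LUAD", "LUAD", "LUSC", "LUSC", "normal", "normal"]
y_pred = ["LUAD", "LUSC", "LUSC", "LUSC", "normal", "normal"]
print(round(macro_f1(y_true, y_pred), 3))  # prints 0.822
```

Because every class contributes equally to the mean, a high macro F1 cannot be reached by performing well only on the largest class.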

Keywords: biomarker; cancer prediction; deep learning; gene ontology enrichment; graph convolutional network; non-small-cell lung cancer.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Figure 1
Data schema characterized by a directed acyclic graph (DAG) structure. The DAG architecture is used to label the multi-omics data. This data-label flow avoids potential data duplication arising from graph-based preprocessing.
Figure 2
Overview of the preprocessing module and the graph convolutional network (GCN)-based non-small-cell lung cancer (NSCLC) prediction deep learning model. (a) GCN-based preprocessing module for weight optimization. (b) GCN-based NSCLC prediction algorithm. In the Mutation FC layer, DNA sequencing data, including targetable gene aberrations, serve as discriminating predictors to match the most suitable therapeutic agents.
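
For readers unfamiliar with graph convolution, the following is a minimal sketch of one generic GCN layer in the widely used symmetric-normalization form, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). It is an assumption that the model in Figure 2 uses this form; the page does not give the layer equations. The graph, features, and weights are toy values.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: normalize, aggregate neighbors, project, ReLU."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: A_hat[i][j] / sqrt(deg[i] * deg[j]).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    hidden = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Toy 3-node path graph (0-1-2) with 2 features per node; all values hypothetical.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[0.5, -0.5], [0.25, 0.75]]
out = gcn_layer(adj, features, weights)
print(out)
```

In a trained model the weight matrix is learned; here it is fixed only to show how each node's output mixes its own features with those of its neighbors.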
Figure 3
Performance comparison of the NSCLC prediction model with alternative classifier models. Pairwise comparisons of the implemented algorithms' performance were made using five-fold cross-validation. To improve discrimination, the metric cut-off was set at 90%. The standard deviation of each performance measure is shown by a vertical error bar. AUC of ROC denotes the area under the receiver operating characteristic curve.
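
The five-fold procedure behind such a comparison can be sketched as follows: the samples are split into five disjoint folds, each fold serves once as the test set, and the five scores are summarized by their mean and standard deviation. The function names, the seed, and the use of the sample standard deviation are assumptions made for illustration; the page does not state how the error bars were computed.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(n_samples, score_fn, k=5):
    """Return the mean and sample standard deviation of the k fold scores."""
    scores = [score_fn(train, test) for train, test in k_fold_indices(n_samples, k)]
    return statistics.mean(scores), statistics.stdev(scores)


# Placeholder scorer (hypothetical): a real one would fit a model on `train`
# and return a metric such as macro F1 measured on `test`.
mean_score, sd_score = cross_validate(20, lambda train, test: len(train) / 20)
print(mean_score, sd_score)
```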
Figure 4
GO enrichment and pathway analysis of NSCLC features. (a) Visualized networks of enriched GO “Biological Process” terms of NSCLC were grouped based on shared genes (Kappa score threshold = 0.4). Terms that remained enriched after Bonferroni correction of the p value were retained as the functional description. The node size is proportional to the degree of significance. (b) % terms per group represents the proportion of GO terms in the NSCLC features.
Figure 5
GO enrichment and pathway analysis of LUAD features. (a) Enriched GO “Biological Process” terms of LUAD were grouped based on shared genes (Kappa score threshold = 0.4). Terms that remained enriched after Bonferroni correction of the p value were retained as the functional description. The node size is proportional to the degree of significance. (b) % terms per group represents the proportion of GO terms in the LUAD features.
Figure 6
GO enrichment and pathway analysis of LUSC features. (a) Enriched GO “Biological Process” terms of LUSC were grouped based on shared genes (Kappa score threshold = 0.4). Terms that remained enriched after Bonferroni correction of the p value were retained as the functional description. The node size is proportional to the degree of significance. (b) % terms per group represents the proportion of GO terms in the LUSC features.
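
The two statistical steps named in the captions for Figures 4 to 6 can be sketched as below: Bonferroni correction of the enrichment p values, and a kappa score that measures how strongly two GO terms share genes. This is a sketch under stated assumptions. The 0.05 significance level, the Cohen's kappa formulation, and all term names, gene names, and p values are hypothetical; only the 0.4 kappa threshold comes from the captions, and the enrichment tool actually used is not named on this page.

```python
def bonferroni_retained(p_values, alpha=0.05):
    """Multiply each p value by the number of tests (capped at 1) and keep
    the terms whose adjusted p value is below alpha (alpha is an assumption)."""
    m = len(p_values)
    adjusted = {term: min(1.0, p * m) for term, p in p_values.items()}
    return {term: p for term, p in adjusted.items() if p < alpha}


def kappa_score(genes_a, genes_b, universe):
    """Cohen's kappa for two terms' gene memberships over a shared gene universe.
    Both gene sets are assumed to be subsets of `universe`."""
    n = len(universe)
    both = len(genes_a & genes_b)
    only_a = len(genes_a - genes_b)
    only_b = len(genes_b - genes_a)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n          # fraction of genes the terms agree on
    p_a = (both + only_a) / n
    p_b = (both + only_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1.0:
        return 1.0  # degenerate case: both terms cover all genes or none
    return (observed - expected) / (1 - expected)


# Hypothetical inputs, invented for illustration only.
raw_p = {"term_a": 0.001, "term_b": 0.02, "term_c": 0.4}
print(bonferroni_retained(raw_p))

universe = {f"g{i}" for i in range(10)}
term_a = {"g0", "g1", "g2"}
term_b = {"g1", "g2", "g3"}
same_group = kappa_score(term_a, term_b, universe) >= 0.4  # threshold from the captions
print(same_group)
```

Terms whose pairwise kappa meets the threshold would be placed in the same functional group, which is what the grouped networks in panel (a) of each figure depict.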
