BMC Med Genomics. 2023 Feb 27;16(1):35.
doi: 10.1186/s12920-023-01462-6.

Identification of gene profiles related to the development of oral cancer using a deep learning technique

Leili Tapak et al. BMC Med Genomics.

Abstract

Background: Oral cancer (OC) is a debilitating disease that can adversely affect patients' quality of life. Patients with oral premalignant lesions have a high risk of developing OC. Therefore, identifying robust survival subgroups among them may significantly improve patient therapy and care. This study aimed to identify prognostic biomarkers that predict the time to development of OC and to stratify patients by survival using state-of-the-art machine learning and deep learning.

Methods: Gene expression profiles (29,096 probes) of 86 patients from the GSE26549 dataset in the GEO repository were used. An autoencoder deep learning neural network was used to extract 100 encoded features (the number of units in the encoding layer, i.e., the bottleneck of the network). A univariate Cox proportional hazards regression model was then used to select the significant encoded features (P < 0.05). High-risk and low-risk groups were identified by applying a hierarchical clustering technique to the selected features. Finally, a supervised random forest (RF) classifier was used to identify gene profiles related to subtypes of OC from the original 29,096 probes.
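
The clustering and classification steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, and the Ward linkage, the 70/30 split, the number of trees, and all variable names are assumptions made for illustration. The autoencoder and Cox screening steps are not reproduced; the `encoded` array simply stands in for features that would have passed the screening.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins, NOT the GSE26549 data (sizes other than 86 are arbitrary).
n_patients, n_probes, n_selected = 86, 200, 70
simulated_risk = rng.integers(0, 2, size=n_patients)

# Pretend these are the encoded features that passed univariate Cox screening.
encoded = rng.normal(size=(n_patients, n_selected)) + 3.0 * simulated_risk[:, None]

# Pretend these are the original probes; only the first 10 carry group signal.
probes = rng.normal(size=(n_patients, n_probes))
probes[:, :10] += 3.0 * simulated_risk[:, None]

# Step 1: hierarchical clustering of patients into two groups (Ward linkage assumed).
tree = linkage(encoded, method="ward")
labels = fcluster(tree, t=2, criterion="maxclust") - 1  # relabel 1/2 as 0/1

# Step 2: random forest on the original probes, with cluster labels as the target.
X_train, X_test, y_train, y_test = train_test_split(
    probes, labels, test_size=0.3, stratify=labels, random_state=0
)
forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(X_train, y_train)
accuracy = forest.score(X_test, y_test)

# Step 3: rank probes by impurity-based importance and keep the top 21.
top_probes = np.argsort(forest.feature_importances_)[::-1][:21]
print(f"test accuracy: {accuracy:.3f}")
print("top probe indices:", top_probes.tolist())
```

Using the cluster assignments as classification labels is what links the unsupervised grouping back to individual probes: the forest's feature importances give a ranking over the original inputs, which the encoded features alone do not provide.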

Results: Among the 100 encoded features extracted by the autoencoder, 70 were significantly related to time to OC development in the univariate Cox model and were used as inputs for clustering the patients. Two survival risk groups were identified (log-rank test P = 0.003) and were used as labels for supervised classification. The RF classifier achieved an overall accuracy of 0.916 on the test set and yielded 21 top genes (FUT8, DDR2, ATM, CD247, ETS1, ZEB2, COL5A2, GMAP7, CDH1, COL11A2, COL3A1, AHR, COL2A1, CHORDC1, PTP4A3, COL1A2, CCR2, PDGFRB, COL1A1, FERMT2, PIK3CB) associated with time to developing OC, selected from the original 29,096 probes.
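
For readers unfamiliar with the log-rank test used to compare the two risk groups, the following is a minimal sketch of the standard two-group statistic in plain Python. The function name `logrank_test` and the toy data are hypothetical and are not taken from the study; the p-value uses the chi-square distribution with one degree of freedom.

```python
import math

def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times:  follow-up time per patient
    events: 1 if the event was observed, 0 if censored
    groups: 0 or 1 per patient
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t (overall and in group 1).
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        # Events occurring exactly at time t (overall and in group 1).
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(
            1 for ti, e, g in zip(times, events, groups)
            if ti == t and e == 1 and g == 1
        )
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("log-rank statistic is undefined for these data")
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Toy data (hypothetical): group 1 has events early, group 0 has events late.
toy_times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
toy_events = [1] * 12
toy_groups = [1] * 6 + [0] * 6
chi2, p_value = logrank_test(toy_times, toy_events, toy_groups)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

A small p-value indicates that the observed number of events in one group differs from what would be expected if both groups shared the same hazard.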

Conclusions: Using deep learning, our study identified prominent transcriptional biomarkers for determining which patients are at high risk of developing oral cancer; these biomarkers may be prognostic and may serve as targets for OC therapy. The identified genes may also serve as potential targets for oral cancer chemoprevention. Additional validation of these biomarkers in experimental prospective and retrospective studies would support their introduction into OC clinics.

Keywords: Deep learning; Gene expression; Oral cancer.

Conflict of interest statement

The authors have no conflicts of interest to declare for this study.

Figures

Fig. 1. (a) Architecture of the autoencoder and (b) loss function values over epochs.
Fig. 2. Kaplan-Meier curves for the two survival subgroups.
Fig. 3. Heat map of the 21 genes selected by random forest in relation to the two identified survival groups.
Fig. 4. Summary of the top GO results and KEGG pathways.
Fig. 5. Venn diagram of the overlap between the top predicted target genes ranked by MNC, MCC, and degree.
Fig. 6. PPI network of the identified genes, built with Cytoscape software. Nodes represent proteins, and edges represent interactions between two proteins.
Fig. 7. (a) ROC curve for the prediction of oral cancer patients versus healthy controls in the in silico validation data set (GSE9844); (b) prediction error curve for predicting survival of oral cancer patients in the GSE41613 data set, used for in silico validation; (c) Kaplan-Meier curves of the survival subgroups identified using the selected genes in the GSE41613 data set, used for in silico validation.
