Sci Rep. 2020 Mar 10;10(1):4435.
doi: 10.1038/s41598-020-61298-3.

Identification of a novel gene signature for the prediction of recurrence in HCC patients by machine learning of genome-wide databases

Jie Shen et al. Sci Rep. 2020.

Abstract

Hepatocellular carcinoma (HCC) is a common malignant tumor in China. In the present study, we aimed to construct and verify a model for predicting recurrence in HCC patients using databases (TCGA, AMC and Inserm) and machine learning methods, and to obtain a gene signature that could predict early relapse of HCC. Statistical methods in R, including feature selection, survival analysis and the chi-square test, were used to analyze and select mutant genes related to disease-free survival (DFS), race and vascular invasion. In addition, whole-exome sequencing was performed on 10 HCC patients recruited from our center, and the sequencing results were compared with the databases. Using the databases and machine learning methods, the recurrence prediction model was constructed and optimized, and the selected mutant genes were verified in the test group. The accuracy of prediction was 74.19%. Moreover, the 10 patients from our center were used to verify these mutant genes and the prediction model, and a success rate of 80% was achieved. Collectively, we identified recurrence-related genes and established a recurrence prediction model for HCC patients, which could provide significant guidance for clinical prediction of recurrence.
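The chi-square test named in the abstract can be illustrated on a 2x2 table of mutation status against a clinical feature such as vascular invasion. The sketch below is written in Python rather than the R used in the study. The function name `chi_square_2x2` is hypothetical, and the counts are invented toy values, not data from the paper.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    table = [[a, b], [c, d]], e.g. rows = mutated / wild-type,
    columns = feature present / feature absent.
    Returns (chi2 statistic, p-value for 1 degree of freedom).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    if 0 in row or 0 in col:
        raise ValueError("every row and column needs at least one observation")

    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row[i] * col[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy counts (assumed for illustration only).
toy_table = [[12, 8], [5, 15]]
chi2, p = chi_square_2x2(toy_table)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

When expected cell counts are small, an exact test is usually preferred over the chi-square approximation.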


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
(A) A total of 31 genes with significant differences in DFS were selected from the TCGA database. Brown indicates that the gene also differs significantly in the AMC database. Blue indicates that the gene also differs significantly in the AMC database but is not a high-frequency mutation. Purple indicates that the gene also differs significantly in the AMC database, but the direction of the difference is opposite. (B) A total of 15 genes with significant differences in DFS were selected from the AMC database. Brown indicates that the gene also differs significantly in the TCGA database. Purple indicates that the gene also differs significantly in the TCGA database, but the direction of the difference is opposite.
Figure 2
(A) Heat maps of somatic mutations, stage and age information in 10 patients with HCC. (B) Left: high-frequency mutant genes in the 10 patients (25 in total). Right: high-frequency mutant genes in the TCGA database (28 in total). Heat maps were generated for the 53 gene mutations in the 10 patients. Mutations that are frequent in TCGA were not frequent in our 10 patients. (C) Comparison of high-frequency gene mutations between the 10 HCC patients in our center and the TCGA database. (D) GO and KEGG pathways involved in the 10 HCC patients in our center. (E) Circos plot of mutation information in the 10 HCC patients. (F) Venn diagram comparing mutant genes in the 10 HCC patients with TCGA mutant genes. (G) Clustering heat map of high-frequency mutant genes in the 10 HCC patients. (H) Heat map of driver gene mutations in the 10 HCC patients.
Figure 3
(A) Flow of the decision tree model; (B) prediction weight of node genes in the decision tree; (C) weight of each gene in the SVM model; (D) comparison of the ROC curves of the decision tree model and the SVM model.
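A ROC comparison of two classifiers, as in panel (D), reduces to sweeping a decision threshold over each model's scores and integrating the resulting curve. The following is a minimal sketch of that calculation, not the authors' code. The function names `roc_points` and `auc` are hypothetical, and the labels and scores are invented toy values that do not represent the decision tree or SVM results in the paper.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    labels: 1 = recurrence, 0 = no recurrence.
    scores: higher score = model is more confident of recurrence.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    points = [(0.0, 0.0)]
    # Lower the threshold step by step; each distinct score is one threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (assumed for illustration only).
labels = [1, 1, 0, 1, 0, 0]
model_a_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
model_b_scores = [0.9, 0.4, 0.8, 0.7, 0.3, 0.1]

for name, scores in (("model A", model_a_scores), ("model B", model_b_scores)):
    print(name, "AUC =", round(auc(roc_points(labels, scores)), 3))
```

The model whose curve encloses the larger area ranks recurrent cases above non-recurrent ones more consistently on the evaluated data.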
Figure 4
Overall study workflow. (A) Kaplan-Meier survival analysis and the log-rank test were used to screen DFS-related mutant genes from the TCGA and AMC databases. These genes were then cross-verified between TCGA and AMC, and four DFS-related mutant genes were identified in both databases; (B) the Boruta algorithm, Fisher’s test and Pearson’s test were used to screen race (Asian/non-Asian)-associated mutations from the TCGA database; (C) the Boruta algorithm, Fisher’s test and Pearson’s test were used to screen vascular invasion-associated mutations from the TCGA, AMC and Inserm databases; (D) the HCC data in TCGA were used to construct a model for predicting recurrence, and the AMC data and the 10 HCC patients in our center were then used for verification.
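The screening in step (A) rests on two standard tools: the Kaplan-Meier estimator of the DFS curve and the log-rank test comparing mutated and wild-type groups. The sketch below shows both in plain Python under stated assumptions. It is not the authors' R pipeline, the function names `kaplan_meier` and `logrank_test` are hypothetical, and the follow-up times are invented toy values.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival) at each event time.

    times: follow-up time per patient; events: 1 = recurrence, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            recurrences += data[i][1]
            leaving += 1
            i += 1
        if recurrences:
            survival *= 1.0 - recurrences / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p-value) with 1 d.o.f."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        raise ValueError("not enough events to compare the groups")
    chi2 = observed_minus_expected ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Toy DFS data in months (assumed for illustration only).
mutated_times, mutated_events = [3, 5, 7, 9, 12], [1, 1, 1, 0, 1]
wildtype_times, wildtype_events = [8, 10, 14, 16, 20, 24], [1, 0, 1, 0, 0, 1]

print(kaplan_meier(mutated_times, mutated_events))
print(logrank_test(mutated_times, mutated_events, wildtype_times, wildtype_events))
```

In a gene-by-gene screen, a test like this would be run once per candidate gene, so some form of multiple-testing control is normally considered alongside it.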
