Genome Med. 2021 Jul 14;13(1):112.
doi: 10.1186/s13073-021-00930-x.

DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data

Olivier B Poirion et al. Genome Med.

Abstract

Multi-omics data are good resources for prognosis and survival prediction; however, these are difficult to integrate computationally. We introduce DeepProg, a novel ensemble framework of deep-learning and machine-learning approaches that robustly predicts patient survival subtypes using multi-omics data. It identifies two optimal survival subtypes in most cancers and yields significantly better risk-stratification than other multi-omics integration methods. DeepProg is highly predictive, exemplified by two liver cancer (C-index 0.73-0.80) and five breast cancer datasets (C-index 0.68-0.73). Pan-cancer analysis associates common genomic signatures in poor survival subtypes with extracellular matrix modeling, immune deregulation, and mitosis processes. DeepProg is freely available at https://github.com/lanagarmire/DeepProg.
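The C-index values quoted above measure how well predicted risk orders patients by survival time. The following is a minimal sketch of Harrell's concordance index in plain Python, not code from DeepProg; the function name and arguments are illustrative assumptions, and pairs with tied follow-up times are simply skipped.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    times       -- observed follow-up time per patient
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- predicted risk; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time ends in an
        # observed event. Tied times are skipped (a simplification).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk died first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. When the prediction is a binary risk group rather than a continuous score, pairs within the same group count as ties and contribute one half each.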

Keywords: Cancer; Deep learning; Ensemble learning; Machine learning; Prognosis; Survival; multi-omics.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The computational framework of DeepProg. DeepProg uses the boosting strategy to build several models, each from a random subset of the dataset. For each model, each omic data matrix is normalized and then transformed using an autoencoder. Each of the new hidden-layer features in the autoencoder is then tested for association with survival using univariate Cox-PH models. The features significantly associated with survival are then subjected to clustering (Gaussian clustering by default). Once the optimal clustering is determined, the top features in each omic input data type are selected through Kruskal-Wallis analysis (default threshold = 0.05). Finally, these top omics features are used to construct a support vector machine (SVM) classifier and to predict the survival risk group of a new sample. DeepProg combines the outputs of all the classifier models to produce more robust results.
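The Kruskal-Wallis feature-selection step described in the caption above can be sketched as follows. This is an illustration under stated assumptions, not DeepProg's implementation: it assumes exactly two subtypes (so the chi-square reference distribution has one degree of freedom), omits the tie correction, and all function and feature names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_two_groups(group_a, group_b):
    """Kruskal-Wallis H statistic and p value for two non-empty groups.

    No tie correction is applied (a simplification).
    """
    pooled = list(group_a) + list(group_b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    sum_a = sum(ranks[:len(group_a)])
    sum_b = sum(ranks[len(group_a):])
    h = (12.0 / (n * (n + 1))
         * (sum_a ** 2 / len(group_a) + sum_b ** 2 / len(group_b))
         - 3.0 * (n + 1))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(max(h, 0.0) / 2.0))
    return h, p


def select_features(features, labels, threshold=0.05):
    """Keep features whose values differ between the two subtypes.

    features -- dict mapping feature name to per-sample values
    labels   -- per-sample subtype label, 0 or 1
    Returns (name, p value) pairs sorted by increasing p value.
    """
    selected = []
    for name, values in features.items():
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        _, p = kruskal_wallis_two_groups(low, high)
        if p < threshold:
            selected.append((name, p))
    return sorted(selected, key=lambda item: item[1])
```

The selected features would then serve as the inputs to a classifier such as the SVM mentioned in the caption; that step is not shown here.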
Fig. 2
DeepProg performance for the 32 TCGA cancer datasets. (A) Kaplan-Meier plots for each cancer type, where the survival risk group stratification is determined by DeepProg. (B) The density distributions of -log10 (log-rank p value) for the Cox-PH models based on the subtypes determined by DeepProg (light grey line), SNF (dark grey line), or the pair-wise -log10 (log-rank p value) differences between DeepProg and SNF (blue line). (C) Smoothed C-index distributions for the Cox-PH models based on the subtypes determined by DeepProg (light grey line), SNF (dark grey line), or the pair-wise C-index difference between DeepProg and SNF (blue line).
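The -log10 (log-rank p value) plotted in panel B summarizes how strongly two survival subtypes separate. Below is a minimal sketch of a standard two-group log-rank test in plain Python. It is not taken from DeepProg, the function name is an assumption, and the caption's p values come from Cox-PH models, which may differ numerically from this textbook statistic.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times  -- observed follow-up time per patient
    events -- 1 if the event was observed, 0 if censored
    groups -- subtype label per patient, 0 or 1
    Returns (chi-square statistic, p value) with 1 degree of freedom.
    """
    observed = 0.0   # events seen in group 1
    expected = 0.0   # events expected in group 1 under no difference
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 event count.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("test undefined: no informative event times")
    chi2 = (observed - expected) ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p
```

Taking `-math.log10(p)` of the returned p value gives the quantity on the horizontal axis of panel B; larger values indicate stronger separation between the two groups.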
Fig. 3
Comparing the performance of DeepProg and its variations, where the default autoencoder is substituted by a simple PCA decomposition or the MOFA+ method to generate an input matrix of the same dimensions, using the TCGA HCC (A–C) and BRCA (D–F) datasets. Methods in comparison: (A, D) PCA; (B, E) MOFA+; (C, F) DeepProg default.
Fig. 4
Validation of DeepProg subtype predictions by independent breast cancer and liver cancer cohorts. RNA-Seq validation datasets for HCC: (A) LIRI (n = 230) and (B) GSE (n = 221); validation datasets for BRCA: (C) Patiwan (n = 159), (D) Metabric (n = 1981), (E) Anna (n = 249), and (F) Miller (n = 236).
Fig. 5
Pan-cancer analysis of RNA-Seq gene signatures in the worst survival vs. other groups. (A) Top 100 over- and under-expressed genes for RNA, MIR, and METH omics ranked by survival predictive power. The colors correspond to the ranks of the genes based on their -log10 (log-rank p value) of the univariate Cox-PH model. Based on these scores, the 32 cancers and the features are clustered using the WARD method. (B) Co-expression network constructed with the top 200 differentially expressed genes from the 32 cancers. The 200 genes are clustered from the network topology with the Louvain algorithm. For each submodule, we identified the most significantly enriched pathway, as shown in the figure. (C) The expression values of the 200 genes used to construct the co-expression network. A clustering of the cancers using these features with the WARD method is shown on the x-axis.
Fig. 6
Transfer learning to predict survival subtypes of certain cancers using DeepProg models trained on different cancers. (A) Heatmap of the Cox-PH log-rank p values for the subtypes inferred using each cancer as the training dataset. (B) Kaplan-Meier plot of predicted subtypes for COAD, using the DeepProg model trained on STAD. (C) Kaplan-Meier plot of predicted subtypes for STAD, using the DeepProg model trained on COAD.
