Sci Rep. 2021 Mar 18;11(1):6265.
doi: 10.1038/s41598-021-85285-4.

Integrated multi-omics analysis of ovarian cancer using variational autoencoders

Muta Tah Hira et al. Sci Rep. 2021.

Erratum in

Abstract

Cancer is a complex disease that deregulates cellular functions at various molecular levels (e.g., DNA, RNA, and proteins). Integrated multi-omics analysis of data from these levels is necessary to understand the aberrant cellular functions that drive cancer and its development. In recent years, Deep Learning (DL) approaches have become a useful tool in integrated multi-omics analysis of cancer data. However, high-dimensional multi-omics data are generally imbalanced, with too many molecular features and relatively few patient samples. This imbalance makes DL-based integrated multi-omics analysis difficult. DL-based dimensionality reduction techniques, including the variational autoencoder (VAE), are a potential way to balance high-dimensional multi-omics data. However, there are few VAE-based integrated multi-omics analyses, and they are limited to pan-cancer studies. In this work, we performed an integrated multi-omics analysis of ovarian cancer using the compressed features learned through VAE and an improved version of VAE, namely Maximum Mean Discrepancy VAE (MMD-VAE). First, we designed and developed a DL architecture for VAE and MMD-VAE. Then we used the architecture for mono-omics, integrated di-omics and tri-omics data analysis of ovarian cancer through cancer sample identification, molecular subtype clustering and classification, and survival analysis. The results show that MMD-VAE and VAE-based compressed features can classify the transcriptional subtypes of the TCGA datasets with accuracies in the ranges of 93.2–95.5% and 87.1–95.7%, respectively. Survival analysis results also show that VAE and MMD-VAE-based compressed representations of omics data can be used in cancer prognosis. Based on the results, we conclude that (i) VAE and MMD-VAE outperform existing dimensionality reduction techniques, (ii) integrated multi-omics analyses perform better than or similarly to their mono-omics counterparts, and (iii) MMD-VAE performs better than VAE on most omics datasets.
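
To make the "Maximum Mean Discrepancy" in MMD-VAE more concrete, the following is a minimal sketch of a squared-MMD estimate between encoded latent codes and samples from a standard normal prior. MMD-VAE variants typically use such a term in place of the KL-divergence regularizer of a standard VAE; the abstract does not state the exact loss, so the Gaussian kernel, the bandwidth `sigma`, the biased estimator, and all function names here are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel matrix between the rows of x (n, d) and y (m, d)."""
    # Pairwise squared Euclidean distances via broadcasting: shape (n, m).
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    k_xx = gaussian_kernel(x, x, sigma)
    k_yy = gaussian_kernel(y, y, sigma)
    k_xy = gaussian_kernel(x, y, sigma)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

# Toy data (hypothetical): latent codes that match the prior vs. shifted codes.
rng = np.random.default_rng(0)
prior_samples = rng.normal(size=(200, 2))
matched_codes = rng.normal(size=(200, 2))
shifted_codes = matched_codes + 3.0

mmd_matched = mmd2(matched_codes, prior_samples)
mmd_shifted = mmd2(shifted_codes, prior_samples)
```

In this toy setting the shifted codes yield a much larger value than the matched codes, which is the behaviour a regularizer of this kind relies on to pull the latent distribution toward the prior.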

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Methods: (A) VAE/MMD-VAE architecture consists of an encoder and a decoder made from 3 hidden layers and a bottleneck made from 2 layers and a 3-layered ANN-based classifier for supervised LFs learning, (B) Clustering using 2 LFs and ANN-based classification (e.g., cancer vs normal, and molecular subtypes) using 2 and 128 LFs, (C) Survival analysis using 128 LFs: (i) inferring survival subgroup, (ii) predicting subgroup and (iii) potential prognostic biomarkers.
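
The layout in panel (A) can be sketched as a plain NumPy forward pass. This is only an illustration under stated assumptions: the input size and hidden-layer widths are hypothetical (the caption gives only the layer counts and the 128 LFs), the two bottleneck layers are assumed to be a mean layer and a log-variance layer, the sigmoid output assumes inputs scaled to [0, 1], and the ANN classifier branch and all training code are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; only N_LF = 128 comes from the figure caption.
N_FEATURES = 2000
HIDDEN = (1024, 512, 256)   # assumed widths of the 3 encoder hidden layers
N_LF = 128

def dense(n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return rng.normal(scale=0.01, size=(n_in, n_out)), np.zeros(n_out)

def relu(x):
    return np.maximum(x, 0.0)

# Encoder: 3 hidden layers, then two parallel bottleneck layers (assumed).
enc = [dense(a, b) for a, b in zip((N_FEATURES,) + HIDDEN[:-1], HIDDEN)]
mu_layer = dense(HIDDEN[-1], N_LF)
logvar_layer = dense(HIDDEN[-1], N_LF)

# Decoder: mirror of the encoder, 3 hidden layers plus an output layer.
dec_sizes = HIDDEN[::-1]
dec = [dense(a, b) for a, b in zip((N_LF,) + dec_sizes[:-1], dec_sizes)]
out_layer = dense(dec_sizes[-1], N_FEATURES)

def encode(x):
    h = x
    for w, b in enc:
        h = relu(h @ w + b)
    mu = h @ mu_layer[0] + mu_layer[1]
    logvar = h @ logvar_layer[0] + logvar_layer[1]
    return mu, logvar

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with eps drawn from N(0, I)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    h = z
    for w, b in dec:
        h = relu(h @ w + b)
    w, b = out_layer
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))  # sigmoid reconstruction

# Toy batch of 8 samples with features in [0, 1).
x = rng.random((8, N_FEATURES))
mu, logvar = encode(x)
z = reparameterize(mu, logvar)
x_hat = decode(z)
```

Here `z` plays the role of the 128 LFs that the later panels use for clustering, classification and survival analysis.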
Figure 2
Clustering of normal and cancer samples (a–d) using the LFs learned by unsupervised PCA, t-SNE, VAE and MMD-VAE (2D for PCA and t-SNE; the first 2 LFs for VAE and MMD-VAE) on DNA methylation (mono-omics) data from the GDC cohort. In (e, f), t-SNE was applied to the 128 learned LFs to obtain 2 LFs for the clustering. Legend: 0 = Normal, 1 = Cancer.
Figure 3
Clustering molecular subtypes using the LFs learned through the supervised VAE & MMD-VAE + t-SNE (2D or 2 LFs): (a–c) MMD-VAE for mono-omics, di-omics and tri-omics data, respectively; (d–f) MMD-VAE + t-SNE for mono-omics, di-omics and tri-omics data, respectively. Legend: 0 = Immunoreactive, 1 = Differentiated, 2 = Proliferative, 3 = Mesenchymal.
Figure 4
Survival analysis using existing molecular subtypes and CRLFs-based survival subgroups: (a) survival analysis using the existing transcriptional subtypes shows that they are not linked to survival (p = 0.19 > 0.05); (b–f) survival analysis using the two subgroups shows significant survival differences (p < 0.05) between the groups. The results in (e) are for 292 samples; the rest are for 481 samples.
Figure 5
Association between CRLFs and input features: Input features of CNV_mRNA_methylation omics data are clustered based on the correlation data with the identified CRLFs. For example, the NDRG2 gene has strong correlation with LF30 and LF69.
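
The kind of feature-to-LF association described for Figure 5 can be sketched as a Pearson correlation matrix between input features and latent features. The caption does not specify the correlation measure or the clustering procedure, so the choice of Pearson correlation, the function name, and the toy data below are assumptions for illustration only.

```python
import numpy as np

def feature_lf_correlation(features, latents):
    """Pearson correlation of every input feature with every latent feature.

    features: (n_samples, n_features); latents: (n_samples, n_lfs).
    Returns an (n_features, n_lfs) matrix. Constant columns would cause a
    division by zero and should be filtered out beforehand.
    """
    f = features - features.mean(axis=0)
    z = latents - latents.mean(axis=0)
    f_norm = np.sqrt((f ** 2).sum(axis=0))
    z_norm = np.sqrt((z ** 2).sum(axis=0))
    return (f.T @ z) / np.outer(f_norm, z_norm)

# Toy data (hypothetical): 50 samples, 4 input features, 3 latent features.
rng = np.random.default_rng(0)
latents = rng.normal(size=(50, 3))
features = rng.normal(size=(50, 4))
# Make feature 0 depend strongly on latent feature 1.
features[:, 0] = 2.0 * latents[:, 1] + 0.1 * rng.normal(size=50)

corr = feature_lf_correlation(features, latents)
strongest_lf_for_feature0 = int(np.argmax(np.abs(corr[0])))
```

Rows of `corr` could then be grouped with any standard clustering method to obtain a view in which features with similar LF correlation profiles sit together.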
