Sci Rep. 2020 Aug 12;10(1):13654. doi: 10.1038/s41598-020-70229-1.

Multi-view clustering for multi-omics data using unified embedding


Sayantan Mitra et al. Sci Rep. 2020.

Abstract

In real-world applications, data sets often comprise multiple views, which provide consensus and complementary information to each other. Embedding learning is an effective strategy for nearest-neighbour search and dimensionality reduction in large data sets. This paper attempts to learn a unified probability distribution of the points across the different views and to generate a unified embedding in a low-dimensional space that optimally preserves neighbourhood identity. The probability distributions generated for each point in each view are combined by the conflation method to create a single unified distribution. The goal is to approximate this unified distribution as closely as possible when a similar operation is performed on the embedded space. The cost function is the sum of the Kullback-Leibler divergences over the samples, which leads to a simple gradient that adjusts the positions of the samples in the embedded space. The proposed methodology can generate embeddings from both complete and incomplete multi-view data sets. Finally, a multi-objective clustering technique (AMOSA) is applied to group the samples in the embedded space. The proposed methodology, Multi-view Neighbourhood Embedding (MvNE), shows an improvement of approximately 2-3% over state-of-the-art models when evaluated on 10 omics data sets.
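As a rough sketch of the objective described above (the abstract does not give the exact notation, so the following is an assumption): if SNE-style conditional neighbourhood probabilities p^{(v)}_{j|i} are computed independently for each of the V views, and conflation is taken as the standard normalised product of the per-view distributions, the unified distribution and the cost could take the form

p_{j|i} = \frac{\prod_{v=1}^{V} p^{(v)}_{j|i}}{\sum_{k \neq i} \prod_{v=1}^{V} p^{(v)}_{k|i}},
\qquad
C = \sum_{i} \mathrm{KL}\!\left(P_i \,\|\, Q_i\right) = \sum_{i} \sum_{j \neq i} p_{j|i} \log \frac{p_{j|i}}{q_{j|i}},

where q_{j|i} is the analogous neighbourhood probability computed from the low-dimensional embedding. The gradient of C with respect to each embedded point then moves the samples so that the embedded neighbourhood distribution Q_i approximates the conflated distribution P_i.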


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Different views of the data sets are combined in the probabilistic space by the conflation method. The low-dimensional embedding is generated by approximating the combined probability distribution in the lower-dimensional space.
Figure 2
Example of the conflation technique. The red curves are the two independent distributions, the yellow curve is the probability distribution obtained by averaging the probabilities, the blue curve is the probability distribution obtained by averaging the data, and the green curve denotes the distribution obtained by the conflation technique.
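To make the comparison in this caption concrete, the sketch below computes the three combinations for two hypothetical Gaussian densities f1 and f2 (the densities and their parameters are illustrative assumptions, not values from the paper); conflation is taken as the normalised pointwise product of the densities.

# Illustrative sketch of the three combinations in Figure 2 for two
# hypothetical Gaussian densities f1 and f2 (not from the paper).
import numpy as np
from scipy.stats import norm

x = np.linspace(-6, 6, 1000)
f1 = norm(loc=-1.0, scale=1.0).pdf(x)   # first independent distribution
f2 = norm(loc=2.0, scale=0.5).pdf(x)    # second independent distribution

# (a) averaging the probabilities: a bimodal mixture density
avg_prob = 0.5 * (f1 + f2)

# (b) averaging the data: density of (X1 + X2) / 2 for independent X1, X2
avg_data = norm(loc=(-1.0 + 2.0) / 2,
                scale=np.sqrt(1.0**2 + 0.5**2) / 2).pdf(x)

# (c) conflation: normalised pointwise product of the densities
prod = f1 * f2
conflation = prod / (prod.sum() * (x[1] - x[0]))

For Gaussian inputs the conflation is itself Gaussian, with a precision-weighted mean and a variance smaller than either input, which distinguishes it from the two averaging schemes.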
Figure 3
Network structure of the stacked autoencoder. The output of the “feature” layer is the initial embedding Yinit.
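A minimal sketch of such a stacked autoencoder, assuming a symmetric encoder-decoder with a low-dimensional bottleneck; the framework and layer sizes here are illustrative assumptions, not the paper's exact architecture.

# Minimal sketch of a stacked autoencoder whose bottleneck ("feature")
# layer output is used as the initial embedding Yinit.
# Layer sizes are illustrative assumptions.
import torch.nn as nn

class StackedAutoencoder(nn.Module):
    def __init__(self, input_dim, feature_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, feature_dim),      # the "feature" layer
        )
        self.decoder = nn.Sequential(
            nn.Linear(feature_dim, 128), nn.ReLU(),
            nn.Linear(128, 512), nn.ReLU(),
            nn.Linear(512, input_dim),
        )

    def forward(self, x):
        y = self.encoder(x)                   # Yinit comes from this layer
        return self.decoder(y), y

After pre-training with a reconstruction loss (e.g. mean squared error), passing the data through the encoder yields Yinit, which initialises the coordinates in the embedded space.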
Figure 4
Change in NMI(%) with changes in k.
Figure 5
Change in NMI(%) with the changing dimension (dim) of the embedded dataset.
Figure 6
Heatmap showing the levels of expression of selected gene markers in the BRCA dataset for each subclass.
Figure 7
Gene expression profile plot in the BRCA dataset for each subclass.
Figure 8
Error plot for low dimension generation.
