Bioinformatics. 2020 Feb 15;36(4):1159-1166.
doi: 10.1093/bioinformatics/btz704.

Spectrum: fast density-aware spectral clustering for single and multi-omic data

Christopher R John et al. Bioinformatics.

Abstract

Motivation: Clustering patient omic data is integral to developing precision medicine because it allows the identification of disease subtypes. A current major challenge is the integration of multi-omic data to identify a shared structure and reduce noise. Cluster analysis is also increasingly applied to single-omic data, for example, in single cell RNA-seq analysis for clustering the transcriptomes of individual cells. This technology has clinical implications. Our motivation was therefore to develop a flexible and effective spectral clustering tool for both single and multi-omic data.

Results: We present Spectrum, a new spectral clustering method for complex omic data. Spectrum uses a self-tuning density-aware kernel we developed that enhances the similarity between points that share common nearest neighbours. It uses a tensor product graph data integration and diffusion procedure to reduce noise and reveal underlying structures. Spectrum contains a new method for finding the optimal number of clusters (K) involving eigenvector distribution analysis. Spectrum can automatically find K for both Gaussian and non-Gaussian structures. We demonstrate across 21 real expression datasets that Spectrum gives improved runtimes and better clustering results relative to other methods.
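As a rough illustration of what a self-tuning, density-aware kernel can look like, the sketch below combines local scaling (each point gets its own bandwidth from the distance to a near neighbour) with a boost for pairs that share nearest neighbours. The abstract does not give the formula, so the functional form, the function name density_aware_kernel and the parameters k_sigma and k_cnn are assumptions made for illustration, not the Spectrum package's implementation.

```python
import numpy as np

def density_aware_kernel(X, k_sigma=3, k_cnn=7):
    """Illustrative self-tuning kernel with a shared-nearest-neighbour boost.

    Assumed form: A_ij = exp(-d_ij^2 / (sigma_i * sigma_j * (cnn_ij + 1))),
    where sigma_i is the distance from point i to its k_sigma-th nearest
    neighbour and cnn_ij counts neighbours shared by i and j among their
    k_cnn nearest neighbours.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances (rows of X are samples).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, 0.0)

    # Neighbour ordering per row; column 0 is the point itself.
    order = np.argsort(d2, axis=1)

    # Local scale: distance to the k_sigma-th nearest neighbour.
    sigma = np.sqrt(d2[np.arange(n), order[:, k_sigma]]) + 1e-12

    # Indicator matrix: member[i, j] = 1 if j is among i's k_cnn neighbours.
    member = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k_cnn)
    member[rows, order[:, 1:k_cnn + 1].ravel()] = 1.0

    # Number of neighbours shared by each pair of points.
    cnn = member @ member.T

    # More shared neighbours widen the effective bandwidth, so similarity rises.
    A = np.exp(-d2 / (sigma[:, None] * sigma[None, :] * (cnn + 1.0)))
    np.fill_diagonal(A, 0.0)
    return A

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))   # 30 samples, 4 features (synthetic)
A = density_aware_kernel(X)
```

Setting the shared-neighbour count to zero everywhere reduces this to a plain locally scaled kernel, which makes it easy to compare the two variants on the same data.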

Availability and implementation: Spectrum is available as an R software package from CRAN https://cran.r-project.org/web/packages/Spectrum/index.html.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Spectrum clusters five simulated Gaussian clusters and finds the correct K. (a) PCA showing the five simulated Gaussian clusters. (b) The eigenvalues of the eigenvectors from the data’s graph Laplacian. The greatest eigengap is between the fifth and sixth eigenvectors, correctly indicating K = 5.
Fig. 2.
Spectrum clusters RNA-seq data to find cancer subtypes with different survival times. (a) t-SNE plot illustrating the four clusters Spectrum identified in a brain cancer RNA-seq dataset (Ceccarelli et al., 2016). (b) Survival curves for the discovered clusters, with a P-value from a Cox proportional hazards regression model, using a log-rank test to assess the significance of the survival time differences between clusters.
Fig. 3.
The adaptive density-aware kernel demonstrates an advantage in multi-omic analysis. On the right-hand side of the panel are the results for the Zelnik-Manor kernel, while the density-aware kernel results are shown on the left-hand side. (a) Spectrum clustering assignments from the brain cancer dataset (Ceccarelli et al., 2016). UMAP was run on the integrated similarity matrices for mRNA, miRNA and protein data to generate the plots. (b) Survival curves with P-values from a Cox proportional hazards regression model, using a log-rank test to assess significance between clusters.
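The eigengap heuristic described in the caption of Fig. 1(b) can be sketched as follows. This is a minimal sketch under stated assumptions: it uses a plain Gaussian kernel with a fixed bandwidth and the normalised affinity matrix D^(-1/2) A D^(-1/2), and the function name estimate_k_by_eigengap, the parameter max_k and the simulated data are hypothetical, not taken from the Spectrum package.

```python
import numpy as np

def estimate_k_by_eigengap(A, max_k=10):
    """Pick K at the largest gap between consecutive leading eigenvalues."""
    deg = A.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    # Normalised affinity matrix D^(-1/2) A D^(-1/2); symmetrise for stability.
    L = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    L = (L + L.T) / 2.0
    # eigvalsh returns ascending eigenvalues; keep the max_k + 1 largest.
    vals = np.linalg.eigvalsh(L)[::-1][:max_k + 1]
    gaps = vals[:-1] - vals[1:]
    # A maximal gap between eigenvalue k and k + 1 suggests K = k.
    return int(np.argmax(gaps)) + 1, vals

# Five tight, well-separated Gaussian clusters in two dimensions (synthetic).
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 5]], dtype=float)
X = np.vstack([c + 0.3 * rng.normal(size=(40, 2)) for c in centers])

# Gaussian kernel with unit bandwidth (an assumption for this sketch).
sq = np.sum(X ** 2, axis=1)
d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
A = np.exp(-d2 / 2.0)

k, vals = estimate_k_by_eigengap(A)
print("estimated K:", k)
```

With clusters this well separated, the affinity matrix is close to block diagonal, so the leading eigenvalues sit near 1 and drop sharply after the fifth. On noisier or non-Gaussian data the gap is less clear-cut, which is the situation a more elaborate method for choosing K is meant to handle.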

References

Agrawal N. et al. (2014) Integrated genomic characterization of papillary thyroid carcinoma. Cell, 159, 676–690.
Akbani R. et al. (2015) Genomic classification of cutaneous melanoma. Cell, 161, 1681–1696.
Baron M. et al. (2016) A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst., 3, 346–360.e344.
Butler A. et al. (2018) Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol., 36, 411.
Camp J.G. et al. (2017) Multilineage communication regulates human liver bud development from pluripotency. Nature, 546, 533.
