Comparative Study
J Neurosci Methods. 2014 Oct 30;236:19-25.
doi: 10.1016/j.jneumeth.2014.08.001. Epub 2014 Aug 10.

Visualization and unsupervised predictive clustering of high-dimensional multimodal neuroimaging data

Benson Mwangi et al. J Neurosci Methods. 2014.

Abstract

Background: Neuroimaging machine learning studies have largely relied on supervised algorithms, which require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be 'trained' for a prediction task. Notably, this approach may be suboptimal or infeasible when the global structure of the data is not well known and the researcher has no a priori model to fit to the data.

New method: We investigated the utility of an unsupervised machine learning technique, t-distributed stochastic neighbour embedding (t-SNE), in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated, and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized in a 2D scatter plot and further analyzed using the K-means clustering algorithm.
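The embed-then-cluster step described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the synthetic feature matrix, the feature count of 200, the standardization step, the perplexity of 30, and the helper name `embed_and_cluster` are all assumptions, since the abstract gives none of these details. Only the sample size of 92 and the choice of two-dimensional output come from the text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def embed_and_cluster(features, n_clusters=2, perplexity=30.0, seed=0):
    """Embed subjects into 2D with t-SNE, then run K-means on the embedding.

    `features` is an (n_subjects, n_features) array. The scaling step and
    all parameter values are illustrative assumptions, not from the paper.
    """
    scaled = StandardScaler().fit_transform(features)
    embedding = TSNE(
        n_components=2,          # 2D output, suitable for a scatter plot
        perplexity=perplexity,   # must be smaller than the number of subjects
        init="pca",
        random_state=seed,
    ).fit_transform(scaled)
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(embedding)
    return embedding, labels


# Synthetic stand-in for the integrated multimodal features (hypothetical data):
# 92 subjects in two latent groups, with the second group shifted in every feature.
rng = np.random.default_rng(0)
n_subjects, n_features = 92, 200
group = np.repeat([0, 1], n_subjects // 2)
features = rng.normal(size=(n_subjects, n_features)) + 1.5 * group[:, None]

embedding, labels = embed_and_cluster(features)
```

The two columns of `embedding` can be passed to any plotting library as x and y coordinates, coloured by `labels`, to obtain the kind of 2D scatter plot the abstract mentions.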

Comparison with existing methods: t-SNE was evaluated against classical principal component analysis.
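A PCA baseline of the kind mentioned here might look like the sketch below. The abstract does not say how the two methods were scored against each other; the silhouette index is used only because the Conclusion reports it for the t-SNE clusters. The helper name `pca_baseline`, the scaling step, and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def pca_baseline(features, n_clusters=2, seed=0):
    """Project onto the first two principal components, cluster, and score.

    Returns the 2D projection, the K-means labels, and the silhouette index
    of those labels (a value in [-1, 1]; higher means tighter, better
    separated clusters).
    """
    scaled = StandardScaler().fit_transform(features)
    projection = PCA(n_components=2, random_state=seed).fit_transform(scaled)
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(projection)
    score = silhouette_score(projection, labels)
    return projection, labels, score


# Hypothetical data: 92 subjects, 200 features, two shifted groups.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 46)
features = rng.normal(size=(92, 200)) + 1.5 * group[:, None]

projection, pca_labels, pca_score = pca_baseline(features)
```

Computing the same silhouette index on a t-SNE embedding of the same features would give one possible like-for-like comparison between the two projections.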

Conclusion: Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters that corresponded to subjects' gender labels (cluster silhouette index = 0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model, which correctly identified the gender of 93.5% of subjects. Notably, from a neuropsychiatric perspective, this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders.
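One plausible reading of a "minimum distance clustering model" is a nearest-centroid rule: compute the centre of each discovered cluster, then assign a subject to the cluster whose centre is closest. The sketch below assumes Euclidean distance in the 2D embedding space and uses toy coordinates; the function names are hypothetical, and the abstract does not state how the authors defined the distance or placed subjects in that space.

```python
import numpy as np


def fit_centroids(embedding, cluster_labels):
    """Return {cluster_id: centroid} for points already assigned to clusters."""
    return {
        int(c): embedding[cluster_labels == c].mean(axis=0)
        for c in np.unique(cluster_labels)
    }


def assign_min_distance(points, centroids):
    """Assign each point to the cluster with the nearest centroid (Euclidean)."""
    ids = np.array(list(centroids))
    centers = np.stack([centroids[c] for c in ids])
    # distances[i, j] is the distance from point i to centroid j
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return ids[distances.argmin(axis=1)]


# Toy 2D embedding with two well-separated clusters (illustrative values only).
embedding = np.array(
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
)
cluster_labels = np.array([0, 0, 0, 1, 1, 1])

centroids = fit_centroids(embedding, cluster_labels)
assigned = assign_min_distance(np.array([[0.5, 0.5], [9.0, 9.0]]), centroids)
# assigned is [0, 1]: each point goes to the cluster whose centre is nearest
```

Because cluster ids are arbitrary, comparing assignments against known labels such as gender requires first mapping each cluster to the label it mostly contains.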

Keywords: Big data; Multimodal neuroimaging; Research domain criteria (RDoC); Unsupervised machine learning; t-Distributed stochastic neighbour embedding (t-SNE).
