PLoS One. 2019 Oct 10;14(10):e0223267.
doi: 10.1371/journal.pone.0223267. eCollection 2019.

Clustering of the structures by using "snakes-&-dragons" approach, or correlation matrix as a signal

Victor P Andreev et al. PLoS One.

Abstract

Biological, ecological, social, and technological systems are complex structures with multiple interacting parts, often represented by networks. Correlation matrices describing the interdependency of the variables in such structures provide key information for comparing and classifying such systems. Classification based on correlation matrices could supplement or improve classification based on variable values, since the former reveals similarities in system structures, while the latter relies on similarities in system states. Importantly, clustering correlation matrices is different from clustering the elements of a correlation matrix: our goal is to compare and cluster multiple networks, not the nodes within the networks. A novel approach for clustering correlation matrices, named "snakes-&-dragons," is introduced and illustrated by examples from neuroscience, the human microbiome, and macroeconomics.
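
The idea of clustering whole correlation matrices, rather than the nodes inside one matrix, can be illustrated with a minimal Python sketch. It assumes that a "snake vector" is simply the strict upper triangle of a correlation matrix flattened into one vector; the exact ordering and construction are defined in the paper's Methods section, which is not reproduced here. The function names, the toy data generator, and the choice of Ward linkage are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def snake_vector(corr):
    """Flatten the strict upper triangle of a square correlation matrix.

    Assumption: this stands in for the paper's snake vector; the
    published ordering of elements may differ.
    """
    corr = np.asarray(corr, dtype=float)
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    return corr[rows, cols]


def cluster_correlation_matrices(matrices, n_clusters):
    """Cluster systems (one correlation matrix each) by their snake vectors."""
    vectors = np.vstack([snake_vector(m) for m in matrices])
    tree = linkage(vectors, method="ward")  # hierarchical clustering
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def make_system(rng, strength, n_samples=200):
    """Hypothetical toy system: 4 variables, the first two coupled."""
    cov = np.eye(4)
    cov[0, 1] = cov[1, 0] = strength
    data = rng.multivariate_normal(np.zeros(4), cov, size=n_samples)
    return np.corrcoef(data, rowvar=False)


rng = np.random.default_rng(0)
# Two groups of systems that differ in structure (sign of one coupling),
# not in the mean values of their variables.
matrices = [make_system(rng, 0.8) for _ in range(5)]
matrices += [make_system(rng, -0.8) for _ in range(5)]
labels = cluster_correlation_matrices(matrices, n_clusters=2)
print(labels)
```

In this toy setup all variables have zero mean and unit variance in both groups, so the groups are distinguishable only through their correlation structure, which is the situation the abstract describes.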

Conflict of interest statement

We have read the journal's policy and the authors of this manuscript have the following competing interests: LH, JZ, GL, GEF have none; RMM declares funding from NIH, HRSA, and the Laura and John Arnold Foundation; VA declares funding from NIH, authorship of two patents from 1997 and 2008 unrelated to the theme of the paper, and a travel stipend and honorarium for being an invited speaker at SUFU (Society for Urodynamics) 2019.

Figures

Fig 1. Explanation of snakes-&-dragons approach.
A: snake vector. B: dragon vector. See details in the Methods section.
Fig 2. Four types of dragon vectors.
A: Dragon 1 includes means and variances of the variables. B: Dragon 2 also includes overall network property information. C: Dragon 3 combines correlations along multiple dimensions of the data matrix or multiple locations. D: Dragon 4 is composed of several dragons presenting different types of clinical and omics data.
Fig 3. Simulating connectivity matrices with increased noise level.
A: original matrix #1. B: simulated matrix with q/n = 0.1, n = 360. C: q/n = 3, n = 12. D: q/n = 6, n = 6. E: q/n = 9, n = 4. F: q/n = 12, n = 3.
Fig 4. Increased variability of simulated correlation matrices with increased q/n value.
A: three instances of correlation matrices generated from connectivity matrix #1 using q/n = 2, n = 18. B: three instances of correlation matrices generated from connectivity matrix #1 using q/n = 12, n = 3. Variability of the matrices is higher in B (q/n = 12) than in A (q/n = 2).
Fig 5. Explanation of increased variability of the simulated matrices.
A: histograms of standard deviations of the elements of the simulated connectivity matrices for various q/n. B: signal-to-noise ratio vs. q/n.
Fig 6. Clustering of brain connectivity matrices from pilot data set of young vs. old healthy persons.
A: dendrogram based on RS. B: dendrogram based on T-statistics. C: dendrogram based on S-statistics. D: dendrogram based on snake vectors. E-H: confusion matrices for the above four approaches. Note that all four approaches used a hierarchical clustering method to allow direct comparison.
Fig 7. Misclassification error in clustering of simulated connectivity matrices.
Comparison of hierarchical clustering results for RS, T- and S-statistics, and snake vectors, with k-means and resampling-based consensus clustering using snake vectors. Approaches based on snake vectors outperform those based on RS, T- and S-statistics. Red, blue, and green curves demonstrate that the main advantage is due to the use of snake vectors, not to the type of clustering algorithm used.
Fig 8. Resampling-based consensus clustering of 500 brain connectivity matrices from GSP project.
A: Consensus matrix. The two identified clusters appear as yellow squares (yellow indicates a high probability that a pair of brains belongs to the same cluster). The high contrast between on-diagonal and off-diagonal probability values indicates two clusters. B: Checking the number of clusters with the Calinski criterion. The Calinski criterion has a maximum at k = 2, also indicating two clusters (both with the snakes-&-dragons approach and with RS, T- and S-statistics).
Fig 9. Mean brain connectivity matrices for two clusters identified in GSP data.
A: Mean connectivity matrix for cluster 1. B: Mean connectivity matrix for cluster 2. C: Difference of mean connectivity matrices for cluster 2 and cluster 1. D: 8395 significantly different values of connectivity observed in cluster 1 vs. cluster 2. The 169 brain areas were divided into 10 networks: visual foveal (VFN), visual peripheral (VPN), dorsal attention (DAN), motor (MN), auditory (AN), cingulo-opercular (CON), ventral attention (VAN), language (LN), fronto-parietal (FPN), and default mode (DMN) [26].
Fig 10. Correlation matrices reflecting microbiome dynamics at four body sites (gut, tongue, palm, and forehead) for three clusters of students identified based on the gut microbiome data.
Fig 11. Correlation matrices reflecting microbiome dynamics at four body sites (gut, tongue, palm, and forehead) for three clusters of students identified based on the microbiome data for each of the body sites.
Fig 12. Pairwise comparison of cluster membership across four body sites.
Fig 13. Clustering based on dragon vectors describing microbiomes of four body sites.
A: Mean dragon vectors for three clusters of students identified by clustering the concatenated snake vectors for gut, tongue, palm, and forehead. B: Sankey diagrams comparing cluster membership based on the dynamics of microbiomes at each site and all four sites' microbiomes combined.
Fig 14. Correlation matrices of macroeconomic indices of eight identified clusters of economies.
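
The noise simulation summarized in the captions of Figs 3 to 5 can be sketched roughly as follows. This is a hedged illustration, not the authors' procedure: it assumes that q is the number of variables and n the number of samples used to estimate each correlation matrix (the caption values are all consistent with q = 36), and that simulated matrices are sample correlations of Gaussian data drawn with the original matrix as covariance. The base matrix below is a hypothetical stand-in, since connectivity matrix #1 is not available in this record.

```python
import numpy as np


def simulate_correlation(base_corr, n_samples, rng):
    """Draw n_samples Gaussian observations and return their sample correlation."""
    q = base_corr.shape[0]
    data = rng.multivariate_normal(np.zeros(q), base_corr, size=n_samples)
    return np.corrcoef(data, rowvar=False)


def mean_elementwise_sd(base_corr, n_samples, rng, n_repeats=200):
    """Average standard deviation of off-diagonal elements across simulations."""
    sims = np.stack(
        [simulate_correlation(base_corr, n_samples, rng) for _ in range(n_repeats)]
    )
    rows, cols = np.triu_indices(base_corr.shape[0], k=1)
    return sims[:, rows, cols].std(axis=0).mean()


q = 36  # inferred from the captions: q/n = 0.1 with n = 360, q/n = 12 with n = 3
idx = np.arange(q)
# Hypothetical base matrix: correlation decays with index distance (valid, positive definite).
base_corr = 0.5 ** np.abs(np.subtract.outer(idx, idx))

rng = np.random.default_rng(1)
sd_low = mean_elementwise_sd(base_corr, n_samples=18, rng=rng)  # q/n = 2
sd_high = mean_elementwise_sd(base_corr, n_samples=3, rng=rng)  # q/n = 12
print(f"q/n = 2:  mean element SD = {sd_low:.3f}")
print(f"q/n = 12: mean element SD = {sd_high:.3f}")
```

Under these assumptions, fewer samples per matrix (larger q/n) give noisier correlation estimates, which matches the qualitative trend described in the captions of Figs 4 and 5.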

References

    1. Duda RO, Hart PE, Stork DG. Pattern classification. 2nd ed. New York: Wiley; 2001.
    2. Roff DA, Mousseau TA, Howard DJ. Variation in genetic architecture of calling song among populations of Allonemobius socius, A. fasciatus and a hybrid population: drift or selection? Evolution. 1999; 53:216–224. doi: 10.1111/j.1558-5646.1999.tb05347.x
    3. Cheverud JM. Quantitative genetic analysis of cranial morphology in the cotton-top (Saguinus oedipus) and saddle-back (S. fuscicollis) tamarins. J Evol Biol. 1996; 9:5–42.
    4. Pielou EC. Probing multivariate data with random skewers: a preliminary to direct gradient analysis. Oikos. 1984; 42:161–165.
    5. Garcia C. A simple procedure for the comparison of covariance matrices. BMC Evol Biol. 2012; 12:222. doi: 10.1186/1471-2148-12-222