. 2013 Dec 24;110(52):20935-40.

doi: 10.1073/pnas.1312486110. Epub 2013 Nov 25.

Spectral redemption in clustering sparse networks

Florent Krzakala¹, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, Pan Zhang

Affiliations

PMID: 24277835
PMCID: PMC3876200
DOI: 10.1073/pnas.1312486110

Spectral redemption in clustering sparse networks

Florent Krzakala et al. Proc Natl Acad Sci U S A. 2013.

. 2013 Dec 24;110(52):20935-40.

doi: 10.1073/pnas.1312486110. Epub 2013 Nov 25.

Authors

Florent Krzakala¹, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, Pan Zhang

Affiliation

¹ Ecole Supérieure de Physique et de Chimie Industrielles, 75005 Paris, France.

PMID: 24277835
PMCID: PMC3876200
DOI: 10.1073/pnas.1312486110

Abstract

Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here, we present a class of spectral algorithms based on a nonbacktracking walk on the directed edges of the graph. The spectrum of this operator is much better-behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all of the way down to the theoretical limit. We also show the spectrum of the nonbacktracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering.

PubMed Disclaimer

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Fig. 1.**
Spectrum of the adjacency matrix of a sparse network generated by the block model (excluding the zero eigenvalues). Here, , , and , and we average over 20 realizations. Even though the eigenvalue given by Eq. 2 satisfies the threshold condition 1 and lies outside the semicircle of radius , deviations from the semicircle law cause it to get lost in the bulk, and the eigenvector of the second largest eigenvalue is uncorrelated with the community structure. As a result, spectral algorithms based on A are unable to identify the communities in this case.

formula image — **Fig. 1.**
Spectrum of the adjacency matrix of a sparse network generated by the block model (excluding the zero eigenvalues). Here, , , and , and we average over 20 realizations. Even though the eigenvalue given by Eq. 2 satisfies the threshold condition 1 and lies outside the semicircle of radius , deviations from the semicircle law cause it to get lost in the bulk, and the eigenvector of the second largest eigenvalue is uncorrelated with the community structure. As a result, spectral algorithms based on A are unable to identify the communities in this case.

**Fig. 2.**
Spectrum of the nonbacktracking matrix B for a network generated by the block model with same parameters as in Fig. 1. The leading eigenvalue is at , the second eigenvalue is close to , and the bulk of the spectrum is confined to the disk of radius . Because is outside the bulk, a spectral algorithm that labels vertices according to the sign of B’s second eigenvector (summed over the incoming edges at each vertex) labels the majority of vertices correctly.

**Fig. 3.**
First, second, and third largest eigenvalues , , and , respectively, of B as functions of . The third eigenvalue is complex, so we plot its modulus. Values are averaged over 20 networks of size and average degree . The green line in the figure represents , and the horizontal lines are c and , respectively. The second eigenvalue is well-separated from the bulk throughout the detectable regime.

**Fig. 4.**
Accuracy of spectral algorithms based on different linear operators, and of BP, for two groups of equal size. (*Left*) We vary while fixing the average degree ; the detectability transition given by 1 occurs at . (*Right*) We set and vary c; the detectability transition is at . Each point is averaged over 20 instances with . Our spectral algorithm based on the nonbacktracking matrix B achieves an accuracy close to that of BP, and both remain large all of the way down to the transition. Standard spectral algorithms (applied to the giant component of each graph, which contains all but a small fraction of the vertices) based on the adjacency matrix, modularity matrix, Laplacian, normalized Laplacian, and the random walk matrix all fail well above the transition, giving a regime where they do no better than chance.

**Fig. 5.**
Clustering for three groups of equal size. (*Left*) Scatter plot of the second and third eigenvectors (X and Y axis, respectively) of the nonbacktracking matrix B, with colors indicating the true group assignment. (*Right*) Analogous plot for the adjacency matrix A. Here, , , and . Applying k means gives an overlap 0.712 using B, but 0.0063 using A.

**Fig. 6.**
Spectrum of the nonbacktracking matrix in the complex plane for some real networks commonly used as benchmarks for community detection, taken from refs. –. The radius of the circle is the square root of the largest eigenvalue, which is a heuristic estimate of the bulk of the spectrum. The overlap is computed using the signs of the second eigenvector for the networks with two communities, and using k means for those with three and more communities. The nonbacktracking operator detects communities in all these networks, with an overlap comparable to the performance of other spectral methods. As in the case of synthetic networks generated by the stochastic block model, the number of real eigenvalues outside the bulk appears to be a good indicator of the number q of communities.

See this image and copyright information in PMC

References

1. Holland PW, Laskey KB, Leinhardt S. Stochastic blockmodels: First steps. Soc Networks. 1983;5(2):109–137.
1. Wang YJ, Wong GY. Stochastic blockmodels for directed graphs. J Am Stat Assoc. 1987;82(397):8–19.
1. Von Luxburg U. A tutorial on spectral clustering. Stat Comput. 2007;17(4):395–416.
1. Bickel PJ, Chen A. A nonparametric view of network models and Newman-Girvan and other modularities. Proc Natl Acad Sci USA. 2009;106(50):21068–21073. - PMC - PubMed
1. Coja-Oghlan A, Mossel E, Vilenchik D. A spectral approach to analyzing belief propagation for 3-coloring. Combin Probab Comput. 2009;18(6):881–912.

Publication types

Actions
Actions

MeSH terms

Actions
Actions
Actions
Actions

LinkOut - more resources

Full Text Sources
Other Literature Sources
- The Lens - Patent Citations Database
- scite Smart Citations

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Spectral redemption in clustering sparse networks

Affiliation

Spectral redemption in clustering sparse networks

Authors

Affiliation

Abstract

Conflict of interest statement

Figures

References

Publication types

MeSH terms

LinkOut - more resources

Full Text Sources

Other Literature Sources