Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2013 Dec 24;110(52):20935-40.
doi: 10.1073/pnas.1312486110. Epub 2013 Nov 25.

Spectral redemption in clustering sparse networks

Affiliations

Spectral redemption in clustering sparse networks

Florent Krzakala et al. Proc Natl Acad Sci U S A. .

Abstract

Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here, we present a class of spectral algorithms based on a nonbacktracking walk on the directed edges of the graph. The spectrum of this operator is much better-behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all of the way down to the theoretical limit. We also show the spectrum of the nonbacktracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering.

PubMed Disclaimer

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Fig. 1.
Spectrum of the adjacency matrix of a sparse network generated by the block model (excluding the zero eigenvalues). Here, formula image, formula image, and formula image, and we average over 20 realizations. Even though the eigenvalue formula image given by Eq. 2 satisfies the threshold condition 1 and lies outside the semicircle of radius formula image, deviations from the semicircle law cause it to get lost in the bulk, and the eigenvector of the second largest eigenvalue is uncorrelated with the community structure. As a result, spectral algorithms based on A are unable to identify the communities in this case.
Fig. 2.
Fig. 2.
Spectrum of the nonbacktracking matrix B for a network generated by the block model with same parameters as in Fig. 1. The leading eigenvalue is at formula image, the second eigenvalue is close to formula image, and the bulk of the spectrum is confined to the disk of radius formula image. Because formula image is outside the bulk, a spectral algorithm that labels vertices according to the sign of B’s second eigenvector (summed over the incoming edges at each vertex) labels the majority of vertices correctly.
Fig. 3.
Fig. 3.
First, second, and third largest eigenvalues formula image, formula image, and formula image, respectively, of B as functions of formula image. The third eigenvalue is complex, so we plot its modulus. Values are averaged over 20 networks of size formula image and average degree formula image. The green line in the figure represents formula image, and the horizontal lines are c and formula image, respectively. The second eigenvalue formula image is well-separated from the bulk throughout the detectable regime.
Fig. 4.
Fig. 4.
Accuracy of spectral algorithms based on different linear operators, and of BP, for two groups of equal size. (Left) We vary formula image while fixing the average degree formula image; the detectability transition given by 1 occurs at formula image. (Right) We set formula image and vary c; the detectability transition is at formula image. Each point is averaged over 20 instances with formula image. Our spectral algorithm based on the nonbacktracking matrix B achieves an accuracy close to that of BP, and both remain large all of the way down to the transition. Standard spectral algorithms (applied to the giant component of each graph, which contains all but a small fraction of the vertices) based on the adjacency matrix, modularity matrix, Laplacian, normalized Laplacian, and the random walk matrix all fail well above the transition, giving a regime where they do no better than chance.
Fig. 5.
Fig. 5.
Clustering for three groups of equal size. (Left) Scatter plot of the second and third eigenvectors (X and Y axis, respectively) of the nonbacktracking matrix B, with colors indicating the true group assignment. (Right) Analogous plot for the adjacency matrix A. Here, formula image, formula image, and formula image. Applying k means gives an overlap 0.712 using B, but 0.0063 using A.
Fig. 6.
Fig. 6.
Spectrum of the nonbacktracking matrix in the complex plane for some real networks commonly used as benchmarks for community detection, taken from refs. –. The radius of the circle is the square root of the largest eigenvalue, which is a heuristic estimate of the bulk of the spectrum. The overlap is computed using the signs of the second eigenvector for the networks with two communities, and using k means for those with three and more communities. The nonbacktracking operator detects communities in all these networks, with an overlap comparable to the performance of other spectral methods. As in the case of synthetic networks generated by the stochastic block model, the number of real eigenvalues outside the bulk appears to be a good indicator of the number q of communities.

References

    1. Holland PW, Laskey KB, Leinhardt S. Stochastic blockmodels: First steps. Soc Networks. 1983;5(2):109–137.
    1. Wang YJ, Wong GY. Stochastic blockmodels for directed graphs. J Am Stat Assoc. 1987;82(397):8–19.
    1. Von Luxburg U. A tutorial on spectral clustering. Stat Comput. 2007;17(4):395–416.
    1. Bickel PJ, Chen A. A nonparametric view of network models and Newman-Girvan and other modularities. Proc Natl Acad Sci USA. 2009;106(50):21068–21073. - PMC - PubMed
    1. Coja-Oghlan A, Mossel E, Vilenchik D. A spectral approach to analyzing belief propagation for 3-coloring. Combin Probab Comput. 2009;18(6):881–912.

Publication types