Author: Ron Zass, Amnon Shashua
Abstract: In this paper we focus on the issue of normalization of the affinity matrix in spectral clustering. We show that the difference between N-cuts and Ratio-cuts lies in the error measure used (relative entropy versus L1 norm) when finding the closest doubly-stochastic matrix to the input affinity matrix. We then develop a scheme for finding the optimal doubly-stochastic approximation under the Frobenius norm, using von Neumann's successive projections lemma. The new normalization scheme is simple and efficient and provides superior clustering performance on many of the standardized tests.
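The scheme just described lends itself to a short numerical sketch. Below is a minimal NumPy illustration of the alternating-projection idea, assuming a symmetric nonnegative affinity matrix K: it alternates a closed-form projection onto the affine set {F : F1 = 1, Fᵀ1 = 1} with a projection onto the nonnegative cone. The function name, iteration cap, and tolerance are illustrative choices, not the paper's; for the exact Frobenius-optimal point under the inequality constraint one would add Dykstra's correction steps [3] rather than use plain alternation.

```python
import numpy as np

def nearest_doubly_stochastic(K, max_iter=1000, tol=1e-8):
    """Alternating-projection sketch for the Frobenius-nearest
    doubly-stochastic approximation of a symmetric affinity K.
    (Illustrative name and stopping rule, not the paper's.)"""
    X = np.array(K, dtype=float)
    n = X.shape[0]
    J = np.ones((n, n))  # the all-ones matrix 11^T
    for _ in range(max_iter):
        X_prev = X
        # Projection onto the affine set {F : F 1 = 1, F^T 1 = 1};
        # closed form from the KKT conditions of the least-squares
        # projection, which also preserves symmetry of X.
        X = X + (n + X.sum()) / n**2 * J - (X @ J + J @ X) / n
        # Projection onto the nonnegative cone {F : F >= 0}.
        X = np.maximum(X, 0.0)
        if np.linalg.norm(X - X_prev) < tol:
            break
    return X
```

Applied to an affinity matrix, the result has rows and columns summing to one (up to the tolerance) and can then replace the usual N-cuts normalization in a spectral clustering pipeline.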
[1] P. K. Chan, M. D. F. Schlag, and J. Y. Zien. Spectral k-way ratio-cut partitioning and clustering. IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, 13(9):1088– 1096, 1994.
[2] I. Csiszar. I-divergence geometry of probability distributions and minimization problems. The Annals of Probability, 3(1):146–158, 1975.
[3] R.L. Dykstra. An algorithm for restricted least squares regression. J. of the Amer. Stat. Assoc., 78:837–842, 1983.
[4] I. S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means, spectral clustering and normalized cuts. In International Conference on Knowledge Discovery and Data Mining (KDD), pages 551–556, Aug. 2004.
[5] J. Von Neumann. Functional Operators, Vol. II. Princeton University Press, 1950.
[6] A.Y. Ng, M.I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Proceedings of the conference on Neural Information Processing Systems (NIPS), 2001.
[7] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 2000.
[8] R. Sinkhorn and P. Knopp. Concerning non-negative matrices and doubly stochastic matrices. Pacific J. Math., 21:343–348, 1967.
[9] Y. Weiss. Segmentation using eigenvectors: a unifying view. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1999.
[10] S.X. Yu and J. Shi. Multiclass spectral clustering. In Proceedings of the International Conference on Computer Vision, 2003.
[11] R. Zass and A. Shashua. A unifying approach to hard and probabilistic clustering. In Proceedings of the International Conference on Computer Vision, Beijing, China, Oct. 2005.

A  Normalized Cuts and Relative Entropy Normalization

The following proposition is an extension (a symmetric version) of the claim that the iterative proportional fitting procedure converges under the relative-entropy error measure [2]:

Proposition 2  The closest doubly-stochastic matrix $F$ to a given symmetric matrix $K$ under the relative-entropy error measure, i.e., the minimizer of
$$\min_{F} \; RE(F\,\|\,K) \quad \text{s.t.} \quad F \ge 0, \;\; F = F^\top, \;\; F\mathbf{1} = \mathbf{1}, \;\; F^\top\mathbf{1} = \mathbf{1},$$
has the form $F = DKD$ for some (unique) diagonal matrix $D$.

Proof: The Lagrangian of the problem is
$$L(F, \lambda, \mu) = \sum_{ij} f_{ij} \ln \frac{f_{ij}}{k_{ij}} + \sum_{ij} k_{ij} - \sum_{ij} f_{ij} - \sum_{i} \lambda_i \Big( \sum_j f_{ij} - 1 \Big) - \sum_{j} \mu_j \Big( \sum_i f_{ij} - 1 \Big).$$
The derivative with respect to $f_{ij}$ is
$$\frac{\partial L}{\partial f_{ij}} = \ln f_{ij} + 1 - \ln k_{ij} - 1 - \lambda_i - \mu_j = 0,$$
from which we obtain
$$f_{ij} = e^{\lambda_i} e^{\mu_j} k_{ij}.$$
Let $D_1 = \mathrm{diag}(e^{\lambda_1}, \ldots, e^{\lambda_n})$ and $D_2 = \mathrm{diag}(e^{\mu_1}, \ldots, e^{\mu_n})$; then $F = D_1 K D_2$. Since $F = F^\top$ and $K$ is symmetric, we must have $D_1 = D_2$, so $F = DKD$ with $D = D_1$.
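To make Proposition 2 concrete, here is a small sketch of how the diagonal scaling $D$ can be computed in practice: the condition that $F = DKD$ be doubly stochastic reduces to the fixed point $d_i (Kd)_i = 1$, which is precisely symmetric Sinkhorn-Knopp matrix balancing [8]. The geometric-mean damping, function name, and tolerance below are our own illustrative choices; $K$ is assumed symmetric with strictly positive row sums.

```python
import numpy as np

def sinkhorn_symmetric(K, max_iter=1000, tol=1e-8):
    """Find d such that F = diag(d) @ K @ diag(d) is doubly stochastic,
    the relative-entropy-optimal normalization of Proposition 2."""
    d = np.ones(K.shape[0])
    for _ in range(max_iter):
        # Fixed point of d_i * (K d)_i = 1; the square root is a
        # standard damping that stabilizes the symmetric iteration.
        d_new = np.sqrt(d / (K @ d))
        if np.max(np.abs(d_new - d)) < tol:
            d = d_new
            break
        d = d_new
    return d[:, None] * K * d[None, :]  # F = D K D
```

At convergence, the returned matrix has unit row and column sums, so diag(d) plays the role of the matrix $D$ in the proposition.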