jmlr jmlr2013 jmlr2013-70 jmlr2013-70-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gang Niu, Bo Dai, Lin Shang, Masashi Sugiyama
Abstract: The large volume principle proposed by Vladimir Vapnik, which advocates that hypotheses lying in an equivalence class with a larger volume are more preferable, is a useful alternative to the large margin principle. In this paper, we introduce a new discriminative clustering model based on the large volume principle called maximum volume clustering (MVC), and then propose two approximation schemes to solve this MVC model: a soft-label MVC method using sequential quadratic programming and a hard-label MVC method using semi-definite programming. The proposed MVC is theoretically advantageous for three reasons. Firstly, the optimization involved in hard-label MVC is convex, and under mild conditions, the optimization involved in soft-label MVC is akin to a convex one in terms of the resulting clusters. Secondly, the soft-label MVC method possesses a clustering error bound. Thirdly, MVC includes the optimization problems of a spectral clustering, two relaxed k-means clusterings, and an information-maximization clustering as special limit cases when its regularization parameter goes to infinity. Experiments on several artificial and benchmark data sets demonstrate that the proposed MVC compares favorably with state-of-the-art clustering methods.
Note: A preliminary and shorter version appeared in Proceedings of 14th International Conference on Artificial Intelligence and Statistics (Niu et al., 2011). The preliminary work was done when GN was studying at the Department of Computer Science and Technology, Nanjing University, and BD was studying at the Institute of Automation, Chinese Academy of Sciences. A Matlab implementation of maximum volume clustering is available from http://sugiyama-www.cs.titech.ac.jp/~gang/software.html.
Keywords: discriminative clustering, large volume principle, sequential quadratic programming, semi-definite programming, finite sample stability, clustering error
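The abstract only names the hard-label scheme's machinery (semi-definite programming) at a high level. As a rough illustrative companion, the following is a minimal Python sketch of the generic recipe that family of methods follows: relax a binary-labeling problem of the form min over y in {-1,+1}^n of y'Qy to an SDP, then recover hard labels by randomized rounding (Raghavan and Thompson, 1985, cited below). The matrix Q, the helper name `sdp_partition`, and this particular objective are placeholder assumptions for exposition; they are not the paper's actual MVC formulation, which should be taken from Niu et al. (2011).

```python
# Illustrative only: a generic SDP relaxation of a two-way partition problem,
# NOT the exact hard-label MVC objective from the paper.
import numpy as np
import cvxpy as cp

def sdp_partition(Q, n_rounds=100, seed=0):
    """Relax min_{y in {-1,+1}^n} y^T Q y by replacing y y^T with a PSD
    matrix M with diag(M) = 1, solve the SDP, then round randomly."""
    n = Q.shape[0]
    M = cp.Variable((n, n), PSD=True)      # relaxation of the rank-one matrix y y^T
    problem = cp.Problem(cp.Minimize(cp.trace(Q @ M)), [cp.diag(M) == 1])
    problem.solve()
    # Randomized rounding (Raghavan & Thompson, 1985): sample Gaussian
    # vectors with covariance M and take their signs as candidate labelings.
    L = np.linalg.cholesky(M.value + 1e-8 * np.eye(n))  # jitter for stability
    rng = np.random.default_rng(seed)
    best_y, best_val = None, np.inf
    for _ in range(n_rounds):
        y = np.sign(L @ rng.standard_normal(n))
        y[y == 0] = 1.0                    # break sign ties deterministically
        val = float(y @ Q @ y)
        if val < best_val:
            best_y, best_val = y, val
    return best_y                          # entries in {-1, +1}: the two clusters
```

In such a sketch, Q would typically be derived from pairwise similarities (e.g., a graph-Laplacian-style matrix, as in the spectral clustering literature cited below). The relaxed problem is convex, which mirrors the abstract's claim that the optimization involved in hard-label MVC is convex.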
References:
F. Agakov and D. Barber. Kernelized infomax clustering. In Advances in Neural Information Processing Systems 18 (NIPS), 2006.
D. Arthur and S. Vassilvitskii. How slow is the k-means method? In Proceedings of 22nd ACM Symposium on Computational Geometry (SoCG), 2006.
P. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.
M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems 14 (NIPS), 2002.
S. Ben-David, D. Pál, and H. U. Simon. Stability of k-means clustering. In Proceedings of 20th Annual Conference on Learning Theory (COLT), 2007.
P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta Numerica, 4:1–51, 1995.
B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of 5th Annual Workshop on Computational Learning Theory (COLT), 1992.
O. Bousquet, S. Boucheron, and G. Lugosi. Introduction to statistical learning theory. In O. Bousquet, U. von Luxburg, and G. Rätsch, editors, Advanced Lectures on Machine Learning, pages 169–207. Springer, 2004.
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2:265–292, 2001.
T. De Bie and N. Cristianini. Convex methods for transduction. In Advances in Neural Information Processing Systems 16 (NIPS), 2004.
T. De Bie and N. Cristianini. Fast SDP relaxations of graph cut clustering, transduction, and other combinatorial problems. Journal of Machine Learning Research, 7:1409–1436, 2006.
C. Ding and X. He. K-means clustering via principal component analysis. In Proceedings of 21st International Conference on Machine Learning (ICML), 2004.
R. El-Yaniv and D. Pechyony. Transductive Rademacher complexity and its applications. Journal of Artificial Intelligence Research, 35:193–234, 2009.
R. El-Yaniv, D. Pechyony, and V. Vapnik. Large margin vs. large volume in transductive learning. Machine Learning, 72(3):173–188, 2008.
L. Faivishevsky and J. Goldberger. A nonparametric information theoretic clustering algorithm. In Proceedings of 27th International Conference on Machine Learning (ICML), 2010.
M. Girolami. Mercer kernel-based clustering in feature space. IEEE Transactions on Neural Networks, 13(3):780–784, 2002.
R. Gomes, A. Krause, and P. Perona. Discriminative clustering by regularized information maximization. In Advances in Neural Information Processing Systems 23 (NIPS), 2010.
M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 1.21 (web page and software). http://cvxr.com/cvx, 2011.
J. A. Hartigan and M. A. Wong. A k-means clustering algorithm. Applied Statistics, 28:100–108, 1979.
V. Koltchinskii. Rademacher penalties and structural risk minimization. IEEE Transactions on Information Theory, 47(5):1902–1914, 2001.
G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27–72, 2004.
Y. Li, I. W. Tsang, J. T. Kwok, and Z.-H. Zhou. Tighter and convex maximum margin clustering. In Proceedings of 12th International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.
J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967.
M. Meila and J. Shi. A random walks view of spectral segmentation. In Proceedings of 8th International Workshop on Artificial Intelligence and Statistics (AISTATS), 2001.
R. Meir and T. Zhang. Generalization error bounds for Bayesian mixture algorithms. Journal of Machine Learning Research, 4:839–860, 2003.
A. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14 (NIPS), 2002.
G. Niu, B. Dai, L. Shang, and M. Sugiyama. Maximum volume clustering. In Proceedings of 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
P. Raghavan and C. Thompson. Randomized rounding: A technique for provably good algorithms and algorithmic proofs. Technical Report UCB/CSD-85-242, UC Berkeley, 1985.
A. Rakhlin and A. Caponnetto. Stability of k-means clustering. In Advances in Neural Information Processing Systems 19 (NIPS), 2007.
C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423 & 623–656, 1948.
H. Sherali and W. Adams. A Reformulation-Linearization Technique for Solving Discrete and Continuous Nonconvex Problems. Kluwer Academic Publishers, 1998.
J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
A. Smola, S. V. N. Vishwanathan, and T. Hofmann. Kernel methods for missing variables. In Proceedings of 10th International Workshop on Artificial Intelligence and Statistics (AISTATS), 2005.
L. Song, A. Smola, A. Gretton, and K. Borgwardt. A dependence maximization view of clustering. In Proceedings of 24th International Conference on Machine Learning (ICML), 2007.
M. Sugiyama, M. Yamada, M. Kimura, and H. Hachiya. On information-maximization clustering: Tuning parameter selection and analytic solution. In Proceedings of 28th International Conference on Machine Learning (ICML), 2011.
T. Suzuki, M. Sugiyama, J. Sese, and T. Kanamori. Approximating mutual information by maximum likelihood density ratio estimation. In JMLR Workshop and Conference Proceedings, volume 4, pages 5–20, 2008.
T. Suzuki, M. Sugiyama, T. Kanamori, and J. Sese. Mutual information estimation reveals global associations between stimuli and biological processes. BMC Bioinformatics, 10(1):S52, 2009.
H. Valizadegan and R. Jin. Generalized maximum margin clustering and unsupervised kernel learning. In Advances in Neural Information Processing Systems 19 (NIPS), 2007.
V. N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer Verlag, 1982.
V. N. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.
U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
U. von Luxburg, O. Bousquet, and M. Belkin. Limits of spectral clustering. In Advances in Neural Information Processing Systems 17 (NIPS), 2005.
U. von Luxburg, M. Belkin, and O. Bousquet. Consistency of spectral clustering. Annals of Statistics, 36(2):555–586, 2008.
U. von Luxburg, R. Williamson, and I. Guyon. Clustering: Science or art? In JMLR Workshop and Conference Proceedings, volume 27, pages 65–80, 2012.
F. Wang, B. Zhao, and C. Zhang. Linear time maximum margin clustering. IEEE Transactions on Neural Networks, 21(2):319–332, 2010.
L. Xu and D. Schuurmans. Unsupervised and semi-supervised multi-class support vector machines. In Proceedings of 20th National Conference on Artificial Intelligence (AAAI), 2005.
L. Xu, J. Neufeld, B. Larson, and D. Schuurmans. Maximum margin clustering. In Advances in Neural Information Processing Systems 17 (NIPS), 2005.
A. Yuille and A. Rangarajan. The concave-convex procedure. Neural Computation, 15(4):915–936, 2003.
L. Zelnik-Manor and P. Perona. Self-tuning spectral clustering. In Advances in Neural Information Processing Systems 17 (NIPS), 2005.
H. Zha, X. He, C. Ding, M. Gu, and H. Simon. Spectral relaxation for k-means clustering. In Advances in Neural Information Processing Systems 14 (NIPS), 2002.
K. Zhang, I. W. Tsang, and J. T. Kwok. Maximum margin clustering made practical. In Proceedings of 24th International Conference on Machine Learning (ICML), 2007.
B. Zhao, F. Wang, and C. Zhang. Efficient multiclass maximum margin clustering. In Proceedings of 25th International Conference on Machine Learning (ICML), 2008a.
B. Zhao, F. Wang, and C. Zhang. Efficient maximum margin clustering via cutting plane algorithm. In Proceedings of 8th SIAM International Conference on Data Mining (SDM), 2008b.