
62 nips-2010-Discriminative Clustering by Regularized Information Maximization

Source: pdf

Author: Andreas Krause, Pietro Perona, Ryan G. Gomes

Abstract: Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled data set? We present a framework that simultaneously clusters the data and trains a discriminative classifier. We call it Regularized Information Maximization (RIM). RIM optimizes an intuitive information-theoretic objective function which balances class separation, class balance and classifier complexity. The approach can flexibly incorporate different likelihood functions, express prior assumptions about the relative size of different classes and incorporate partial labels for semi-supervised learning. In particular, we instantiate the framework to unsupervised, multi-class kernelized logistic regression. Our empirical evaluation indicates that RIM outperforms existing methods on several real data sets, and demonstrates that RIM is an effective model selection method.
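The three terms named in the abstract (class separation, class balance, classifier complexity) can be illustrated with a small sketch. It assumes a plain linear softmax classifier rather than the kernelized model, estimates mutual information as the entropy of the average prediction minus the average entropy of the individual predictions, and uses a squared-norm penalty for complexity. The names `rim_objective`, `W`, `b` and `lam` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def rim_objective(X, W, b, lam):
    """Illustrative RIM-style objective (assumed form, higher is better).

    X   : list of feature vectors (lists of floats)
    W   : list of K weight vectors, one per class
    b   : list of K biases
    lam : weight of the complexity penalty
    """
    # Class posteriors p(y | x) for every unlabeled point.
    probs = []
    for x in X:
        scores = [sum(wj * xj for wj, xj in zip(w, x)) + bk
                  for w, bk in zip(W, b)]
        probs.append(softmax(scores))

    n, k = len(X), len(W)

    # Class balance: entropy of the average predicted label distribution.
    marginal = [sum(p[c] for p in probs) / n for c in range(k)]
    class_balance = entropy(marginal)

    # Class separation: low average entropy means confident predictions.
    avg_conditional_entropy = sum(entropy(p) for p in probs) / n

    # Classifier complexity: squared norm of the weights (assumed penalty).
    complexity = sum(wj * wj for w in W for wj in w)

    return class_balance - avg_conditional_entropy - lam * complexity


# Two 1-D points on opposite sides of the origin, two classes.
X = [[1.0], [-1.0]]
uninformative = rim_objective(X, [[0.0], [0.0]], [0.0, 0.0], lam=0.0)
separating = rim_objective(X, [[5.0], [-5.0]], [0.0, 0.0], lam=0.0)
```

With all-zero weights every prediction is uniform, so the two entropy terms cancel; weights that split the points confidently and evenly score close to log 2. Raising `lam` penalizes the large weights that such confident splits require, which is the trade-off the abstract describes.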

reference text

[1] A. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2001.

[2] L. Xu and D. Schuurmans. Unsupervised and semi-supervised multi-class support vector machines. In AAAI, 2005.

[3] Y. Grandvalet and Y. Bengio. Semi-supervised learning by entropy minimization. In NIPS, 2004.

[4] John S. Bridle, Anthony J. R. Heading, and David J. C. MacKay. Unsupervised classifiers, mutual information and ‘phantom targets’. In John E. Moody, Steve J. Hanson, and Richard P. Lippmann, editors, Advances in Neural Information Processing Systems, volume 4, pages 1096–1101. Morgan Kaufmann Publishers, Inc., 1992.

[5] Olivier Chapelle and Alexander Zien. Semi-supervised classification by low density separation, September 2004.

[6] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45:503–528, 1989.

[7] T. Jaakkola, M. Meila, and T. Jebara. Maximum entropy discrimination. In NIPS, 1999.

[8] Y. W. Teh. A hierarchical Bayesian language model based on Pitman-Yor processes. In ACL, 2006.

[9] K. Zhang, I. W. Tsang, and J. T. Kwok. Maximum margin clustering made practical. In ICML, 2007.

[10] John Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, New York, NY, USA, 2004.

[11] Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2:193–218, 1985.

[12] Alexander Strehl and Joydeep Ghosh. Cluster ensembles — A knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583–617, 2002.

[13] Y. Chen, J. Ze Wang, and R. Krovetz. CLUE: cluster-based retrieval of images by unsupervised learning. IEEE Trans. Image Processing, 14(8):1187–1201, 2005.

[14] G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. Technical Report 7694, California Institute of Technology, 2007.

[15] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.

[16] P. D. Dobson and A. J. Doig. Distinguishing enzyme structures from non-enzymes without alignments. J. Mol. Biol., 330:771–783, Jul 2003.

[17] Nikil Wale and George Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. In ICDM, pages 678–689, 2006.

[18] N. Shervashidze and K. M. Borgwardt. Fast subtree kernels on graphs. In NIPS, 2010.

[19] N. Tishby, F. C. Pereira, and W. Bialek. The information bottleneck method. CoRR, physics/0004057, 2000.

[20] N. Slonim, G. S. Atwal, G. Tkacik, and W. Bialek. Information-based clustering. Proc Natl Acad Sci U S A, 102(51):18297–18302, December 2005.

[21] Francis Bach and Zaïd Harchaoui. DIFFRAC: a discriminative and flexible framework for clustering. In John C. Platt, Daphne Koller, Yoram Singer, and Sam T. Roweis, editors, NIPS. MIT Press, 2007.

[22] Le Song, Alex Smola, Arthur Gretton, and Karsten M. Borgwardt. A dependence maximization view of clustering. In ICML ’07: Proceedings of the 24th international conference on Machine learning, pages 815–822, New York, NY, USA, 2007. ACM.

[23] A. Corduneanu and T. Jaakkola. On information regularization. In UAI, 2003.