NIPS 2008, Paper 193
Author: Vikas Sindhwani, Jianying Hu, Aleksandra Mojsilovic
Abstract: By simultaneously partitioning both the rows (examples) and the columns (features) of a data matrix, co-clustering algorithms often achieve impressive performance improvements over traditional one-sided row clustering techniques. A good clustering of features may be viewed as a combinatorial transformation of the data matrix, effectively enforcing a form of regularization that can lead to a better clustering of examples (and vice versa). In many applications, partial supervision in the form of a few row labels as well as column labels is available and can potentially assist co-clustering. In this paper, we develop two novel semi-supervised multi-class classification algorithms, motivated respectively by spectral bipartite graph partitioning and by matrix approximation formulations of co-clustering. These algorithms (i) support dual supervision in the form of labels for examples, features, or both; (ii) provide principled predictive capability on out-of-sample test data; and (iii) arise naturally from the classical Representer theorem applied to regularization problems posed on a collection of Reproducing Kernel Hilbert Spaces. Empirical results demonstrate the effectiveness and utility of our algorithms.
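The spectral bipartite graph partitioning idea that motivates the first algorithm can be illustrated with a minimal unsupervised sketch in the spirit of Dhillon [5]: treat the data matrix as the biadjacency matrix of a bipartite graph over examples and features, degree-normalize it, and bipartition both sides using the second singular vector pair. This is an illustrative assumption-laden sketch, not the paper's semi-supervised method, which adds dual supervision and out-of-sample prediction on top of this idea; the function name `spectral_copartition` is hypothetical.

```python
import numpy as np

def spectral_copartition(A, eps=1e-12):
    """Jointly bipartition the rows and columns of a nonnegative data
    matrix A via spectral bipartite graph partitioning (Dhillon-style
    sketch; NOT the paper's semi-supervised algorithm)."""
    # Treat A as the biadjacency matrix of a bipartite graph and
    # degree-normalize: An = D1^{-1/2} A D2^{-1/2}.
    d1 = np.maximum(A.sum(axis=1), eps)  # row (example) degrees
    d2 = np.maximum(A.sum(axis=0), eps)  # column (feature) degrees
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    # The second left/right singular vectors of An play the role of the
    # Fiedler vector of the bipartite graph Laplacian.
    U, s, Vt = np.linalg.svd(An, full_matrices=False)
    row_embed = U[:, 1] / np.sqrt(d1)
    col_embed = Vt[1, :] / np.sqrt(d2)
    # Thresholding at zero bipartitions examples and features jointly.
    return (row_embed >= 0).astype(int), (col_embed >= 0).astype(int)
```

On a matrix with two dense diagonal blocks plus weak background noise, the returned row and column labels recover the two co-clusters (up to label swap).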
[1] A. Banerjee, I. Dhillon, J. Ghosh, S. Merugu, and D.S. Modha. A generalized maximum entropy approach to Bregman co-clustering and matrix approximation. JMLR, 8:1919–1986, 2007.
[2] M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. JMLR, 7:2399–2434, 2006.
[3] O. Chapelle, B. Schölkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, 2006.
[4] F. Chung, editor. Spectral Graph Theory. AMS, 1997.
[5] I. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In KDD, 2001.
[6] C. Ding, X. He, and H.D. Simon. On the equivalence of nonnegative matrix factorization and spectral clustering. In SDM, 2005.
[7] C. Ding, T. Li, W. Peng, and H. Park. Orthogonal nonnegative matrix tri-factorizations for clustering. In KDD, 2006.
[8] G. Druck, G. Mann, and A. McCallum. Learning from labeled features using generalized expectation criteria. In SIGIR, 2008.
[9] J. Gardiner, A.J. Laub, J.J. Amato, and C.B. Moler. Solution of the Sylvester matrix equation AXB^T + CXD^T = E. ACM Transactions on Mathematical Software, 18(2):223–231, 1992.
[10] D. Harville. Matrix Algebra From a Statistician’s Perspective. Springer, New York, 1997.
[11] T.M. Huang and V. Kecman. Semi-supervised learning from unbalanced labeled data: an improvement. Lecture Notes in Computer Science, 3215:765–771, 2004.
[12] A. Langville, C. Meyer, and R. Albright. Initializations for the non-negative matrix factorization. In KDD, 2006.
[13] T. Li and C. Ding. The relationships among various nonnegative matrix factorization methods for clustering. In ICDM, 2006.
[14] V. Sindhwani and P. Melville. Document-word co-regularization for semi-supervised sentiment analysis. In ICDM, 2008.
[15] N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In SIGIR, 2000.