NIPS 2008, Paper 193
Author: Vikas Sindhwani, Jianying Hu, Aleksandra Mojsilovic
Abstract: By simultaneously partitioning both the rows (examples) and the columns (features) of a data matrix, co-clustering algorithms often achieve impressive performance improvements over traditional one-sided row clustering techniques. A good clustering of features may be viewed as a combinatorial transformation of the data matrix, effectively enforcing a form of regularization that can lead to a better clustering of examples (and vice versa). In many applications, partial supervision in the form of a few row labels as well as column labels is available and can potentially assist co-clustering. In this paper, we develop two novel semi-supervised multi-class classification algorithms, motivated respectively by spectral bipartite graph partitioning and by matrix approximation formulations of co-clustering. These algorithms (i) support dual supervision in the form of labels for examples, features, or both; (ii) provide principled predictive capability on out-of-sample test data; and (iii) arise naturally from the classical Representer theorem applied to regularization problems posed on a collection of Reproducing Kernel Hilbert Spaces. Empirical results demonstrate the effectiveness and utility of our algorithms.
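The spectral bipartite graph partitioning idea that motivates the first algorithm can be illustrated with a minimal unsupervised sketch in the spirit of Dhillon [5]: treat the data matrix as the biadjacency matrix of a bipartite graph over examples and features, degree-normalize it, and bipartition both sides using the second singular vector pair. This is an illustrative assumption-laden sketch, not the paper's semi-supervised method, which adds dual supervision and out-of-sample prediction on top of this idea; the function name `spectral_copartition` is hypothetical.

```python
import numpy as np

def spectral_copartition(A, eps=1e-12):
    """Jointly bipartition the rows and columns of a nonnegative data
    matrix A via spectral bipartite graph partitioning (Dhillon-style
    sketch; NOT the paper's semi-supervised algorithm)."""
    # Treat A as the biadjacency matrix of a bipartite graph and
    # degree-normalize: An = D1^{-1/2} A D2^{-1/2}.
    d1 = np.maximum(A.sum(axis=1), eps)  # row (example) degrees
    d2 = np.maximum(A.sum(axis=0), eps)  # column (feature) degrees
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    # The second left/right singular vectors of An play the role of the
    # Fiedler vector of the bipartite graph Laplacian.
    U, s, Vt = np.linalg.svd(An, full_matrices=False)
    row_embed = U[:, 1] / np.sqrt(d1)
    col_embed = Vt[1, :] / np.sqrt(d2)
    # Thresholding at zero bipartitions examples and features jointly.
    return (row_embed >= 0).astype(int), (col_embed >= 0).astype(int)
```

On a matrix with two dense diagonal blocks plus weak background noise, the returned row and column labels recover the two co-clusters (up to label swap).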
[1] A. Banerjee, I. Dhillon, J. Ghosh, S. Merugu, and D.S. Modha. A generalized maximum entropy approach to Bregman co-clustering and matrix approximation. JMLR, 8:1919–1986, 2007.
[2] M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. JMLR, 7:2399–2434, 2006.
[3] O. Chapelle, B. Schölkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, 2006.
[4] F. Chung, editor. Spectral Graph Theory. AMS, 1997.
[5] I. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In KDD, 2001.
[6] C. Ding, X. He, and H.D. Simon. On the equivalence of nonnegative matrix factorization and spectral clustering. In SDM, 2005.
[7] C. Ding, T. Li, W. Peng, and H. Park. Orthogonal nonnegative matrix tri-factorizations for clustering. In KDD, 2006.
[8] G. Druck, G. Mann, and A. McCallum. Learning from labeled features using generalized expectation criteria. In SIGIR, 2008.
[9] J. Gardiner, A.J. Laub, J.J. Amato, and C.B. Moler. Solution of the Sylvester matrix equation AXB^T + CXD^T = E. ACM Transactions on Mathematical Software, 18(2):223–231, 1992.
[10] D. Harville. Matrix Algebra From a Statistician’s Perspective. Springer, New York, 1997.
[11] T.M. Huang and V. Kecman. Semi-supervised learning from unbalanced labeled data: an improvement. Lecture Notes in Computer Science, 3215:765–771, 2004.
[12] A. Langville, C. Meyer, and R. Albright. Initializations for the non-negative matrix factorization. In KDD, 2006.
[13] T. Li and C. Ding. The relationships among various nonnegative matrix factorization methods for clustering. In ICDM, 2006.
[14] V. Sindhwani and P. Melville. Document-word co-regularization for semi-supervised sentiment analysis. In ICDM, 2008.
[15] N. Slonim and N. Tishby. Document clustering using word clusters via the information bottleneck method. In SIGIR, 2000.