nips2005-84: reference knowledge graph (maker-knowledge-mining)
Source: pdf
Authors: Eyal Krupka, Naftali Tishby
Abstract: We argue that when objects are characterized by many attributes, clustering them on the basis of a relatively small random subset of these attributes can capture information on the unobserved attributes as well. Moreover, we show that under mild technical conditions, clustering the objects on the basis of such a random subset performs almost as well as clustering with the full attribute set. We prove finite-sample generalization theorems for this novel learning scheme that extend analogous results from the supervised learning setting. The scheme is demonstrated for collaborative filtering, clustering users with movie ratings as attributes.
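The scheme described in the abstract lends itself to a compact illustration. The sketch below is not the authors' experimental code; the dataset sizes, number of clusters, latent-type generative model, and the plug-in mutual-information estimate are all illustrative assumptions. It clusters "users" on a small random subset of "movie rating" attributes, then measures how informative the resulting clusters are about an attribute that was never observed during clustering.

```python
# Minimal sketch of clustering on a random attribute subset and testing
# generalization to unobserved attributes. All sizes and the generative
# model below are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)

# Toy data with latent structure: each of 500 users belongs to one of
# 5 hidden types; each type has its own mean rating per movie, so the
# 100 attributes (movies) are correlated through the hidden type.
types = rng.integers(0, 5, size=500)
means = rng.integers(1, 6, size=(5, 100))
X = np.clip(np.rint(means[types] + rng.normal(0.0, 0.7, size=(500, 100))), 1, 5).astype(int)

# Cluster on a small random subset of attributes only.
observed = rng.choice(X.shape[1], size=10, replace=False)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X[:, observed])

# Plug-in estimate of the information the clusters carry about one
# attribute that played no role in the clustering.
unobserved = [j for j in range(X.shape[1]) if j not in set(observed)]
info = mutual_info_score(labels, X[:, unobserved[0]])
print(f"I(cluster; unobserved attribute) ~ {info:.3f} nats")
```

Note that the plug-in mutual-information estimate used here is biased for small samples; careful estimation of entropy and mutual information is the subject of [9].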
[1] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review. ACM Computing Surveys, 31(3):264–323, September 1999.
[2] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 1991.
[3] V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
[4] N. Tishby, F. Pereira, and W. Bialek. The information bottleneck method. In Proc. 37th Allerton Conference on Communication, Control, and Computing, 1999.
[5] M. Seeger. Learning with labeled and unlabeled data. Technical report, University of Edinburgh, 2002.
[6] M. Szummer and T. Jaakkola. Information regularization with partially labeled data. In NIPS, 2003.
[7] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.
[8] E. Krupka and N. Tishby. Generalization in clustering with unobserved features. Technical report, Hebrew University, 2005. http://www.cs.huji.ac.il/~tishby/nips2005tr.pdf.
[9] L. Paninski. Estimation of entropy and mutual information. Neural Computation, 15(6):1191–1253, 2003.
[10] B. Marlin. Collaborative filtering: A machine learning perspective. Master's thesis, University of Toronto, 2004.