nips2009-213-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Authors: Kaushik Sinha, Mikhail Belkin
Abstract: We present a new framework for semi-supervised learning with sparse eigenfunction bases of kernel matrices. It turns out that when the data is clustered, that is, when the high density regions are sufficiently separated by low density valleys, each high density area corresponds to a unique representative eigenvector. Linear combinations of such eigenvectors (or, more precisely, of their Nyström extensions) provide good candidates for classification functions when the cluster assumption holds. By first choosing an appropriate basis of these eigenvectors from unlabeled data and then using labeled data with Lasso to select a classifier in the span of these eigenvectors, we obtain a classifier that has a very sparse representation in this basis. Importantly, the sparsity corresponds naturally to the cluster assumption. Experimental results on a number of real-world datasets show that our method is competitive with state-of-the-art semi-supervised learning algorithms and outperforms the natural baseline algorithm (Lasso in the Kernel PCA basis).
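The pipeline the abstract describes (eigenvectors of a kernel matrix on unlabeled data, Nyström-extended to new points, with Lasso selecting a sparse classifier from the labeled few) is easy to prototype. Below is a minimal sketch in Python, assuming an RBF kernel, scikit-learn's Lasso, and synthetic two-cluster data; the helper nystrom_eigenbasis and every parameter value (gamma, alpha, n_components, the eigenvalue cutoff) are illustrative assumptions, not the paper's actual settings.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import Lasso
from sklearn.metrics.pairwise import rbf_kernel

# Two well-separated clusters: the regime where the cluster assumption holds.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=0.5, random_state=0)
y = 2 * y - 1                                         # labels in {-1, +1}
rng = np.random.RandomState(0)
labeled = rng.choice(len(X), size=10, replace=False)  # only a few labels used

def nystrom_eigenbasis(X_unlabeled, X_eval, gamma, n_components):
    """Top eigenvectors of the kernel matrix on unlabeled data, extended to
    arbitrary points x via the Nystrom formula
    phi_j(x) = (1 / lambda_j) * sum_i k(x, x_i) v_j(i)."""
    K = rbf_kernel(X_unlabeled, X_unlabeled, gamma=gamma)
    eigvals, eigvecs = np.linalg.eigh(K)              # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]
    top = top[eigvals[top] > 1e-10 * eigvals.max()]   # drop numerically null modes
    return rbf_kernel(X_eval, X_unlabeled, gamma=gamma) @ eigvecs[:, top] / eigvals[top]

# Basis from all (unlabeled) points; sparse coefficients from the labeled few.
Phi = nystrom_eigenbasis(X, X, gamma=0.5, n_components=20)
clf = Lasso(alpha=0.01).fit(Phi[labeled], y[labeled])
y_pred = np.sign(Phi @ clf.coef_ + clf.intercept_)
print("nonzero coefficients:", np.count_nonzero(clf.coef_))
print("accuracy:", np.mean(y_pred == y))

At the unlabeled points themselves the Nyström extension reproduces the kernel-matrix eigenvectors (since K v_j = lambda_j v_j), so the basis is fixed entirely by unlabeled data and only the sparse coefficients come from the labels; with well-separated clusters the leading eigenvectors are approximately cluster indicators, so Lasso should select roughly one per cluster, which is the sparsity the abstract ties to the cluster assumption.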
[BNS06] M. Belkin, P. Niyogi, and V. Sindhwani. Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples. Journal of Machine Learning Research, 7:2399–2434, 2006.
[BPV03] Y. Bengio, J.-F. Paiement, and P. Vincent. Out-of-sample Extensions for LLE, Isomap, MDS, Eigenmaps and Spectral Clustering. In NIPS, 2003.
[CC96] V. Castelli and T. M. Cover. The Relative Value of Labeled and Unlabeled Samples in Pattern Recognition with Unknown Mixing Parameters. IEEE Transactions on Information Theory, 42(6):2102–2117, 1996.
[CP07] E. J. Candès and Y. Plan. Near Ideal Model Selection by ℓ1 Minimization. eprint arXiv:0801.0345, 2007.
[CWS02] O. Chapelle, J. Weston, and B. Schölkopf. Cluster Kernels for Semi-supervised Learning. In NIPS, 2002.
[Das99] S. Dasgupta. Learning Mixtures of Gaussians. In 40th Annual Symposium on Foundations of Computer Science, 1999.
[HTF03] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2003.
[RBV08] L. Rosasco, M. Belkin, and E. De Vito. Perturbation Results for Learning Empirical Operators. Technical Report TR-2008-052, Massachusetts Institute of Technology, Cambridge, MA, August 2008.
[RV95] J. Ratsaby and S. Venkatesh. Learning from a Mixture of Labeled and Unlabeled Examples with Parametric Side Information. In COLT, 1995.
[SB07] K. Sinha and M. Belkin. The Value of Labeled and Unlabeled Examples when the Model is Imperfect. In NIPS, 2007.
[SBY08a] T. Shi, M. Belkin, and B. Yu. Data Spectroscopy: Eigenspace of Convolution Operators and Clustering. Technical report, Dept. of Statistics, Ohio State University, 2008.
[SBY08b] T. Shi, M. Belkin, and B. Yu. Data Spectroscopy: Learning Mixture Models using Eigenspaces of Convolution Operators. In ICML, 2008.
[SNZ08] A. Singh, R. D. Nowak, and X. Zhu. Unlabeled Data: Now it Helps, Now it Doesn't. In NIPS, 2008.
[SSM98] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10:1299–1319, 1998.
[Tib96] R. Tibshirani. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58:267–288, 1996.
[Tro04] J. A. Tropp. Greed is Good: Algorithmic Results for Sparse Approximation. IEEE Transactions on Information Theory, 50(10):2231–2242, 2004.
[Wai06] M. Wainwright. Sharp Thresholds for Noisy and High-dimensional Recovery of Sparsity using ℓ1-constrained Quadratic Programming. Technical Report TR-709, Dept. of Statistics, U.C. Berkeley, September 2006.
[ZH06] C. Zhang and J. Huang. Model Selection Consistency of Lasso in High Dimensional Linear Regression. Technical report, Dept. of Statistics, Rutgers University, 2006.
[Zha08] T. Zhang. On Consistency of Feature Selection using Greedy Least Squares Regression. Journal of Machine Learning Research, 2008.
[ZY06] P. Zhao and B. Yu. On Model Selection Consistency of Lasso. Journal of Machine Learning Research, 7:2541–2563, 2006.