nips nips2010 nips2010-280 nips2010-280-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Meihong Wang, Fei Sha, Michael I. Jordan
Abstract: We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for unsupervised problems where covariates and responses are the same. Our empirical studies show that the resulting compact representation yields meaningful and appealing visualization and clustering of data. Furthermore, when used in conjunction with supervised learners for classification, our methods lead to lower classification errors than state-of-the-art methods, especially when embedding data in spaces of very few dimensions.
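To make the idea concrete, the following is a minimal sketch (in Python/NumPy) of an HSIC-style kernel dependence score between the data and a linear projection of itself, in the spirit of treating the covariates as their own responses. The Gaussian kernels, the bandwidths sigma_x and sigma_z, the orthonormal projection matrix W, and the toy data are illustrative assumptions, not the paper's exact objective or optimization procedure.

```python
# Illustrative sketch (not the paper's exact algorithm): score a candidate
# projection W by the HSIC-style kernel dependence between the data X and
# its low-dimensional projection X @ W.
import numpy as np

def gaussian_kernel(Z, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of Z."""
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic(K, L):
    """Biased empirical HSIC estimate from two kernel matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1)**2

def projection_score(X, W, sigma_x=1.0, sigma_z=1.0):
    """Dependence between the original data X and its projection X @ W.

    Higher scores mean the projection retains more kernel-measurable
    information about X; W is assumed to have orthonormal columns.
    """
    K = gaussian_kernel(X, sigma_x)        # kernel on the covariates
    L = gaussian_kernel(X @ W, sigma_z)    # kernel on the projected "responses"
    return hsic(K, L)

# Toy usage: score a random 2-D orthonormal projection of 5-D data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
W, _ = np.linalg.qr(rng.normal(size=(5, 2)))  # random orthonormal basis
print(projection_score(X, W))
```

In practice one would maximize such a score over projection matrices with orthonormal columns, for instance with gradient methods on the Stiefel manifold (cf. [28]).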
[1] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323, 2000.
[2] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319, 2000.
[3] C. M. Bishop, M. Svensén, and C. K. I. Williams. GTM: The generative topographic mapping. Neural Computation, 10:215–234, 1998.
[4] N. D. Lawrence. Gaussian process latent variable models for visualisation of high dimensional data. In Advances in Neural Information Processing Systems 16, pages 329–336. MIT Press, 2004.
[5] R. D. Cook and X. Yin. Dimension reduction and visualization in discriminant analysis (with discussion). Australian & New Zealand Journal of Statistics, 43:147–199, 2001.
[6] K.-C. Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86:316–327, 1991.
[7] K. Fukumizu, F. R. Bach, and M. I. Jordan. Kernel dimension reduction in regression. The Annals of Statistics, 37:1871–1905, 2009.
[8] J. Nilsson, F. Sha, and M. I. Jordan. Regression on manifolds using kernel dimension reduction. In Proceedings of the 24th International Conference on Machine Learning, pages 697–704. ACM, 2007.
[9] K.-C. Li. On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma. Journal of the American Statistical Association, 87:1025–1039, 1992.
[10] A. Shyr, R. Urtasun, and M. I. Jordan. Sufficient dimensionality reduction for visual sequence classification. In Proceedings of the 23rd IEEE Conference on Computer Vision and Pattern Recognition, pages 3610–3617, 2010.
[11] Q. Wu, S. Mukherjee, and F. Liang. Localized sliced inverse regression. In Advances in Neural Information Processing Systems 21, pages 1785–1792. MIT Press, 2009.
[12] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.
[13] N. Tishby, F. C. Pereira, and W. Bialek. The information bottleneck method. In Proceedings of the 37th Annual Allerton Conference on Communication, Control, and Computing, pages 368–377, 1999.
[14] G. Hinton and S. Roweis. Stochastic neighbor embedding. In Advances in Neural Information Processing Systems 15, pages 857–864. MIT Press, 2003.
[15] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. The Journal of Machine Learning Research, 9:2579–2605, 2008.
[16] A. Gretton, R. Herbrich, A. Smola, O. Bousquet, and B. Schölkopf. Kernel methods for measuring independence. The Journal of Machine Learning Research, 6:2075–2129, 2005.
[17] F. R. Bach and M. I. Jordan. Kernel independent component analysis. The Journal of Machine Learning Research, 3:1–48, 2003.
[18] L. Song, A. Smola, A. Gretton, and K. M. Borgwardt. A dependence maximization view of clustering. In Proceedings of the 24th International Conference on Machine Learning, pages 815–822. ACM, 2007.
[19] L. Song, A. Smola, A. Gretton, K. M. Borgwardt, and J. Bedo. Supervised feature selection via dependence estimation. In Proceedings of the 24th International Conference on Machine Learning, pages 823–830. ACM, 2007.
[20] L. Song, A. Smola, K. Borgwardt, and A. Gretton. Colored maximum variance unfolding. In Advances in Neural Information Processing Systems 20, pages 1385–1392. MIT Press, 2008.
[21] K. Fukumizu, F. R. Bach, and M. I. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. The Journal of Machine Learning Research, 5:73–99, 2004.
[22] K. P. Adragni and R. D. Cook. Sufficient dimension reduction and prediction in regression. Philosophical Transactions of the Royal Society A, 367:4385–4405, 2009.
[23] N. Cristianini, J. Kandola, A. Elisseeff, and J. Shawe-Taylor. On kernel-target alignment. In Advances in Neural Information Processing Systems 14, pages 367–373. MIT Press, 2002.
[24] C. K. I. Williams. Computation with infinite neural networks. Neural Computation, 10:1203–1216, 1998.
[25] A. Rahimi and B. Recht. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems 20, pages 1177–1184. MIT Press, 2008.
[26] Y. Cho and L. Saul. Kernel methods for deep learning. In Advances in Neural Information Processing Systems 22, pages 342–350. MIT Press, 2009.
[27] C. Berg, J. P. R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups. Springer-Verlag, 1984.
[28] A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20:303–353, 1998.
[29] B. Nadler, S. Lafon, R. Coifman, and I. G. Kevrekidis. Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators. In Advances in Neural Information Processing Systems 18, pages 955–962. MIT Press, 2005.
[30] K. Q. Weinberger and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. The Journal of Machine Learning Research, 10:207–244, 2009.