
227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization


Source: pdf

Author: Yuhong Guo

Abstract: Recently, supervised dimensionality reduction has been gaining attention, owing to the realization that data labels are often available and indicate important underlying structure in the data. In this paper, we present a novel convex supervised dimensionality reduction approach based on exponential family PCA, which avoids the local optima of typical EM learning. Moreover, by introducing a sample-based approximation to exponential family models, it overcomes the limitation of the Gaussian assumption underlying standard PCA and yields a kernelized formulation for nonlinear supervised dimensionality reduction. A training algorithm is then devised based on a subgradient bundle method, which is made scalable by a coordinate descent procedure. The advantage of our global optimization approach is demonstrated by empirical results on both synthetic and real data.
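
The abstract contrasts the proposed convex formulation with standard EM-style training of exponential family PCA [4]. As a point of reference only, below is a minimal NumPy sketch of that non-convex alternating baseline for the Bernoulli (logistic) case; the function name, step size, and toy data are illustrative, and this is not the supervised convex method of the paper.

# Minimal sketch (assumption: Bernoulli/logistic exponential family PCA as in [4],
# trained by non-convex alternating gradient steps). Not the paper's algorithm.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def logistic_pca(X, k, n_iters=200, lr=0.05):
    """Fit Theta = U @ V.T to binary data X by alternating gradient steps
    on the Bernoulli negative log-likelihood (a jointly non-convex objective)."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    U = 0.01 * rng.standard_normal((n, k))   # low-dimensional scores
    V = 0.01 * rng.standard_normal((d, k))   # loadings
    for _ in range(n_iters):
        R = sigmoid(U @ V.T) - X             # gradient of the loss w.r.t. Theta
        U -= lr * (R @ V)                    # update scores with loadings fixed
        R = sigmoid(U @ V.T) - X
        V -= lr * (R.T @ U)                  # update loadings with scores fixed
    return U, V

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = (rng.random((100, 20)) < 0.3).astype(float)   # toy binary data (illustrative)
    U, V = logistic_pca(X, k=2)
    print(U.shape, V.shape)                  # (100, 2) (20, 2)

Because the objective is jointly non-convex in the scores and loadings, such alternating updates can stall in local optima, which is precisely the issue the paper's convex reformulation is designed to avoid.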


References

[1] A. Belloni. Introduction to bundle methods. Technical report, MIT, 2005.

[2] J. Borwein and A. Lewis. Convex Analysis and Nonlinear Optimization. Springer, 2000.

[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[4] M. Collins, S. Dasgupta, and R. Schapire. A generalization of principal component analysis to the exponential family. In Advances in Neural Information Processing Systems (NIPS), 2001.

[5] R. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:179–188, 1936.

[6] Y. Guo and D. Schuurmans. Convex relaxations of latent variable training. In Advances in Neural Information Processing Systems (NIPS), 2007.

[7] Y. Guo and D. Schuurmans. Efficient global optimization for exponential family PCA and low-rank matrix factorization. In Allerton Conference on Communication, Control, and Computing, 2008.

[8] I. Jolliffe. Principal Component Analysis. Springer Verlag, 2002.

[9] N. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6:1783–1816, 2005.

[10] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K. Müller. Fisher discriminant analysis with kernels. In IEEE Neural Networks for Signal Processing Workshop, 1999.

[11] M. Overton and R. Womersley. Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices. Mathematical Programming, 62:321–357, 1993.

[12] I. Rish, G. Grabarnik, G. Cecchi, F. Pereira, and G. Gordon. Closed-form supervised dimensionality reduction with generalized linear models. In Proceedings of the International Conference on Machine Learning (ICML), 2008.

[13] Sajama and A. Orlitsky. Semi-parametric exponential family PCA. In Advances in Neural Information Processing Systems (NIPS), 2004.

[14] Sajama and A. Orlitsky. Supervised dimensionality reduction using mixture models. In Proceedings of the International Conference on Machine Learning (ICML), 2005.

[15] L. Song, A. Smola, K. Borgwardt, and A. Gretton. Colored maximum variance unfolding. In Advances in Neural Information Processing Systems (NIPS), 2007.

[16] M. Tipping and C. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 61(3):611–622, 1999.

[17] P. Tseng. Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, 109:457–494, 2001.

[18] M. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. Technical Report TR-649, UC Berkeley, Dept. Statistics, 2003.

[19] S. Yu, K. Yu, V. Tresp, H. Kriegel, and M. Wu. Supervised probabilistic principal component analysis. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2006.