NIPS 2012, paper 268: Perfect Dimensionality Recovery by Variational Bayesian PCA
Authors: Shinichi Nakajima, Ryota Tomioka, Masashi Sugiyama, S. D. Babacan
Abstract: The variational Bayesian (VB) approach is one of the best tractable approximations to Bayesian estimation, and it has been demonstrated to perform well in many applications. However, its good performance has not been fully understood theoretically. For example, VB sometimes produces a sparse solution, which is regarded as a practical advantage of VB, but such sparsity is hardly observed in rigorous Bayesian estimation. In this paper, we focus on probabilistic PCA and give more theoretical insight into the empirical success of VB. More specifically, for the situation where the noise variance is unknown, we derive a sufficient condition for perfect recovery of the true PCA dimensionality in the large-scale limit, where the size of the observed matrix goes to infinity. In our analysis, we obtain bounds on the noise variance estimator and simple closed-form solutions for the other parameters, which are themselves very useful for better implementation of VB-PCA.
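To make the abstract's setting concrete, below is a minimal numpy sketch of threshold-based PCA dimensionality selection for a noisy matrix. This is not the paper's exact method: the threshold used here is the asymptotic noise bulk edge sqrt(sigma^2)(sqrt(L)+sqrt(M)) known from spiked-covariance analyses [2, 8], standing in for the VB recovery condition and the noise-variance bounds derived in the paper; the function name and the fallback noise estimate are illustrative.

```python
import numpy as np

def estimate_pca_dimensionality(V, sigma2=None):
    """Count singular values of V (L x M) that rise above the noise bulk.

    A rough stand-in for VB-PCA dimensionality selection: singular values
    exceeding sqrt(sigma2) * (sqrt(L) + sqrt(M)), the asymptotic top edge
    of the noise spectrum [2, 8], are treated as signal. The paper's exact
    VB threshold and its noise variance estimator differ in detail.
    """
    L, M = V.shape
    s = np.linalg.svd(V, compute_uv=False)
    if sigma2 is None:
        # Crude plug-in noise estimate from the bulk of the squared
        # singular values; a placeholder for the estimator in the paper.
        sigma2 = np.median(s**2) / max(L, M)
    # Small multiplicative slack absorbs finite-size fluctuations at the edge.
    threshold = 1.05 * np.sqrt(sigma2) * (np.sqrt(L) + np.sqrt(M))
    return int(np.sum(s > threshold))

# Toy check: a rank-3 signal plus unit-variance Gaussian noise.
rng = np.random.default_rng(0)
L, M, H_true = 100, 300, 3
V = rng.normal(size=(L, H_true)) @ rng.normal(size=(H_true, M)) \
    + rng.normal(size=(L, M))
print(estimate_pca_dimensionality(V, sigma2=1.0))  # expected: 3
```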
[1] H. Attias. Inferring parameters and structure of latent variable models by variational Bayes. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 21–30, San Francisco, CA, 1999. Morgan Kaufmann.
[2] J. Baik and J. W. Silverstein. Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis, 97(6):1382–1408, 2006.
[3] C. M. Bishop. Variational principal components. In Proceedings of the International Conference on Artificial Neural Networks (ICANN), volume 1, pages 509–514, 1999.
[4] Z. Ghahramani and M. J. Beal. Graphical models and variational methods. In Advanced Mean Field Methods, pages 161–177. MIT Press, 2001.
[5] D. C. Hoyle. Automatic PCA dimension selection for high dimensional data and small sample sizes. Journal of Machine Learning Research, 9:2733–2759, 2008.
[6] A. Ilin and T. Raiko. Practical approaches to principal component analysis in the presence of missing values. Journal of Machine Learning Research, 11:1957–2000, 2010.
[7] T. S. Jaakkola and M. I. Jordan. Bayesian parameter estimation via variational methods. Statistics and Computing, 10:25–37, 2000.
[8] I. M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics, 29:295–327, 2001.
[9] N. El Karoui. Spectrum estimation for large dimensional covariance matrices using random matrix theory. Annals of Statistics, 36(6):2757–2790, 2008.
[10] O. Ledoit and M. Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365–411, 2004.
[11] Y. J. Lim and Y. W. Teh. Variational Bayesian approach to movie rating prediction. In Proceedings of KDD Cup and Workshop, 2007.
[12] D. J. C. MacKay. Local minima, symmetry-breaking, and model pruning in variational free energy minimization. Available from http://www.inference.phy.cam.ac.uk/mackay/minima.pdf, 2001.
[13] V. A. Marcenko and L. A. Pastur. Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik, 1(4):457–483, 1967.
[14] T. P. Minka. Automatic choice of dimensionality for PCA. In Advances in Neural Information Processing Systems 13, pages 598–604. MIT Press, 2001.
[15] S. Nakajima and M. Sugiyama. Theoretical analysis of Bayesian matrix factorization. Journal of Machine Learning Research, 12:2579–2644, 2011.
[16] S. Nakajima, M. Sugiyama, and S. D. Babacan. Global solution of fully-observed variational Bayesian matrix factorization is column-wise independent. In Advances in Neural Information Processing Systems 24, 2011.
[17] S. Nakajima, M. Sugiyama, and S. D. Babacan. On Bayesian PCA: Automatic dimensionality selection and analytic solution. In Proceedings of the 28th International Conference on Machine Learning (ICML 2011), Bellevue, WA, USA, June 28–July 2, 2011.
[18] S. Roweis and Z. Ghahramani. A unifying review of linear Gaussian models. Neural Computation, 11:305–345, 1999.
[19] R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1257–1264, Cambridge, MA, 2008. MIT Press.
[20] M. Sato, T. Yoshioka, S. Kajihara, K. Toyama, N. Goda, K. Doya, and M. Kawato. Hierarchical Bayesian estimation for MEG inverse problem. NeuroImage, 23:806–826, 2004.
[21] M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 61(3):611–622, 1999.
[22] K. W. Wachter. The strong limits of random matrix spectra for sample matrices of independent elements. Annals of Probability, 6:1–18, 1978.