156 nips-2007-Predictive Matrix-Variate t Models

Source: pdf

Author: Shenghuo Zhu, Kai Yu, Yihong Gong

Abstract: It is becoming increasingly important to learn from a partially observed random matrix and predict its missing elements. We assume that the entire matrix is a single sample drawn from a matrix-variate t distribution and suggest a matrix-variate t model (MVTM) to predict those missing elements. We show that MVTM generalizes a range of known probabilistic models, and automatically performs model selection to encourage sparse predictive models. Due to the non-conjugacy of its prior, it is difficult to make predictions by computing the mode or mean of the posterior distribution. We suggest an optimization method that sequentially minimizes a convex upper bound of the log-likelihood, which is very efficient and scalable. The experiments on a toy dataset and the EachMovie dataset show good predictive accuracy of the model.
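
The abstract does not spell out how the convex upper bound is built. One standard construction for objectives that contain a log-determinant term, in the spirit of the log-det heuristic of [3], uses the concavity of log det: for positive definite X and X0, log det X <= log det X0 + tr(X0^{-1} X) - n. The Python sketch below checks this tangent bound numerically. It is an illustration under that assumption, not necessarily the authors' exact bound; the function names and the `random_spd` helper are hypothetical.

```python
import numpy as np

def logdet(X):
    """Log-determinant of a symmetric positive definite (SPD) matrix."""
    sign, value = np.linalg.slogdet(X)
    if sign <= 0:
        raise ValueError("matrix must be positive definite")
    return value

def logdet_tangent_bound(X, X0):
    """Linear upper bound on logdet(X), tangent to logdet at X0 (both SPD).

    Since logdet is concave on SPD matrices,
    logdet(X) <= logdet(X0) + trace(X0^{-1} (X - X0))
               = logdet(X0) + trace(X0^{-1} X) - n.
    """
    n = X0.shape[0]
    return logdet(X0) + np.trace(np.linalg.solve(X0, X)) - n

def random_spd(n, rng):
    """Hypothetical helper: draw a random SPD matrix for the demo."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

rng = np.random.default_rng(0)
X0 = random_spd(4, rng)  # current iterate, where the bound is tangent
X = random_spd(4, rng)   # any other SPD matrix

gap = logdet_tangent_bound(X, X0) - logdet(X)      # non-negative by concavity
tight = logdet_tangent_bound(X0, X0) - logdet(X0)  # zero: bound touches at X0
print(gap >= 0, abs(tight) < 1e-9)
```

Because the bound is linear in X and touches the true function at X0, minimizing it and then re-centering at the new point gives a sequence of simpler surrogate problems, which is the general pattern the abstract refers to as sequential minimization.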

reference text

[1] C. Archambeau, N. Delannay, and M. Verleysen. Robust probabilistic projections. In ICML, 2006.

[2] J. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In UAI-98, pages 43–52, 1998.

[3] M. Fazel, H. Hindi, and S. P. Boyd. Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. In Proceedings of the American Control Conference, 2003.

[4] C. Fernandez and M. F. J. Steel. Multivariate Student-t regression models: Pitfalls and inference. Biometrika, 86(1):153–167, 1999.

[5] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman & Hall/CRC, New York, 2nd edition, 2004.

[6] A. K. Gupta and D. K. Nagar. Matrix Variate Distributions. Chapman & Hall/CRC, 2000.

[7] N. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. J. Mach. Learn. Res., 6:1783–1816, 2005.

[8] D. J. C. MacKay. Comparison of approximate methods for handling hyperparameters. Neural Comput., 11(5):1035–1068, 1999.

[9] J. D. M. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborative prediction. In ICML, 2005.

[10] M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211–244, 2001.

[11] M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, B(61):611–622, 1999.

[12] K. Yu, W. Chu, S. Yu, V. Tresp, and Z. Xu. Stochastic relational models for discriminative link prediction. In Advances in Neural Information Processing Systems 19 (NIPS), 2006.

[13] K. Yu, V. Tresp, and A. Schwaighofer. Learning Gaussian processes from multiple tasks. In ICML, 2005.