
134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning


Source: pdf

Author: Jun Zhu, Ning Chen, Eric P. Xing

Abstract: Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. While priors can indirectly affect posterior distributions through Bayes’ theorem, imposing posterior regularization is arguably more direct and in some cases can be much easier. We particularly focus on developing infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the large-margin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets. Our results appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics.
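
To make the abstract’s contrast concrete, the following is a minimal sketch (in LaTeX notation) of the kind of regularized-posterior objective it describes; the symbols here (binary latent feature vectors z_n under an Indian buffet process prior, classifier weights η, labels y_n, regularization constant C) are assumed for illustration rather than quoted from the paper body:

\[
\min_{q(\mathbf{Z},\boldsymbol{\eta})} \;\; \mathrm{KL}\!\left( q(\mathbf{Z},\boldsymbol{\eta}) \,\big\|\, p(\mathbf{Z},\boldsymbol{\eta} \mid \mathcal{D}) \right) \;+\; C \sum_{n=1}^{N} \max\!\left( 0,\; 1 - y_n\, \mathbb{E}_q\!\left[ \boldsymbol{\eta}^{\top} \mathbf{z}_n \right] \right)
\]

A prior can influence the posterior only indirectly, through Bayes’ theorem; the hinge-loss term instead constrains expectations under q directly, which is the “arguably more direct” mechanism the abstract refers to. iLSVM and MT-iLSVM would instantiate this template with Indian-buffet-process priors over the latent feature matrix Z for single-task classification and multi-task learning, respectively.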


reference text

[1] R. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. JMLR, 6:1817–1853, 2005.

[2] C.E. Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Annals of Statistics, 2(6):1152–1174, 1974.

[3] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. In NIPS, 2007.

[4] B. Bakker and T. Heskes. Task clustering and gating for Bayesian multitask learning. JMLR, 4:83–99, 2003.

[5] M.J. Beal, Z. Ghahramani, and C.E. Rasmussen. The infinite hidden Markov model. In NIPS, 2002.

[6] K. Bellare, G. Druck, and A. McCallum. Alternating projections for learning with expectation constraints. In UAI, 2009.

[7] E. Bonilla, K.M.A. Chai, and C. Williams. Multi-task Gaussian process prediction. In NIPS, 2008.

[8] N. Chen, J. Zhu, and E.P. Xing. Predictive subspace learning for multiview data: a large margin approach. In NIPS, 2010.

[9] F. Doshi-Velez, K. Miller, J. Van Gael, and Y.W. Teh. Variational inference for the Indian buffet process. In AISTATS, 2009.

[10] D. Dunson and S. Peddada. Bayesian nonparametric inferences on stochastic ordering. ISDS Discussion Paper, 2, 2007.

[11] K. Ganchev, J. Graça, J. Gillenwater, and B. Taskar. Posterior regularization for structured latent variable models. JMLR, 11:2001–2049, 2010.

[12] T.L. Griffiths and Z. Ghahramani. Infinite latent feature models and the Indian buffet process. In NIPS, 2006.

[13] P.D. Hoff. Bayesian methods for partial stochastic orderings. Biometrika, 90:303–317, 2003.

[14] S. Huh and S. Fienberg. Discriminative topic modeling based on manifold learning. In KDD, 2010.

[15] T. Jaakkola, M. Meila, and T. Jebara. Maximum entropy discrimination. In NIPS, 1999.

[16] T. Jebara. Multitask sparsity via maximum entropy discrimination. JMLR, 12:75–110, 2011.

[17] T. Joachims. Transductive inference for text classification using support vector machines. In ICML, 1999.

[18] M. E. Khan, B. Marlin, G. Bouchard, and K. Murphy. Variational bounds for mixed-data factor analysis. In NIPS, 2010.

[19] P. Liang, M. Jordan, and D. Klein. Learning from measurements in exponential families. In ICML, 2009.

[20] S.N. MacEachern. Dependent nonparametric processes. In ASA Proceedings of the Section on Bayesian Statistical Science, 1999.

[21] G. Mann and A. McCallum. Generalized expectation criteria for semi-supervised learning with weakly labeled data. JMLR, 11:955–984, 2010.

[22] K. Miller, T. Griffiths, and M. Jordan. Nonparametric latent feature models for link prediction. In NIPS, 2009.

[23] P. Rai and H. Daumé III. Infinite predictor subspace models for multitask learning. In AISTATS, 2010.

[24] C.E. Rasmussen and Z. Ghahramani. Infinite mixtures of Gaussian process experts. In NIPS, 2002.

[25] Y.W. Teh, D. Görür, and Z. Ghahramani. Stick-breaking construction for the Indian buffet process. In AISTATS, 2007.

[26] Y.W. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical Dirichlet processes. JASA, 101(476):1566–1581, 2006.

[27] M. Welling, M. Rosen-Zvi, and G. Hinton. Exponential family harmoniums with an application to information retrieval. In NIPS, 2004.

[28] Y. Xue, D. Dunson, and L. Carin. The matrix stick-breaking process for flexible multi-task learning. In ICML, 2007.

[29] A. Zellner. Optimal information processing and Bayes’ theorem. American Statistician, 42:278–280, 1988.

[30] Y. Zhang and D.Y. Yeung. A convex formulation for learning task relationships in multi-task learning. In UAI, 2010.

[31] J. Zhu, A. Ahmed, and E.P. Xing. MedLDA: Maximum margin supervised topic models for regression and classification. In ICML, 2009.

[32] J. Zhu, N. Chen, and E.P. Xing. Infinite SVM: a Dirichlet process mixture of large-margin kernel machines. In ICML, 2011.