
134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning


Source: pdf

Author: Jun Zhu, Ning Chen, Eric P. Xing

Abstract: Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. While priors can indirectly affect posterior distributions through Bayes’ theorem, imposing posterior regularization is arguably more direct and in some cases can be much easier. We particularly focus on developing infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the large-margin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets. Our results appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics.
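
To make the abstract’s contrast concrete, the following is a minimal sketch (in LaTeX notation) of the kind of regularized-posterior objective it describes; the symbols here (binary latent feature vectors z_n under an Indian buffet process prior, classifier weights η, labels y_n, regularization constant C) are assumed for illustration rather than quoted from the paper body:

\[
\min_{q(\mathbf{Z},\boldsymbol{\eta})} \;\; \mathrm{KL}\!\left( q(\mathbf{Z},\boldsymbol{\eta}) \,\big\|\, p(\mathbf{Z},\boldsymbol{\eta} \mid \mathcal{D}) \right) \;+\; C \sum_{n=1}^{N} \max\!\left( 0,\; 1 - y_n\, \mathbb{E}_q\!\left[ \boldsymbol{\eta}^{\top} \mathbf{z}_n \right] \right)
\]

A prior can influence the posterior only indirectly, through Bayes’ theorem; the hinge-loss term instead constrains expectations under q directly, which is the “arguably more direct” mechanism the abstract refers to. iLSVM and MT-iLSVM would instantiate this template with Indian-buffet-process priors over the latent feature matrix Z for single-task classification and multi-task learning, respectively.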


reference text

[1] R. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. JMLR, 6:1817–1853, 2005.

[2] C.E. Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Annals of Statistics, 2(6):1152–1174, 1974.

[3] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. In NIPS, 2007.

[4] B. Bakker and T. Heskes. Task clustering and gating for Bayesian multitask learning. JMLR, 4:83–99, 2003.

[5] M.J. Beal, Z. Ghahramani, and C.E. Rasmussen. The infinite hidden Markov model. In NIPS, 2002.

[6] K. Bellare, G. Druck, and A. McCallum. Alternating projections for learning with expectation constraints. In UAI, 2009.

[7] E. Bonilla, K.M.A. Chai, and C. Williams. Multi-task Gaussian process prediction. In NIPS, 2008.

[8] N. Chen, J. Zhu, and E.P. Xing. Predictive subspace learning for multiview data: a large margin approach. In NIPS, 2010.

[9] F. Doshi-Velez, K. Miller, J. Van Gael, and Y.W. Teh. Variational inference for the Indian buffet process. In AISTATS, 2009.

[10] D. Dunson and S. Peddada. Bayesian nonparametric inferences on stochastic ordering. ISDS Discussion Paper, 2, 2007.

[11] K. Ganchev, J. Graça, J. Gillenwater, and B. Taskar. Posterior regularization for structured latent variable models. JMLR, 11:2001–2049, 2010.

[12] T.L. Griffiths and Z. Ghahramani. Infinite latent feature models and the Indian buffet process. In NIPS, 2006.

[13] P.D. Hoff. Bayesian methods for partial stochastic orderings. Biometrika, 90:303–317, 2003.

[14] S. Huh and S. Fienberg. Discriminative topic modeling based on manifold learning. In KDD, 2010.

[15] T. Jaakkola, M. Meila, and T. Jebara. Maximum entropy discrimination. In NIPS, 1999.

[16] T. Jebara. Multitask sparsity via maximum entropy discrimination. JMLR, 12:75–110, 2011.

[17] T. Joachims. Transductive inference for text classification using support vector machines. In ICML, 1999.

[18] M. E. Khan, B. Marlin, G. Bouchard, and K. Murphy. Variational bounds for mixed-data factor analysis. In NIPS, 2010.

[19] P. Liang, M. Jordan, and D. Klein. Learning from measurements in exponential families. In ICML, 2009.

[20] S.N. MacEachern. Dependent nonparametric processes. In ASA Proceedings of the Section on Bayesian Statistical Science, 1999.

[21] G. Mann and A. McCallum. Generalized expectation criteria for semi-supervised learning with weakly labeled data. JMLR, 11:955–984, 2010.

[22] K. Miller, T. Griffiths, and M. Jordan. Nonparametric latent feature models for link prediction. In NIPS, 2009.

[23] P. Rai and H. Daumé III. Infinite predictor subspace models for multitask learning. In AISTATS, 2010.

[24] C.E. Rasmussen and Z. Ghahramani. Infinite mixtures of Gaussian process experts. In NIPS, 2002.

[25] Y.W. Teh, D. Görür, and Z. Ghahramani. Stick-breaking construction for the Indian buffet process. In AISTATS, 2007.

[26] Y.W. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical Dirichlet processes. JASA, 101(476):1566–1581, 2006.

[27] M. Welling, M. Rosen-Zvi, and G. Hinton. Exponential family harmoniums with an application to information retrieval. In NIPS, 2004.

[28] Y. Xue, D. Dunson, and L. Carin. The matrix stick-breaking process for flexible multi-task learning. In ICML, 2007.

[29] A. Zellner. Optimal information processing and Bayes’ theorem. American Statistician, 42:278–280, 1988.

[30] Y. Zhang and D.Y. Yeung. A convex formulation for learning task relationships in multi-task learning. In UAI, 2010.

[31] J. Zhu, A. Ahmed, and E.P. Xing. MedLDA: Maximum margin supervised topic models for regression and classification. In ICML, 2009.

[32] J. Zhu, N. Chen, and E.P. Xing. Infinite SVM: a Dirichlet process mixture of large-margin kernel machines. In ICML, 2011.