nips nips2008 nips2008-176 nips2008-176-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jun Zhu, Eric P. Xing, Bo Zhang
Abstract: Learning graphical models with hidden variables can offer semantic insights into complex data and lead to salient structured predictors without relying on expensive, sometimes unattainable fully annotated training data. While likelihood-based methods have been extensively explored, to our knowledge, learning structured prediction models with latent variables based on the max-margin principle remains largely an open problem. In this paper, we present a partially observed Maximum Entropy Discrimination Markov Network (PoMEN) model that attempts to combine the advantages of Bayesian and margin-based paradigms for learning Markov networks from partially labeled data. PoMEN leads to an averaging prediction rule that resembles a Bayes predictor and is thus more robust to overfitting, while also building on desirable discriminative laws resembling those of the M3N. We develop an EM-style algorithm that uses existing convex optimization algorithms for the M3N as a subroutine. We demonstrate the competitive performance of PoMEN over existing methods on a real-world web data extraction task.
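To make the EM-style algorithm mentioned in the abstract concrete, below is a minimal sketch of one plausible instantiation: an E-step that computes a posterior over completions of the hidden labels, and an M-step that takes a max-margin subgradient step in the style of Ratliff et al. [10]. The toy length-2 chain, the feature map, the Hamming loss restricted to observed positions, and the single subgradient M-step are all illustrative assumptions, not the authors' implementation (see [20] for the actual formulation).

import itertools
import numpy as np

D = 8                      # 4 node features + 4 transition indicators
LABELS = (0, 1)

def feats(x, y):
    # Joint feature map for a binary length-2 chain: per-position node
    # features plus one transition indicator.
    f = np.zeros(D)
    for t in range(2):
        f[y[t] * 2 + t] += x[t]
    f[4 + y[0] * 2 + y[1]] += 1.0
    return f

def completions(y_partial):
    # All full labelings consistent with the observed positions (None = hidden).
    choices = [LABELS if v is None else (v,) for v in y_partial]
    return list(itertools.product(*choices))

def e_step(w, x, y_partial):
    # Posterior over hidden-label completions under the current model.
    ys = completions(y_partial)
    scores = np.array([w @ feats(x, y) for y in ys])
    p = np.exp(scores - scores.max())
    return ys, p / p.sum()

def hamming(y, y_partial):
    # Margin loss counted on observed positions only; hidden positions are free.
    return sum(1.0 for t, v in enumerate(y_partial) if v is not None and y[t] != v)

# Toy partially labeled data: None marks a hidden label.
data = [(np.array([1.0, 0.2]), (1, None)),
        (np.array([0.1, 1.0]), (None, 0))]

w = np.zeros(D)
for _ in range(50):                                   # EM-style outer loop
    for x, y_partial in data:
        ys, post = e_step(w, x, y_partial)            # E-step
        mu = sum(p * feats(x, y) for y, p in zip(ys, post))   # expected features
        # M-step (one subgradient step in the spirit of [10]): loss-augmented
        # inference over all labelings, then push the expected "truth" features
        # above the worst violator's.
        all_ys = list(itertools.product(LABELS, repeat=2))
        y_hat = max(all_ys, key=lambda y: w @ feats(x, y) + hamming(y, y_partial))
        w += 0.1 * (mu - feats(x, y_hat))

In the paper itself the M-step is a full M3N-style convex program rather than a single subgradient step; the loop above only illustrates the alternation between posterior inference and max-margin learning.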
[1] Y. Altun, D. McAllester, and M. Belkin. Maximum margin semi-supervised learning for structured variables. In NIPS, 2006.
[2] Y. Altun, I. Tsochantaridis, and T. Hofmann. Hidden Markov support vector machines. In ICML, 2003.
[3] P. Bartlett, M. Collins, B. Taskar, and D. McAllester. Exponentiated gradient algorithms for large-margin structured classification. In NIPS, 2004.
[4] U. Brefeld and T. Scheffer. Semi-supervised learning for structured output variables. In ICML, 2006.
[5] M. Dudík, S.J. Phillips, and R.E. Schapire. Maximum entropy density estimation with generalized regularization and an application to species distribution modeling. JMLR, 8:1217–1260, 2007.
[6] R. Jin and Z. Ghahramani. Learning with multiple labels. In NIPS, 2002.
[7] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, 2001.
[8] G. Lebanon and J. Lafferty. Boosting and maximum likelihood for exponential models. In NIPS, 2001.
[9] A. Quattoni, M. Collins, and T. Darrell. Conditional random fields for object recognition. In NIPS, 2004.
[10] N.D. Ratliff, J.A. Bagnell, and M.A. Zinkevich. (Online) subgradient methods for structured prediction. In AISTATS, 2007.
[11] F. Sha and L. Saul. Large margin hidden markov models for automatic speech recognition. In NIPS, 2006.
[12] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In NIPS, 2003.
[13] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In ICML, 2004.
[14] J. Verbeek and B. Triggs. Scene segmentation with conditional random fields learned from partially labeled images. In NIPS, 2007.
[15] A. Wächter and L.T. Biegler. On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1):25–57, 2006.
[16] L. Xu, D. Wilkinson, F. Southey, and D. Schuurmans. Discriminative unsupervised learning of structured predictors. In ICML, 2006.
[17] J. Zhu, Z. Nie, J.-R. Wen, B. Zhang, and W.-Y. Ma. Simultaneous record detection and attribute labeling in web data extraction. In SIGKDD, 2006.
[18] J. Zhu, Z. Nie, B. Zhang, and J.-R. Wen. Dynamic hierarchical Markov random fields and their application to web data extraction. In ICML, 2007.
[19] J. Zhu, E.P. Xing, and B. Zhang. Laplace maximum margin Markov networks. In ICML, 2008.
[20] J. Zhu, E.P. Xing, and B. Zhang. Maximum entropy discrimination Markov networks. Technical Report CMU-ML-08-104, Machine Learning Department, Carnegie Mellon University, 2008.