nips nips2007 nips2007-63 nips2007-63-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yuhong Guo, Dale Schuurmans
Abstract: We investigate a new, convex relaxation of an expectation-maximization (EM) variant that approximates a standard objective while eliminating local minima. First, a cautionary result is presented, showing that any convex relaxation of EM over hidden variables must give trivial results if any dependence on the missing values is retained. Although this appears to be a strong negative outcome, we then demonstrate how the problem can be bypassed by using equivalence relations instead of value assignments over hidden variables. In particular, we develop new algorithms for estimating exponential conditional models that only require equivalence relation information over the variable values. This reformulation leads to an exact expression for EM variants in a wide range of problems. We then develop a semidefinite relaxation that yields global training by eliminating local minima.
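The abstract's key move is to optimize over an equivalence-relation matrix (which hidden values are equal) rather than over individual value assignments, and then relax that matrix problem to a semidefinite program. As a rough illustration of that general idea only, and not the paper's algorithm, the sketch below poses a toy two-cluster problem as a Goemans-Williamson-style SDP in the spirit of the SDP clustering relaxations cited in [4, 6, 15]. The linear kernel K, the balance constraint, the cvxpy solver, and all variable names are assumptions made for this example.

```python
# Illustrative sketch (not the paper's method): relax a two-cluster labeling
# y in {-1, +1}^n, encoded as an equivalence-relation matrix M = y y^T with
# M_ij = +1 iff points i and j share a hidden label, to a PSD matrix.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (10, 2)), rng.normal(2, 1, (10, 2))])
K = X @ X.T                      # linear kernel over the toy data (assumption)
n = K.shape[0]

M = cp.Variable((n, n), symmetric=True)
constraints = [
    M >> 0,                      # relax M = y y^T to any PSD matrix
    cp.diag(M) == 1,             # y_i in {-1, +1} implies M_ii = 1
    cp.sum(M) == 0,              # balance constraint to rule out the trivial one-cluster solution
]
prob = cp.Problem(cp.Maximize(cp.trace(K @ M)), constraints)
prob.solve()

# Round the relaxed solution back to a labeling via the leading eigenvector.
eigvals, eigvecs = np.linalg.eigh(M.value)
labels = np.sign(eigvecs[:, -1])
print(labels)
```

The eigenvector rounding at the end is only one heuristic for recovering a discrete labeling; randomized hyperplane rounding in the style of [6] is another common choice.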
[1] J. Borwein and A. Lewis. Convex Analysis and Nonlinear Optimization. Springer, 2000.
[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[3] S. Chen. Models for grapheme-to-phoneme conversion. In Eurospeech, 2003.
[4] T. De Bie and N. Cristianini. Fast SDP relaxations of graph cut clustering, transduction, and other combinatorial problems. Journal of Machine Learning Research, 7, 2006.
[5] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B, 39(1):1–38, 1977.
[6] M. Goemans and D. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. JACM, 42(6):1115–1145, 1995.
[7] S. Goldwater and M. Johnson. Bias in learning syllable structure. In Proceedings CoNLL, 2005.
[8] D. Klein and C. Manning. Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proceedings ACL, 2004.
[9] B. Merialdo. Tagging text with a probabilistic model. Comput. Ling., 20(2):155–171, 1994.
[10] R. Neal and G. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. Jordan, editor, Learning in Graphical Models. Kluwer, 1998.
[11] J. Nocedal and S. Wright. Numerical Optimization. Springer, 1999.
[12] R. Salakhutdinov, S. Roweis, and Z. Ghahramani. Optimization with EM and expectation-conjugate-gradient. In Proceedings ICML, 2003.
[13] N. Srebro, G. Shakhnarovich, and S. Roweis. An investigation of computational and informational limits in Gaussian mixture clustering. In Proceedings ICML, 2006.
[14] M. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. Technical Report TR-649, UC Berkeley, Dept. Statistics, 2003.
[15] L. Xu, J. Neufeld, B. Larson, and D. Schuurmans. Maximum margin clustering. In NIPS 17, 2004.
[16] L. Xu, D. Wilkinson, F. Southey, and D. Schuurmans. Discriminative unsupervised learning of structured predictors. In Proceedings ICML, 2006.