
61 nips-2010-Direct Loss Minimization for Structured Prediction


Source: pdf

Author: Tamir Hazan, Joseph Keshet, David A. McAllester

Abstract: In discriminative machine learning one is interested in training a system to optimize a certain desired measure of performance, or loss. In binary classification one typically tries to minimize the error rate. But in structured prediction each task often has its own measure of performance, such as the BLEU score in machine translation or the intersection-over-union score in PASCAL segmentation. The most common approaches to structured prediction, structural SVMs and CRFs, do not minimize the task loss: the former minimizes a surrogate loss with no guarantees for task loss, and the latter minimizes log loss independent of task loss. The main contribution of this paper is a theorem stating that a certain perceptron-like learning rule, involving feature vectors derived from loss-adjusted inference, directly corresponds to the gradient of task loss. We give empirical results on phonetic alignment of a standard test set from the TIMIT corpus, which surpass all previously reported results on this problem.
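For intuition, here is a minimal Python sketch of the perceptron-like update described above. The helper names features(x, y) (a joint feature map returning a NumPy vector), argmax_y (an inference routine that maximizes a perturbed score), task_loss, and the step sizes eta and eps are hypothetical placeholders, not names from the paper or its code; this is a sketch of the idea, not the authors' implementation.

def direct_loss_update(w, x, y_true, features, argmax_y, task_loss, eta=0.1, eps=1e-2):
    # Standard inference: the prediction maximizing w . features(x, y).
    y_pred = argmax_y(w, x, extra=lambda y: 0.0)
    # Loss-adjusted inference: perturb the score toward outputs with higher task loss.
    y_adj = argmax_y(w, x, extra=lambda y: eps * task_loss(y_true, y))
    # Difference of feature vectors, scaled by 1/eps, gives a finite-difference
    # estimate of the task-loss gradient (the paper's theorem concerns the eps -> 0 limit).
    grad_est = (features(x, y_adj) - features(x, y_pred)) / eps
    # Gradient-descent step on the (estimated) expected task loss.
    return w - eta * grad_est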


Reference text

[1] F. Brugnara, D. Falavigna, and M. Omologo. Automatic segmentation and labeling of speech based on hidden Markov models. Speech Communication, 12:357–370, 1993.

[2] D. Chiang, K. Knight, and W. Wang. 11,001 new features for statistical machine translation. In Proc. NAACL, 2009.

[3] M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Conference on Empirical Methods in Natural Language Processing, 2002.

[4] J.-P. Hosom. Speaker-independent phoneme alignment using transition-dependent states. Speech Communication, 51:352–368, 2009.

[5] J. Keshet, S. Shalev-Shwartz, Y. Singer, and D. Chazan. A large margin algorithm for speech and audio segmentation. IEEE Trans. on Audio, Speech and Language Processing, Nov. 2007.

[6] K.-F. Lee and H.-W. Hon. Speaker-independent phone recognition using hidden Markov models. IEEE Trans. Acoustics, Speech and Signal Proc., 37(2):1641–1648, 1989.

[7] P. Liang, A. Bouchard-Côté, D. Klein, and B. Taskar. An end-to-end discriminative approach to machine translation. In International Conference on Computational Linguistics and Association for Computational Linguistics (COLING/ACL), 2006.

[8] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In Advances in Neural Information Processing Systems 17, 2003.

[9] D.T. Toledano, L.A.H. Gomez, and L.V. Grande. Automatic phoneme segmentation. IEEE Trans. Speech and Audio Proc., 11(6):617–625, 2003.

[10] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453–1484, 2005.

[11] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, December 2008.