Title: Structured Learning with Approximate Inference (NIPS 2007, paper 187)
Authors: Alex Kulesza, Fernando Pereira
Abstract: In many structured prediction problems, the highest-scoring labeling is hard to compute exactly, leading to the use of approximate inference methods. However, when inference is used inside a learning algorithm, a good approximation of the score may not be sufficient. We show in particular that learning can fail even with an approximate inference method that has rigorous approximation guarantees. There are two reasons for this. First, approximate methods can effectively reduce the expressivity of the underlying model by making it impossible to choose parameters that reliably give good predictions. Second, approximations can respond to parameter changes in such a way that standard learning algorithms are misled. In contrast, we give two positive results in the form of learning bounds for the use of LP-relaxed inference in structured perceptron and empirical risk minimization settings. We argue that without an understanding of which combinations of inference and learning are appropriately compatible, learning performance under approximate inference cannot be guaranteed.
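The abstract's positive results concern the structured perceptron driven by LP-relaxed inference. As a concrete illustration, the following is a minimal sketch of one such perceptron step on a toy pairwise chain model: exact MAP inference is replaced by the standard local-consistency LP relaxation, and the (possibly fractional) solution is used in the update. The model, the helper names `lp_map` and `perceptron_step`, and all sizes are illustrative assumptions, not code from the paper.

```python
# Minimal sketch (illustrative, not the paper's code): one structured-perceptron
# update where exact MAP inference is replaced by the local-consistency LP
# relaxation of a small pairwise model.
import numpy as np
from scipy.optimize import linprog

L = 2                    # binary labels (assumption)
nodes = [0, 1, 2]        # a 3-node chain (assumption)
edges = [(0, 1), (1, 2)]

def lp_map(theta_n, theta_e):
    """Maximize sum_i theta_n[i,y] mu_i(y) + sum_e theta_e[e,y,y'] mu_e(y,y')
    over the local-consistency polytope (normalization + marginalization)."""
    n_node, n_edge = len(nodes) * L, len(edges) * L * L
    c = -np.concatenate([theta_n.ravel(), theta_e.ravel()])  # linprog minimizes
    A_eq, b_eq = [], []
    for i in nodes:                          # sum_y mu_i(y) = 1
        row = np.zeros(n_node + n_edge)
        row[i * L:(i + 1) * L] = 1.0
        A_eq.append(row); b_eq.append(1.0)
    for e, (i, j) in enumerate(edges):       # edge marginals agree with nodes
        base = n_node + e * L * L
        for y in range(L):                   # sum_{y'} mu_e(y,y') = mu_i(y)
            row = np.zeros(n_node + n_edge)
            row[base + y * L: base + (y + 1) * L] = 1.0
            row[i * L + y] = -1.0
            A_eq.append(row); b_eq.append(0.0)
        for yp in range(L):                  # sum_y mu_e(y,y') = mu_j(y')
            row = np.zeros(n_node + n_edge)
            row[base + yp: base + L * L: L] = 1.0
            row[j * L + yp] = -1.0
            A_eq.append(row); b_eq.append(0.0)
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, 1), method="highs")
    assert res.status == 0, res.message
    return (res.x[:n_node].reshape(len(nodes), L),
            res.x[n_node:].reshape(len(edges), L, L))

def perceptron_step(w_node, w_edge, x, y_true, lr=1.0):
    """One perceptron update: gold feature counts minus the counts of the
    (possibly fractional) LP solution."""
    theta_n = x @ w_node                         # node scores: (nodes, L)
    theta_e = np.tile(w_edge, (len(edges), 1, 1))
    mu_n, mu_e = lp_map(theta_n, theta_e)
    for i in nodes:
        w_node[:, y_true[i]] += lr * x[i]
        w_node -= lr * np.outer(x[i], mu_n[i])
    for e, (i, j) in enumerate(edges):
        w_edge[y_true[i], y_true[j]] += lr
        w_edge -= lr * mu_e[e]
    return w_node, w_edge

# Hypothetical usage on random node features:
rng = np.random.default_rng(0)
x = rng.normal(size=(len(nodes), 4))             # 4 features per node
w_node, w_edge = np.zeros((4, L)), np.zeros((L, L))
w_node, w_edge = perceptron_step(w_node, w_edge, x, y_true=[0, 1, 1])
```

On a chain this LP is tight, so the sketch reproduces exact inference; the phenomena the paper studies arise on loopy graphs, where the relaxation can return fractional vertices that the update must then absorb.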
[1] Gregory F. Cooper. The computational complexity of probabilistic inference using Bayesian belief networks (research note). Artificial Intelligence, 42(2-3):393–405, 1990.
[2] John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML ’01: Proceedings of the Eighteenth International Conference on Machine Learning, pages 282–289, 2001.
[3] Ben Taskar, Vassil Chatalbashev, and Daphne Koller. Learning associative Markov networks. In ICML ’04: Proceedings of the Twenty-First International Conference on Machine Learning, page 102, 2004.
[4] Judea Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1988.
[5] Kevin Murphy, Yair Weiss, and Michael Jordan. Loopy belief propagation for approximate inference: An empirical study. In Proceedings of the 15th Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 467–475, 1999.
[6] Martin J. Wainwright, Tommi S. Jaakkola, and Alan S. Willsky. MAP estimation via agreement on trees: message-passing and linear programming. IEEE Transactions on Information Theory, 51(11):3697–3717, 2005.
[7] Dan Roth and Wen-tau Yih. A linear programming formulation for global inference in natural language tasks. In Proceedings of the Conference on Computational Natural Language Learning (CoNLL), pages 1–8, 2004.
[8] Charles Sutton and Andrew McCallum. Collective segmentation and labeling of distant entities in information extraction. Technical Report TR # 04-49, University of Massachusetts, 2004.
[9] Michael Collins. Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In EMNLP ’02: Proceedings of the ACL-02 conference on Empirical methods in natural language processing, pages 1–8, 2002.
[10] David A. McAllester. PAC-Bayesian stochastic model selection. Machine Learning, 51(1):5–21, 2003.
[11] David A. McAllester. Generalization bounds and consistency for structured labeling. In Predicting Structured Data. MIT Press, to appear.
[12] Charles Sutton and Andrew McCallum. Piecewise training of undirected models. In 21st Conference on Uncertainty in Artificial Intelligence, 2005.
[13] Hal Daumé III and Daniel Marcu. Learning as search optimization: Approximate large margin methods for structured prediction. In International Conference on Machine Learning (ICML), 2005.
[14] Martin J. Wainwright. Estimating the “wrong” graphical model: Benefits in the computation-limited setting. Journal of Machine Learning Research, 7:1829–1859, 2006.
[15] Vasin Punyakanok, Dan Roth, Wen-tau Yih, and Dav Zimak. Learning and inference over constrained output. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1124–1129, 2005.