jmlr jmlr2008 jmlr2008-30 jmlr2008-30-reference knowledge-graph by maker-knowledge-mining

30 jmlr-2008-Discriminative Learning of Max-Sum Classifiers


Source: pdf

Author: Vojtěch Franc, Bogdan Savchynskyy

Abstract: The max-sum classifier predicts n-tuple of labels from n-tuple of observable variables by maximizing a sum of quality functions defined over neighbouring pairs of labels and observable variables. Predicting labels as MAP assignments of a Random Markov Field is a particular example of the max-sum classifier. Learning parameters of the max-sum classifier is a challenging problem because even computing the response of such classifier is NP-complete in general. Estimating parameters using the Maximum Likelihood approach is feasible only for a subclass of max-sum classifiers with an acyclic structure of neighbouring pairs. Recently, the discriminative methods represented by the perceptron and the Support Vector Machines, originally designed for binary linear classifiers, have been extended for learning some subclasses of the max-sum classifier. Besides the max-sum classifiers with the acyclic neighbouring structure, it has been shown that the discriminative learning is possible even with arbitrary neighbouring structure provided the quality functions fulfill some additional constraints. In this article, we extend the discriminative approach to other three classes of max-sum classifiers with an arbitrary neighbourhood structure. We derive learning algorithms for two subclasses of max-sum classifiers whose response can be computed in polynomial time: (i) the max-sum classifiers with supermodular quality functions and (ii) the max-sum classifiers whose response can be computed exactly by a linear programming relaxation. Moreover, we show that the learning problem can be approximately solved even for a general max-sum classifier. Keywords: max-xum classifier, hidden Markov networks, support vector machines


reference text

Y. Altun and T. Hofmann. Large margin methods for label sequence learning. In European Conference on Speech Communication and Technology (EuroSpeech), 2003. Y. Altun, I. Tsochantaridis, and T. Hofmann. Hidden markov support vector machines. In International Conference on Machine Learning. ACM Press, New York, 2003. D. Anguelov, B. Taskar, V. Chatabashev, D. Koller, D. Gupta, G. Heitz, and A. Ng. Discriminative learning of markov random fields for segmentation of 3D scan data. In International Conference on Computer Vision and Pattern Recognition, pages 167–176. IEEE Computer Society, Washington, DC, 2005. J. Besag. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, 48: 259–302, 1986. Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9):1124–1137, Sept. 2004. Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11):1222–1239, Nov. 2001. C. Chekuri, S. Khanna, J. Naor, and L. Zosin. Approximation algorithms for the metric labeling problem via a new linear programming formulation. In Symposium on Discrete Algorithms, pages 109–118, 2001. P.B. Chou and C.M. Brown. The theory and practice of bayesian image labeling. International Journal on Computer Vision, 4(3):185–210, June 1990. 102 D ISCRIMINATIVE L EARNING OF M AX -S UM C LASSIFIERS M. Collins. Discriminative training methods for hidden markov models: Theory and experiments with perceptron algorithms. In Conference on Empirical Methods in Natural Language Processing, pages 1–8, 2002. K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2:265–292, Dec. 2001. T. Finley and T. Joachims. Supervised clustering with support vector machines. In International Conference on Machine Learning, pages 217–224. ACM Press, New York, 2005. R. M. Haralick and L. G. Shapiro. The consistent labeling problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(2):173–184, 1979. G. E. Hinton and T. J. Sejnowski. Learning and relearning in boltzmann machines. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 1, pages 282– 317. MIT Press, Cambridge, 1986. M. Jerrum and A. Sinclair. Polynomial-time approximation algorithms for the Ising model. SIAM Journal of Computing, 22:1087–1116, 1993. V. Kolmogorov. Convergent tree-reweighted message passing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1568–1583, 2006. V. Kolmogorov and R. Zabih. What energy functions can be minimized via graph cuts? In European Conference on Computer Vision, pages 65–81. Springer-Verlag, 2002. A. Koster, C. P. M. Hoesel, and A. W. J. Kolen. The partial constraint satisfaction problem: Facets and lifting theorems. Operations Research Letters, 23(3–5):89–97, 1998. V.A. Koval and M.I. Schlesinger. Dvumernoe programmirovanie v zadachakh analiza izobrazheniy (two-dimensional programming in image analysis problems). USSR Academy of Science, Automatics and Telemechanics, 8:149–168, 1976. In Russian. I. Kovtun. Segmentaciya Zobrazhen na Usnovi Dostatnikh Umov Optimalnosti v NP-povnikh Klasakh Zadach Strukturnoi Rozmitki (Image Segmentation Based on Sufficient Conditions Of Optimality in NP-complete Classes of Structural Labeling Problems). PhD thesis, IRTC ITS Nat. Academy of Science Ukraine, Kiev, 2004. In Ukrainian. A. Novikoff. On convergence proofs of perceptrons. In Symposium on Mathematical Theory of Automata, volume 12, pages 615–622. Polytechnic Institute of Brooklyn, 1962. J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA, USA, 1988. N.D. Ratliff and J.A. Bagnell. Subgradient methods for maximum margin structured learning. In ICML Workshop on Learning in Structured Output Spaces, 2006. A. Rosenfeld, R. A. Hummel, and S. W. Zucker. Scene labeling by relaxation operations. IEEE Transactions on Systems, Man, and Cybernetics, 6(6):420–433, June 1976. 103 F RANC AND S AVCHYNSKYY D. Schlesinger. Structurelle Ans¨ tze f¨ r die Stereoreconstruction. PhD thesis, Technische Univera u sit¨ t Dresden, Fakult¨ t Informatik, Institut f¨ r K¨ nstliche Intelligenz, July 2005. In German. a a u u M.I. Schlesinger. Sintaksicheskiy analiz dvumernykh zritelnikh signalov v usloviyakh pomekh (Syntactic analysis of two-dimensional visual signals in noisy conditions). Kibernetika, 4:113– 130, 1976. In Russian. M.I. Schlesinger and B. Flach. Some solvable sublcasses of structural recognition problems. In Czech Pattern Recognition Workshop. Czech Pattern Recognition Society, 2000. M.I. Schlesinger and V. Hlav´ c. Ten Lectures on Statistical and Structural Pattern Recognition. aˇ Kluwer Academic Publishers, 2002. B. Sch¨ lkopf, C. Burges, and V. Vapnik. Extracting support data for a given task. In U.M. Fayyad o and R. Uthurusamy, editors, International Conference on Knowledge Discovery & Data Mining. AAAI Press, Menlo Park, 1995. B. Taskar, V. Chatalbashev, and D. Koller. Learning associative markov networks. In International Conference on Machine Learning. ACM Press, New York, 2004a. B. Taskar, C. Guestrin, and D. Koller. Maximum-margin markov networks. In Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA, 2004b. B. Taskar, S. Lacoste-Jullien, and M.I. Jordan. Structured prediction, dual extragradient and bregman projections. Journal of Machine Learning Research, 7:1627–1653, Jul. 2006. I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453–1484, Sep. 2005. V. Vapnik. Statistical Learning Theory. John Wiley & Sons, Inc., 1998. M. Wainwright, T. Jaakkola, and A. Willsky. MAP estimation via agreement on hypertrees: message passing and linear programming approaches. In Conference on Communication, Control and Computing, 2002. T. Werner. A linear programming approach to max-sum problem: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(7):1165–1179, July 2007. T. Werner. A linear programming approach to max-sum problem: A review. Research Report CTU–CMP–2005–25, Center for Machine Perception, K13133 FEE Czech Technical University, Prague, Czech Republic, December 2005. J. S. Yedidia, W. T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7):2282–2312, 2005. 104