nips2012-230 — reference knowledge-graph (maker-knowledge-mining)

230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs


Source: pdf

Authors: Abner Guzmán-Rivera, Dhruv Batra, Pushmeet Kohli

Abstract: We address the problem of generating multiple hypotheses for structured prediction tasks that involve interaction with users or successive components in a cascaded architecture. Given a set of multiple hypotheses, such components/users typically have the ability to retrieve the best (or approximately the best) solution in this set. The standard approach for handling such a scenario is to first learn a single-output model and then produce M-Best Maximum a Posteriori (MAP) hypotheses from this model. In contrast, we learn to produce multiple outputs by formulating this task as a multiple-output structured-output prediction problem with a loss function that effectively captures the setup of the problem. We present a max-margin formulation that minimizes an upper bound on this loss function. Experimental results on image segmentation and protein side-chain prediction show that our method outperforms conventional approaches used for this type of scenario and leads to substantial improvements in prediction accuracy.
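The abstract's central idea — scoring a set of M hypotheses by the loss of its best member, rather than by a single prediction — can be illustrated with a minimal sketch. All names, shapes, and the choice of Hamming distance as the task loss are illustrative stand-ins, not the paper's actual formulation:

```python
# Toy sketch of the "oracle" set loss behind multiple choice learning:
# a set of M hypotheses is evaluated by its best member,
#   L(y, {y_hat_1, ..., y_hat_M}) = min_m loss(y, y_hat_m).
# Hamming distance over binary labelings stands in for the task loss
# (e.g., per-pixel error in segmentation); names are illustrative.

def hamming_loss(y_true, y_pred):
    """Fraction of positions where two label sequences disagree."""
    assert len(y_true) == len(y_pred)
    return sum(a != b for a, b in zip(y_true, y_pred)) / len(y_true)

def oracle_loss(y_true, hypotheses):
    """Loss of the best hypothesis in the set."""
    return min(hamming_loss(y_true, y_hat) for y_hat in hypotheses)

y = [1, 1, 0, 0]
hyps = [
    [1, 0, 0, 0],  # disagrees at one position: loss 0.25
    [0, 0, 1, 1],  # disagrees everywhere: loss 1.0
]
print(oracle_loss(y, hyps))  # 0.25 -- only the best hypothesis counts
```

Because only the minimum enters the loss, a learner trained against it is rewarded for producing a diverse set in which at least one hypothesis is accurate — the setup the paper's max-margin formulation upper-bounds.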


reference text

[1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk. SLIC Superpixels Compared to State-of-the-art Superpixel Methods. PAMI, 2012 (to appear). 6

[2] D. Batra, A. Kowdle, D. Parikh, J. Luo, and T. Chen. iCoseg: Interactive Co-segmentation with Intelligent Scribble Guidance. In CVPR, 2010. 6

[3] D. Batra, P. Yadollahpour, A. Guzman-Rivera, and G. Shakhnarovich. Diverse M-Best Solutions in Markov Random Fields. In ECCV, 2012. 2, 6, 7

[4] M. B. Blaschko and C. H. Lampert. Learning to Localize Objects with Structured Output Regression. In ECCV, 2008. 5

[5] Y. Boykov and M.-P. Jolly. Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects in N-D Images. In ICCV, 2001. 1

[6] Y. Boykov, O. Veksler, and R. Zabih. Fast Approximate Energy Minimization via Graph Cuts. PAMI, 23(11):1222–1239, 2001. 6

[7] C. Yanover, O. Schueler-Furman, and Y. Weiss. Minimizing and Learning Energy Functions for Side-Chain Prediction. Journal of Computational Biology, 15(7):899–911, 2008. 7, 8

[8] M. Collins. Discriminative Reranking for Natural Language Parsing. In ICML, pages 175–182, 2000. 2

[9] D. Dey, T. Y. Liu, M. Hebert, and J. A. Bagnell. Contextual sequence prediction with application to control library optimization. In Robotics: Science and Systems, 2012. 5

[10] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2011 (VOC2011) Results. http://www.pascal-network.org/challenges/VOC/voc2011/workshop/index.html. 2

[11] M. Fromer and A. Globerson. An LP View of the M-best MAP problem. In NIPS, 2009. 2

[12] L. Huang and D. Chiang. Better K-best parsing. In Proceedings of the Ninth International Workshop on Parsing Technology (IWPT), pages 53–64, 2005. 2

[13] IBM Corporation. IBM ILOG CPLEX Optimization Studio. http://www-01.ibm.com/software/integration/optimization/cplex-optimization-studio/, 2012. 8

[14] T. Joachims, T. Finley, and C.-N. Yu. Cutting-Plane Training of Structural SVMs. Machine Learning, 77(1):27–59, 2009. 5

[15] P. Kohli and P. H. S. Torr. Measuring Uncertainty in Graph Cut Solutions. CVIU, 112(1):30–38, 2008. 6

[16] V. Kolmogorov. Convergent Tree-Reweighted Message Passing for Energy Minimization. PAMI, 28(10):1568–1583, 2006. 8

[17] V. Kolmogorov and R. Zabih. What Energy Functions can be Minimized via Graph Cuts? PAMI, 26(2):147–159, 2004. 6

[18] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In ICML, 2001. 1

[19] C. H. Lampert. Maximum Margin Multi-Label Structured Prediction. In NIPS, 2011. 5

[20] E. L. Lawler. A Procedure for Computing the K Best Solutions to Discrete Optimization Problems and Its Application to the Shortest Path Problem. Management Science, 18:401–405, 1972. 2

[21] D. W. Opitz and R. Maclin. Popular Ensemble Methods: An Empirical Study. J. Artif. Intell. Res. (JAIR), 11:169–198, 1999. 5

[22] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: a Method for Automatic Evaluation of Machine Translation. In ACL, 2002. 2

[23] D. Park and D. Ramanan. N-Best Maximal Decoders for Part Models. In ICCV, 2011. 2

[24] A. Pauls, D. Klein, and C. Quirk. Top-Down K-Best A* Parsing. In ACL, 2010. 2

[25] C. Rother, V. Kolmogorov, and A. Blake. “GrabCut” – Interactive Foreground Extraction using Iterated Graph Cuts. In SIGGRAPH, 2004. 1

[26] L. Shen, A. Sarkar, and F. J. Och. Discriminative Reranking for Machine Translation. In HLT-NAACL, pages 177–184, 2004. 2

[27] B. Taskar, C. Guestrin, and D. Koller. Max-Margin Markov Networks. In NIPS, 2003. 1

[28] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large Margin Methods for Structured and Interdependent Output Variables. JMLR, 6:1453–1484, 2005. 1, 2, 3

[29] C. Yanover and Y. Weiss. Finding the M Most Probable Configurations Using Loopy Belief Propagation. In NIPS, 2003. 2, 6