
Maximum Margin Multi-Label Structured Prediction



Author: Christoph H. Lampert

Abstract: We study multi-label prediction for structured output sets, a problem that occurs, for example, in object detection in images, secondary structure prediction in computational biology, and graph matching with symmetries. Conventional multi-label classification techniques are typically not applicable in this situation, because they require explicit enumeration of the label set, which is infeasible in the case of structured outputs. Relying on techniques originally designed for single-label structured prediction, in particular structured support vector machines, results in reduced prediction accuracy or leads to infeasible optimization problems. In this work we derive a maximum-margin training formulation for multi-label structured prediction that remains computationally tractable while achieving high prediction accuracy. It also shares the most beneficial properties of single-label maximum-margin approaches, in particular formulation as a convex optimization problem, efficient working set training, and PAC-Bayesian generalization bounds.
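For background, the single-label structured SVM that the abstract names as its starting point [3] is commonly written as the following margin-rescaling problem; this is standard textbook notation (φ is the joint feature map, Δ the task loss, C the regularization constant), not the paper's multi-label formulation itself:

\min_{w,\,\xi}\ \frac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^{n}\xi_i

\text{s.t.}\quad \langle w,\phi(x_i,y_i)\rangle - \langle w,\phi(x_i,y)\rangle \ \ge\ \Delta(y_i,y) - \xi_i, \qquad \xi_i \ge 0, \quad \text{for all } i \text{ and all } y\in\mathcal{Y}\setminus\{y_i\}.

In the multi-label setting each input x_i comes with a set of correct outputs Y_i ⊆ Y rather than a single y_i, so instantiating constraints by enumerating labels or label subsets amounts to exactly the explicit enumeration that the abstract rules out for structured output spaces.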


References

[1] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, 2001.

[2] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In NIPS, 2003.

[3] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. JMLR, 6, 2005.

[4] T. Joachims, T. Finley, and C. N. J. Yu. Cutting-plane training of structural SVMs. Machine Learning, 77(1), 2009.

[5] C. H. Teo, S. V. N. Vishwanathan, A. Smola, and Q. V. Le. Bundle methods for regularized risk minimization. JMLR, 11, 2010.

[6] G. Tsoumakas and I. Katakis. Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3), 2007.

[7] K. Dembczynski, W. Cheng, and E. Hüllermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In ICML, 2011.

[8] D. McAllester. Generalization bounds and consistency for structured labeling. In G. Bakır, T. Hofmann, B. Schölkopf, A. J. Smola, and B. Taskar, editors, Predicting Structured Data. MIT Press, 2007.

[9] D. Nilsson. An efficient algorithm for finding the M most probable configurations in probabilistic expert systems. Statistics and Computing, 8(2), 1998.

[10] C. Yanover and Y. Weiss. Finding the M most probable configurations using loopy belief propagation. In NIPS, 2004.

[11] M. Fromer and A. Globerson. An LP View of the M-best MAP problem. In NIPS, 2009.

[12] J. Porway and S.-C. Zhu. C^4: Exploring multiple solutions in graphical models by cluster sampling. PAMI, 33(9), 2011.

[13] M. B. Blaschko and C. H. Lampert. Learning to localize objects with structured output regression. In ECCV, 2008.

[14] A. Bordes, N. Usunier, and L. Bottou. Sequence labelling SVMs trained in one pass. In ECML PKDD, 2008.

[15] M. R. Boutell, J. Luo, X. Shen, and C. M. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9), 2004.

[16] T. Joachims. Text categorization with support vector machines: Learning with many relevant features. In ECML, 1998.

[17] R. E. Schapire and Y. Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2–3), 2000.

[18] Y. Yue, T. Finley, F. Radlinski, and T. Joachims. A support vector method for optimizing average precision. In ACM SIGIR, 2007.

[19] T. Gärtner and S. Vembu. On structured output training: Hard cases and an efficient alternative. Machine Learning, 76(2), 2009.

[20] Y. Yue and T. Joachims. Predicting diverse subsets using structural SVMs. In ICML, 2008.

[21] S. Ji, L. Tang, S. Yu, and J. Ye. Extracting shared subspaces for multi-label classification. In ACM SIGKDD, 2008.

[22] P. Rai and H. Daumé III. Multi-label prediction via sparse infinite CCA. In NIPS, 2009.

[23] B. Hariharan, L. Zelnik-Manor, S. V. N. Vishwanathan, and M. Varma. Large scale max-margin multilabel classification with priors. In ICML, 2010.

[24] W. Bi and J. Kwok. Multi-label classification on tree- and DAG-structured hierarchies. In ICML, 2011.

[25] D. Hsu, S. Kakade, J. Langford, and T. Zhang. Multi-label prediction via compressed sensing. In NIPS, 2009.

[26] G. Tsoumakas, I. Katakis, and I. Vlahavas. Effective and efficient multilabel classification in domains with large number of labels. In ECML PKDD, 2008.

[27] C. H. Lampert and M. B. Blaschko. Structured prediction by joint kernel support estimation. Machine Learning, 77(2–3), 2009.

[28] J. Petterson, T. S. Caetano, J. J. McAuley, and J. Yu. Exponential family graph matching and ranking. In NIPS, 2009.

[29] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor. Kernel-based learning of hierarchical multilabel classification models. JMLR, 7, 2006.

[30] A. Binder, K.-R. Müller, and M. Kawanabe. On taxonomies for multi-class image categorization. IJCV, 2011.

[31] L. Cai and T. Hofmann. Hierarchical document categorization with support vector machines. In CIKM, 2004.

[32] K. Dembczynski, W. Cheng, and E. Hüllermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In ICML, 2010.

[33] C. H. Lampert, M. B. Blaschko, and T. Hofmann. Efficient subwindow search: A branch and bound framework for object localization. PAMI, 31(12), 2009.

[34] S. Agarwal, A. Awan, and D. Roth. Learning to detect objects in images via a sparse, part-based representation. PAMI, 26(11), 2004.

[35] C. H. Lampert. An efficient divide-and-conquer cascade for nonlinear object detection. In CVPR, 2010.