
Maximum Margin Multi-Label Structured Prediction



Author: Christoph H. Lampert

Abstract: We study multi-label prediction for structured output sets, a problem that occurs, for example, in object detection in images, secondary structure prediction in computational biology, and graph matching with symmetries. Conventional multi-label classification techniques are typically not applicable in this situation, because they require explicit enumeration of the label set, which is infeasible in the case of structured outputs. Relying on techniques originally designed for single-label structured prediction, in particular structured support vector machines, results in reduced prediction accuracy or leads to infeasible optimization problems. In this work we derive a maximum-margin training formulation for multi-label structured prediction that remains computationally tractable while achieving high prediction accuracy. It also shares the most beneficial properties of single-label maximum-margin approaches, in particular formulation as a convex optimization problem, efficient working set training, and PAC-Bayesian generalization bounds.
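For background, the single-label structured SVM that the abstract names as its starting point [3] is commonly written as the following margin-rescaling problem; this is standard textbook notation (φ is the joint feature map, Δ the task loss, C the regularization constant), not the paper's multi-label formulation itself:

\min_{w,\,\xi}\ \frac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^{n}\xi_i

\text{s.t.}\quad \langle w,\phi(x_i,y_i)\rangle - \langle w,\phi(x_i,y)\rangle \ \ge\ \Delta(y_i,y) - \xi_i, \qquad \xi_i \ge 0, \quad \text{for all } i \text{ and all } y\in\mathcal{Y}\setminus\{y_i\}.

In the multi-label setting each input x_i comes with a set of correct outputs Y_i ⊆ Y rather than a single y_i, so instantiating constraints by enumerating labels or label subsets amounts to exactly the explicit enumeration that the abstract rules out for structured output spaces.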


References

[1] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, 2001.

[2] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In NIPS, 2003.

[3] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. JMLR, 6, 2005.

[4] T. Joachims, T. Finley, and C. N. J. Yu. Cutting-plane training of structural SVMs. Machine Learning, 77(1), 2009.

[5] C. H. Teo, S. V. N. Vishwanathan, A. Smola, and Q. V. Le. Bundle methods for regularized risk minimization. JMLR, 11, 2010.

[6] G. Tsoumakas and I. Katakis. Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3), 2007.

[7] K. Dembczynski, W. Cheng, and E. Hüllermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In ICML, 2011.

[8] D. McAllester. Generalization bounds and consistency for structured labeling. In G. Bakır, T. Hofmann, B. Schölkopf, A. J. Smola, and B. Taskar, editors, Predicting Structured Data. MIT Press, 2007.

[9] D. Nilsson. An efficient algorithm for finding the M most probable configurations in probabilistic expert systems. Statistics and Computing, 8(2), 1998.

[10] C. Yanover and Y. Weiss. Finding the M most probable configurations using loopy belief propagation. In NIPS, 2004.

[11] M. Fromer and A. Globerson. An LP View of the M-best MAP problem. In NIPS, 2009.

[12] J. Porway and S.-C. Zhu. C^4: Exploring multiple solutions in graphical models by cluster sampling. PAMI, 33(9), 2011.

[13] M. B. Blaschko and C. H. Lampert. Learning to localize objects with structured output regression. In ECCV, 2008.

[14] A. Bordes, N. Usunier, and L. Bottou. Sequence labelling SVMs trained in one pass. In ECML PKDD, 2008.

[15] M. R. Boutell, J. Luo, X. Shen, and C. M. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9), 2004.

[16] T. Joachims. Text categorization with support vector machines: Learning with many relevant features. In ECML, 1998.

[17] R. E. Schapire and Y. Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2–3), 2000.

[18] Y. Yue, T. Finley, F. Radlinski, and T. Joachims. A support vector method for optimizing average precision. In ACM SIGIR, 2007.

[19] T. Gärtner and S. Vembu. On structured output training: Hard cases and an efficient alternative. Machine Learning, 76(2), 2009.

[20] Y. Yue and T. Joachims. Predicting diverse subsets using structural SVMs. In ICML, 2008.

[21] S. Ji, L. Tang, S. Yu, and J. Ye. Extracting shared subspaces for multi-label classification. In ACM SIGKDD, 2008.

[22] P. Rai and H. Daumé III. Multi-label prediction via sparse infinite CCA. In NIPS, 2009.

[23] B. Hariharan, L. Zelnik-Manor, S. V. N. Vishwanathan, and M. Varma. Large scale max-margin multilabel classification with priors. In ICML, 2010.

[24] W. Bi and J. Kwok. Multi-label classification on tree- and DAG-structured hierarchies. In ICML, 2011.

[25] D. Hsu, S. Kakade, J. Langford, and T. Zhang. Multi-label prediction via compressed sensing. In NIPS, 2009.

[26] G. Tsoumakas, I. Katakis, and I. Vlahavas. Effective and efficient multilabel classification in domains with large number of labels. In ECML PKDD, 2008.

[27] C. H. Lampert and M. B. Blaschko. Structured prediction by joint kernel support estimation. Machine Learning, 77(2–3), 2009.

[28] J. Petterson, T. S. Caetano, J. J. McAuley, and J. Yu. Exponential family graph matching and ranking. In NIPS, 2009.

[29] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor. Kernel-based learning of hierarchical multilabel classification models. JMLR, 7, 2006.

[30] A. Binder, K.-R. Müller, and M. Kawanabe. On taxonomies for multi-class image categorization. IJCV, 2011.

[31] L. Cai and T. Hofmann. Hierarchical document categorization with support vector machines. In CIKM, 2004.

[32] K. Dembczynski, W. Cheng, and E. Hüllermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In ICML, 2010.

[33] C. H. Lampert, M. B. Blaschko, and T. Hofmann. Efficient subwindow search: A branch and bound framework for object localization. PAMI, 31(12), 2009.

[34] S. Agarwal, A. Awan, and D. Roth. Learning to detect objects in images via a sparse, part-based representation. PAMI, 26(11), 2004.

[35] C. H. Lampert. An efficient divide-and-conquer cascade for nonlinear object detection. In CVPR, 2010.